Scene models learn to recognize and recreate specific environments, locations, or backgrounds. Use them to generate consistent settings for your images.
Location consistency: Generate images in the same setting repeatedly
Virtual environments: Recreate spaces that are hard to access
Brand backgrounds: Maintain consistent backdrops for product shots
Storytelling: Create consistent worlds for visual narratives
You need only 1 image to train a scene model. We recommend uploading several for richer environment capture; the first 3 images have the highest impact on the model's output.
Image Guidelines:
Clear view of the environment
Various angles of the same location
Different times of day if relevant
Minimal people or temporary objects
High resolution captures
Consistent representation of the space
Image order matters. If you're not satisfied with the results, try reordering your training images.
Good training images:
Wide shots showing full environment
Different perspectives of the same space
Architectural details if relevant
Empty or minimal scenes work best
Consistent lighting style
Avoid:
Heavy crowds obscuring the environment
Too many temporary objects
Mixed locations in the same training set
Low quality or blurry images
Extreme weather obscuring view
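
The guidelines above can be turned into a quick pre-upload check. The following is a minimal sketch, not part of the product: `TrainingImage`, `screen_images`, `suggest_order`, and `leading_reorderings` are hypothetical helper names, the 1024-pixel threshold is an assumed stand-in for "high resolution", and the width, height, and wide-shot flags are values you would fill in yourself. It drops undersized images, places wide shots first (since the first 3 images carry the most weight), and lists alternative orderings of the leading images to try if results are unsatisfying.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class TrainingImage:
    # Hypothetical bookkeeping record; not tied to any product API.
    filename: str
    width: int
    height: int
    is_wide_shot: bool = False


def screen_images(images, min_side=1024):
    """Keep images whose shorter side is at least min_side pixels.

    min_side is an assumed threshold; the guidelines only ask for
    high resolution captures without naming a number.
    """
    return [img for img in images if min(img.width, img.height) >= min_side]


def suggest_order(images):
    """Sort wide shots first, then by pixel count (largest first)."""
    return sorted(
        images,
        key=lambda img: (not img.is_wide_shot, -(img.width * img.height)),
    )


def leading_reorderings(images, lead=3):
    """Yield every ordering of the first `lead` images, keeping the rest in place."""
    head, tail = list(images[:lead]), list(images[lead:])
    for perm in permutations(head):
        yield list(perm) + tail


candidates = [
    TrainingImage("detail.jpg", 2000, 1500),
    TrainingImage("wide.jpg", 4000, 3000, is_wide_shot=True),
    TrainingImage("small_thumb.jpg", 640, 480),
    TrainingImage("side_angle.jpg", 3000, 2000, is_wide_shot=True),
]
kept = suggest_order(screen_images(candidates))
print([img.filename for img in kept])
```

Sorting by wide shots first is only one possible heuristic; `leading_reorderings(kept)` gives the other orderings of the first three images if you want to experiment.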
Navigate to Models > Create New Model
Select Scene as the model type
Upload your training images
Name your scene model
Click Start Training
Training takes around 20 seconds — you'll get a notification when it's done
Once trained, you can reuse your model an unlimited number of times.
After training, open the dropdown below the prompt field and select your scene model. It is added to your prompt like a building block. Then describe what should happen in the scene:
> "A woman walking through [your scene model], golden hour lighting"
> "Product photography setup in [your scene model]"
> "[your scene model] at night with dramatic lighting"
> "Close-up portrait with [your scene model] as background"
Describe the mood: Add lighting and atmosphere to your prompts
Place subjects naturally: Consider how people would interact with the space
Vary perspectives: Your scene can be shown from different angles
Combine with other models: Use scene models with person or object models
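
If you plan many prompt variations for the same scene, it can help to draft them in a script first. The sketch below is only a text-composition aid under stated assumptions: `build_prompt` and `SCENE_TOKEN` are hypothetical names, `rooftop-cafe` is a made-up scene name, and in the product itself the scene model is still selected from the dropdown rather than typed. It swaps the `[your scene model]` placeholder from the example prompts for a scene name and appends optional mood or lighting details.

```python
SCENE_TOKEN = "[your scene model]"  # placeholder text used in the example prompts


def build_prompt(template, scene_name, *details):
    """Replace the scene placeholder and append non-empty detail phrases."""
    if SCENE_TOKEN not in template:
        raise ValueError("template must contain the scene placeholder")
    prompt = template.replace(SCENE_TOKEN, scene_name)
    # Skip blank details so the result never ends with a dangling comma.
    extras = [d.strip() for d in details if d.strip()]
    return ", ".join([prompt, *extras])


templates = [
    "A woman walking through [your scene model]",
    "Product photography setup in [your scene model]",
]
for template in templates:
    print(build_prompt(template, "rooftop-cafe", "golden hour lighting"))
```

Keeping the mood and lighting as separate arguments makes it easy to hold the scene fixed while varying only the atmosphere.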