The AI prompt template for product photography that fixes generic outputs

April 28, 2026 · 12 min read

Type "product photo of a coffee bag" into any AI image generator. What comes back is a brown kraft bag with the word "COFFEE" stamped on the front in a generic display font, sitting on a wooden table with warm orange light and a cup of coffee beside it. It looks like a stock image you have seen a hundred times, because statistically that is exactly what the model averaged from its training data.

Now type the same prompt for your brand. The brand has a deep navy palette, an oat-linen accent, a small navy bird-silhouette wax seal that goes on every retail bag. The output does not change. The bag is still brown kraft. The fake "COFFEE" text is still there. The seal is missing. The model has no way to know that any of those brand details exist.

This is not a prompt-engineering problem you fix with cleverer wording. It is a structural gap that a brand profile fills. This post covers the exact prompt template I use for every product shot, the two-section structure that makes it work, and two before-and-after demos showing what the same model produces with and without it.

The hero demo: bare prompt vs branded prompt

The image on the right is what came out of a single prompt that included a 25-line brand profile. The bare prompt on the left is what every AI tool defaults to when you do not give it any structured brand input.

Figure: prompt and result, bare prompt on the left and branded prompt on the right. Same prompt. The only thing added on the right was the structured brand profile.

The difference is not artistic interpretation. The model rendered the same subject category, the same lighting brief, and the same surface. It just had additional structured input that overrode the defaults: a hex value for the bag color, a description of the wax seal, a photography reference, a list of forbidden patterns. Five short blocks the model could read and obey.

Why bare prompts always feel generic

Modern image generators are trained on hundreds of millions of product shots. When you give them an under-specified prompt, they return the statistical center of everything they have seen that loosely matches. For "coffee bag" that center is a brown kraft bag with the word "COFFEE" on the front, in warm orange light, on a wooden table. That look is saturated across the open web because it is what high-traffic e-commerce listings looked like in the 2010s.

The defaults the model picks are not random. They are the highest-probability choices for an unconstrained prompt:

Surface: brown wood
Bag: kraft brown, generic
Type on the bag: the word "COFFEE" in a display serif
Lighting: warm orange-amber from the upper right
Composition: the bag centered, dead-on, with a coffee accessory in the foreground

The fix is not to write longer prose in the prompt. Longer prose gets averaged the same way. The fix is structured input that the model treats as constraints rather than suggestions. Hex codes. Specific prop names. A photography reference. A forbidden patterns list. Each one displaces a default.

The structured prompt template

The template has two parts: a brand profile block that stays the same across every prompt, and a per-image scene block that changes. Paste the full thing into the prompt field. Most modern image tools accept multi-section prompts up to a few thousand tokens.

## brand
- name: Bluebird Coffee
- positioning: editorial restraint, slow craft, Pacific Northwest

## colors
- primary: #1f3a5f (deep navy, used on ceramic cups and packaging)
- secondary: #e9d8a6 (oat linen, used on aprons and napkins)
- accent: #c8553d (terracotta, used sparingly, one element per scene)
- neutral: #fafafa (warm off-white, surfaces and negative space)

## photo style
- lighting: north-facing window light, no flash, no harsh shadows
- composition: scene-first, subject second, product third, off-center
- depth of field: shallow, brand props in soft focus
- mood: Kinfolk magazine editorial restraint

## props
- navy ceramic cup with white interior
- oat linen apron, oat linen napkin
- brass espresso group head
- reclaimed wood counter
- terracotta planter, one per scene, sparingly
- kraft retail bag with small navy bird-silhouette wax seal on the front

## forbidden
- no centered dead-on product shots
- no warm orange-amber color cast
- no Edison-filament bulbs
- no brown ceramic, no white ceramic for primary cup
- no fake text on packaging

## scene
A pristine kraft retail bag with the small navy bird-silhouette wax seal on the front, sitting upright on the reclaimed wood counter, soft north-facing window daylight from the side, an oat-linen napkin partially folded beside it, the brass espresso group head softly blurred in the deep background, shallow depth of field, asymmetric composition with the bag positioned slightly off-center, intentional negative space, no other props in the frame.

That block is the entire prompt. The model parses each section as a constraint set. If the rendered bag has fake text on it, the forbidden patterns block should be expanded. If the lighting comes out warm orange instead of soft daylight, the photo style block needs to be more specific. The profile is iterative. You write it once and refine it as you find new failure modes.
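Because the profile stays fixed and only the scene changes, the assembly step is easy to script. The following is a minimal Python sketch under stated assumptions, not part of any image tool's API: `BRAND_PROFILE`, `render_section`, and `build_prompt` are hypothetical names, and the profile is abbreviated to two blocks to keep the example short.

```python
# Hypothetical helper for assembling the two-part prompt described above.
# The profile is abbreviated; a real one would carry every block of the template.
BRAND_PROFILE = {
    "colors": [
        "primary: #1f3a5f (deep navy, used on ceramic cups and packaging)",
        "accent: #c8553d (terracotta, used sparingly, one element per scene)",
    ],
    "forbidden": [
        "no warm orange-amber color cast",
        "no fake text on packaging",
    ],
}


def render_section(title, lines):
    """Render one block as a '## title' heading followed by '- item' lines."""
    body = "\n".join(f"- {line}" for line in lines)
    return f"## {title}\n{body}"


def build_prompt(profile, scene):
    """Join the fixed brand profile with the per-image scene block."""
    sections = [render_section(title, lines) for title, lines in profile.items()]
    sections.append(f"## scene\n{scene.strip()}")
    return "\n\n".join(sections)


# Same profile, two different scenes.
bag_prompt = build_prompt(BRAND_PROFILE, "A kraft retail bag upright on the reclaimed wood counter.")
cup_prompt = build_prompt(BRAND_PROFILE, "A navy ceramic cup on an oat-linen napkin.")
print(bag_prompt)
```

Both prompts share an identical profile section and differ only after the `## scene` heading, which is the property the template relies on.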

A second demo: close-up packaged product

The hero demo above was the full retail bag at editorial distance. The second demo is the same product at close range, the kind of detail shot you need for a packaging carousel or a launch announcement. This time the workflow is shown end to end: a starting product photo on the left, the structured prompt in the middle, the on-brand output on the right.

Figure: starting product photo on the left, structured prompt in the middle, on-brand output on the right. The starting photo is a generic brown kraft coffee bag with coffee beans scattered on a wooden table under warm amber backlight, the averaged stock-photo look most product photos start out as. The brand profile and scene block together determine what survives from the input and what gets replaced.

The starting photo is what most operators actually have on hand: a generic kraft bag, lit warmly, shot for an old listing. The structured prompt does two things at once. The brand profile names the wax seal, the navy palette, the lighting, and the forbidden patterns. The scene block describes the specific shot. The output keeps the product category from the input (a kraft retail bag, close-range, behind-the-counter context) and replaces every default the model would otherwise have averaged: the lighting flips from warm amber to north-window daylight, the seal renders correctly because the profile names it, and the staging shifts from a generic wood plank to a cafe counter with a barista's hand at the edge.

Walking through each block

The five blocks are not interchangeable. Each one displaces a different default. A profile that skips a block leaves the model averaging on that axis.

colors. Hex values with role labels. "Primary, secondary, accent, neutral." Not just the colors, but which surface each color goes on. Without role labels the model averages the palette and the result is muddy. With role labels the model knows the navy goes on the cup and the terracotta goes on a single small planter.

photo style. Four short fields. Lighting, composition, depth of field, mood. The lighting field is the most consequential one because it overrides the warm-amber default that ships with every model. "North-facing window light, no flash, no harsh shadows" produces a completely different render than the same prompt without that line.

props. Three to seven specific objects, named exactly. Not "a cup" but "navy ceramic cup with white interior." Not "a planter" but "terracotta planter, one per scene, sparingly." Each named prop displaces the model's default for that prop slot. The list does not need to be exhaustive. It needs to cover the props that appear most often in the brand's real photos.

forbidden. A list of patterns the model should explicitly avoid. The forbidden block is doing different work than the props block. Props say "use this." Forbidden says "do not use this." Both matter. The forbidden block stops the model from inserting Edison bulbs every time you ask for a cafe scene, and stops it from rendering fake text on packaging.

scene. The only block that changes per image. Everything else stays the same across every prompt for that brand. The scene block is two to four sentences describing the specific shot you want: subject, framing, light direction, props in frame, composition. Keep the language plain.
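The per-block guidance above can be turned into a quick pre-flight check before a prompt is sent. This is a rough sketch, not a standard tool: `parse_blocks` and `lint_profile` are hypothetical names, the parser assumes the `## name` heading layout used in the template, and treating a parenthetical note as the "where it goes" label is an assumption of this example.

```python
import re

# A six-digit hex color such as #1f3a5f.
HEX_RE = re.compile(r"#[0-9a-fA-F]{6}\b")
# The five blocks walked through above.
REQUIRED_BLOCKS = ("colors", "photo style", "props", "forbidden", "scene")


def parse_blocks(prompt):
    """Split a prompt into {block name: [non-empty lines]} using '## name' headings."""
    blocks = {}
    current = None
    for raw in prompt.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip().lower()
            blocks[current] = []
        elif line and current is not None:
            blocks[current].append(line)
    return blocks


def lint_profile(prompt):
    """Return a list of warnings; an empty list means nothing was flagged."""
    blocks = parse_blocks(prompt)
    warnings = []
    for name in REQUIRED_BLOCKS:
        if not blocks.get(name):
            warnings.append(f"missing or empty block: {name}")
    for line in blocks.get("colors", []):
        if not HEX_RE.search(line):
            warnings.append(f"color line without a 6-digit hex value: {line}")
        elif "(" not in line:
            # Assumption: the surface note is written in parentheses, as in the template.
            warnings.append(f"color line without a surface note: {line}")
    props = blocks.get("props", [])
    if props and not 3 <= len(props) <= 7:
        warnings.append(f"props block has {len(props)} entries; 3 to 7 is suggested")
    return warnings
```

A warning here does not mean the render will fail. It only flags an axis on which the model is likely to fall back to its defaults.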

When the brand is a logo, not a description

A brand profile written in text covers colors, props, lighting, and style. What it cannot describe is the actual visual identity of a logo or wordmark. No paragraph of description will reproduce a specific typeface, the exact curve of an emblem, or the proportions of a wordmark. For those assets, the answer is to upload the logo file as a reference image alongside the structured prompt. Image tools that support reference images accept them inline as part of the prompt and treat them as visual constraints in addition to the text.

The demo below uses the IL Gelato Hawaii brand. The logo on the left is the actual PNG file uploaded as an inline reference. The text in the prompt describes the desired product (a gelato cup with the brand on it), and the brand profile is prepended to it. The output on the right shows the model applying the uploaded logo to a real-world product mockup.

Figure: prompt and result, uploaded logo on the left and product mockup on the right. Logo file uploaded as an inline reference. The model applied the actual brand mark to a real product mockup.

The output is not a stock image with a generic logo pasted on top. The model rendered the cup with the IL Gelato wordmark, the sun emblem, the tagline, and the rose-pink color all reproduced from the source PNG. That is what makes uploaded logos different from text descriptions: the model is matching pixels rather than averaging a description.

A few practical notes for uploading logos. Submit the highest-resolution version of the logo you have. The model preserves more detail when the source is sharp. Submit it on a white or transparent background so the logo is the unambiguous subject of the reference. And describe what you want done with the logo in the scene block: "printed on the front of the cup," "embossed in the corner," "as a wax seal." The model needs the application instruction, not just the asset.

This pattern works for any branded asset the model cannot infer from text: founder portraits, signature product shapes, packaging silhouettes, custom illustrations. Upload the asset, describe the application, and the model handles the rest.
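The resolution and background notes above can be checked before a logo is uploaded. The sketch below reads only the PNG header using the Python standard library. `inspect_png_header` and `logo_warnings` are hypothetical names, and the 1024-pixel threshold is an arbitrary assumption for illustration, not a documented requirement of any image tool.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def inspect_png_header(data):
    """Read width, height, and alpha support from the first 26 bytes of a PNG."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file, or the header is truncated")
    # IHDR payload starts at byte 16: width, height, bit depth, color type.
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        # Color types 4 (gray + alpha) and 6 (RGBA) carry an alpha channel.
        # Palette images with a tRNS chunk are not detected by this check.
        "has_alpha_channel": color_type in (4, 6),
    }


def logo_warnings(data, min_side=1024):
    """Return warnings for a logo PNG; min_side is an assumed threshold."""
    info = inspect_png_header(data)
    warnings = []
    if min(info["width"], info["height"]) < min_side:
        warnings.append(
            f"logo is {info['width']}x{info['height']}; a sharper source preserves more detail"
        )
    if not info["has_alpha_channel"]:
        warnings.append("no alpha channel; confirm the background is plain white")
    return warnings
```

In practice the bytes would come from something like `open("logo.png", "rb").read()`, where the file name is a placeholder.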

The quick-start version

If you do not yet have a brand profile written, this short skeleton is enough to start with. Replace the placeholders with your own values and iterate from there.

## colors
- primary: #YOUR_HEX (where it goes on the product)
- accent: #YOUR_HEX (used sparingly)

## photo style
- lighting: [north window | warm window | studio softbox]
- composition: scene-first, off-center, generous negative space
- mood: [Kinfolk | Cereal magazine | Apartamento | your own reference]

## forbidden
- no warm amber color cast
- no Edison bulbs
- no fake text on packaging

## scene
[two to four sentences describing the actual shot]

The skeleton above is what the structured brand profile extraction guide walks through in full, with the five inputs covered in order and a worked example for a real brand. Once the profile is written, the same template ships with every prompt for the same brand, and the only thing that changes per image is the scene block.
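One easy mistake with a skeleton like this is pasting it with a placeholder still in place. A small guard can catch that before the prompt is sent. This is an illustrative sketch: `unfilled_placeholders` is a hypothetical name, and it assumes that square brackets and the literal `#YOUR_HEX` appear only as placeholders in your prompts.

```python
import re

# Matches the literal hex placeholder or any [bracketed option list] on one line.
PLACEHOLDER_RE = re.compile(r"#YOUR_HEX|\[[^\]\n]+\]")


def unfilled_placeholders(prompt):
    """Return every placeholder still present in the prompt, in order of appearance."""
    return PLACEHOLDER_RE.findall(prompt)


draft = (
    "## colors\n"
    "- primary: #YOUR_HEX (where it goes on the product)\n"
    "\n"
    "## photo style\n"
    "- lighting: [north window | warm window | studio softbox]\n"
)
for leftover in unfilled_placeholders(draft):
    print("still a placeholder:", leftover)
```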

For a deeper look at why the bare prompt fails the way it does, the three failure modes post names each one specifically: color mismatch, forgettable execution, prop generalization. The brand profile is the explicit override for all three. For the menu of post types this template can render, see the seven types of branded social posts.

A profile takes under an hour to write, less than a minute to paste, and it does not decay. The same template that produces an editorial bag shot today produces an editorial cup shot, an editorial counter shot, an editorial flat lay, and an editorial flat-lay-with-product variation, all without changing anything except the scene block. Build it once. The averaging problem does not come back.
