Prompt engineering a nun

How we convinced Gemini 2.5 Flash Image to paint a full garment instead of slapping a top hat on a face.

The first version of Puritanizer used censor bars and top hats. It was fine. It was also a completely different product, and worse, so we threw it out after about forty-eight hours. The interesting part of the rewrite was the prompt.

The goal of the new prompt is narrow: take a photo of a person, paint a specific garment onto the part of the subject that is below the neck, do not touch the face, do not hallucinate extra limbs, do not invent a new background, and return a single edited image. That is a reasonable request to make of a state-of-the-art image model. It is still far easier to ask badly than to ask well.

What went wrong at first

Early prompts of the shape "cover this person in a nun habit" produced results like:

The nun habit painted next to the subject, floating in mid-air
The subject's face swapped for a nun's face, which is exactly what we did not want
A cut-out nun habit pasted over the whole image including the background
An entirely new scene with a different person wearing a nun habit

None of these are the tool. The tool is: same photo, same face, new clothes.

What actually works

The prompt that ships reads, in rough shape, something like this:

You are editing a photograph. The subject is a person. Paint the following garment directly onto the subject's body, as if they had been wearing it when the photo was taken. Preserve the subject's face, hair, and head exactly. Preserve the background exactly. Do not change the composition or lighting. Return a single edited version of the original image.

Three things matter here.

"As if they had been wearing it when the photo was taken." This framing shifts the model from "add an object to the image" to "edit the clothing of the person in the image." The difference in output is enormous. With the first framing you get a floating nun habit. With the second you get a nun.

"Preserve the subject's face, hair, and head exactly." Without this line, the model will occasionally decide that a nun habit works best with a new face. It does not. It works best with the original face, slightly confused, staring at the camera in full religious habit.

"Return a single edited version of the original image." Models trained on instruction data love to produce multiple candidates and narrate the differences. We want exactly one image back. Being explicit saves a whole round of post-processing.

The category prompt

"Roll the dice" mode adds another layer. We shuffle the whole catalogue, pick 8 candidates, and ask the model to choose whichever of the 8 fits the subject best. That last clause is the one that keeps the tool from dressing a beach photo in full samurai o-yoroi when a beach-appropriate option was sitting in the shortlist. The model is better at judging "best fit" than it is at random selection, so we let it make that call.

The part that is still awkward

Group photos. The prompt as currently written assumes one subject. If you drop in a group of five people, you get one of them heavily modest and four untouched, or five of them wearing the same outfit, or — once, memorably — one person wearing five nested outfits at once. We added a line about "all visible subjects" which helped but did not solve it. Group photos remain a known rough edge.

For single-subject photos, the prompt is locked in and behaves. For group photos, consider it a feature preview.

Try a single-subject photo →

Prompt engineering a nun

What went wrong at first

What actually works

The category prompt

The part that is still awkward

Put some clothes on a photo.