Clear language: An important tip: the magic doesn’t come from poetic and flowery language but from well-trained weights. DALL-E works best with clear, precise, short, and graphic-oriented language. (This is why DALL-E and GPT currently don’t work particularly well together: GPT tends to embellish and elaborate everything through expansion or translation, even before the input text is sent to DALL-E. I now explicitly instruct GPT not to alter my prompts in any way.)
Literal Misunderstanding: Always keep in mind that DALL-E can misunderstand things, taking everything literally and putting all elements of the prompt into the image. Try to detect when a misunderstanding has occurred and avoid it in the future. If you write texts not in English, they will be translated before they reach DALL-E, and the translated text may contain a conflict that the original text does not. Short prompts may also be expanded. Check the prompt that was actually used for conflicts, not only what you entered.
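For those using the API, here is a minimal sketch of how to read back the prompt that was actually used. It assumes the official openai Python package; for DALL-E 3, the image response carries a revised_prompt field with the rewritten prompt:

```python
# Minimal sketch (API route): read back the prompt DALL-E actually used.
# Assumes the official "openai" Python package and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="Red, orange, and yellow flowers on a beach. Photo style.",
    size="1024x1024",
    n=1,
)

# For DALL-E 3 the response includes the rewritten prompt that was really
# used, so you can check it for conflicts introduced by translation/expansion.
print(response.data[0].revised_prompt)
```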
Prompt Structure: A useful order for your instructions: write the most important thing first, then the details, and finally technical instructions like image size, etc. This also results in better names for the generated files.
Order Matters: The order in which attributes are arranged has a certain influence. For example, the first described element is given slightly more attention. This also applies when assigning attributes to an object. Example: ‘red, orange, and yellow flowers’ versus ‘yellow, orange, and red flowers.’ Depending on the order, red or yellow becomes slightly more dominant. However, this is just one factor among many. This becomes particularly important in short prompts with few details.
Prompt Expansion: If a text is very short, GPT tries to expand it to make it more interesting. This is good if creativity is desired and you intentionally give up some control. You can prevent this by writing “use the prompt unchanged as entered.” And if you are not writing in English, “use the prompt unchanged as entered, and only translate it into English.”
Multi-image collection: There is the possibility to output multiple images in one. This is often useful for testing. “Multi-image collection” is the trigger I use to activate it. From experience, this instruction should be placed at the very beginning, as DALL-E tends to ignore it, and what comes first in the prompt is given more weight. The result varies between 4x1 and 7x3 images. The prompt should be simple: the more complex it is, the fewer the variations and the lower the likelihood of it working. It is useful for testing, for example, styles or moods. The images will have more consistency across the variations, because all the picture fragments are generated under one seed.
Tendency: Tendential language or terms can steer the generator in a certain direction without having a strong influence on specific graphical objects themselves. An internal gap-filler is used to embellish and enrich the scene, and a tendency lets this system choose better-fitting graphical elements; it can also easily be overridden and ignored by precise graphical elements. In a very dark, hellish scene, “beautiful” will have a different effect than in a normal scene, or the tendency will simply be ignored. A mood, or any vague quality, is a tendency. Instead of writing flowery poems in a prompt, a simple tendency will have the same effect.
Vagueness: Beautiful, wonderful, bright, dark, lonely, chaotic, etc. are all attributes which can apply to many different graphical objects, and this creates a tendential effect on everything in the scene and on the gap-filler.
Creativity: A tendency can even help by letting the system create variations of scenes in a specific area. Simply using very few words like “Beach. Photo style. Dark and scary.” will give you a completely different picture each time, and any added attribute can then constrain it toward a specific result.
Chaos: “Much chaos” or “little chaos” can let the system be wildly creative or stay narrow to the description.
Prompt Check: A scene is fully described and loaded with precise graphical descriptions if a tendency, even one different from the scene, no longer changes much (at least that’s how the system works now).
Photo-Technical Descriptions: One must understand that DALL-E does not perform exact calculations for lenses and apertures like a raytracer would. It only has enough information from its training data to make sense of the input, and such terms do influence the images. What I have discovered so far is that mentioning a lens can at least influence the depiction. For example, specifying a wide-angle lens, like 18mm, actually results in a wider field of view in a landscape shot. So DALL-E can make sense of photo-technical descriptions, but not like a real camera. And you can simply use “add a little depth-of-field” instead of a very technical lens instruction. You have to see such advice as a suggestion, not as an option. What works in which context is a matter of testing. Example: a wide-angle 18mm lens has an effect on landscapes and inside buildings, but macro on a landscape will have no effect.
Creativity: If you want to encourage DALL-E to exhibit unpredictable creativity while also testing a specific style, you can experiment with minimal instructions and a note to not alter the prompt. You can provide just a few guidelines with very few constraints. And GPT can give you style names for specific moods. For example: "Photo in Low-Key style. High level of detail. Landscape format with the highest pixel resolution. Generate only 1 image. Use the prompt exactly as provided, translating it into English without any changes."
Photorealistic: If you want to create photorealistic images, paradoxically, you should avoid keywords like “realistic”, “photorealistic”, or “hyperrealistic”. These tend to trigger painting styles that attempt to look realistic, often resulting in a brushstroke-like effect. Instead, if you want to define the style, simply use “photo style”. (Even fantasy images may gain a little quality this way, despite the lack of real photo training data.) If you aim for photography-like images, it makes sense to use technical photography terms, as DALL-E utilizes the metadata from images during training, if they contain technical information.
MidJourney Options: Some users use MidJourney options. I have experimented with them, and it seems that GPT interprets these options before they are sent to DALL-E. DALL-E may be able to interpret some options, but it doesn’t truly understand them. In testing, it couldn’t be determined whether options like --chaos, --quality, or --seed were recognized. While DALL-E might have some idea of how to interpret these options, they don’t really function as intended, and they aren’t directly supported, but still somehow work anyway. The seed option doesn’t work at all because DALL-E doesn’t have this feature. “--style raw”, for example, does not have the same effect as in MidJourney, but it seems to suppress the nonsense text a little, maybe…
Content Complexity: This is probably quite important for many, so here’s a slightly longer explanation. DALL-E processes about 256 words (specifically 256 cl100k_base tokens). Of these, roughly 30 to 40 graphical tokens can be maximally and correctly translated into a “photo style.” Beyond that, objects and colors start to degrade, objects no longer look organic, or the overall quality decreases. In general, it’s more about guiding DALL-E in the right direction than describing every detail exactly. It’s better to describe a comprehensive composition than an inventory list of details. Additionally, elaborate and poetic language seems to have little to no effect; it’s simply ignored. A simple description of the mood, like “dreamlike night atmosphere,” is enough to influence the entire scene. One must understand a bit about how an image generator works. It doesn’t need poetry or overly ornate, aesthetically enhanced language. Simple, precise, concise instructions work best, and not too many of them. Here, GPT’s tendency for expansive, embellished language conflicts with DALL-E’s need for short, precise descriptions. There is no LLM trained specifically to write effective DALL-E prompts yet, and I haven’t been able to stop GPT’s “overly embellished rambling” so far. For those who want to try: let GPT generate a detailed text, then reduce it to the essentials without removing graphical details. The quality of the result will likely be the same. I’ve gotten very extraordinary images with very simple descriptions; it depends more on the training data and weights, and less on poetic language. My tip at the moment is to place details where something is important, or to describe something multiple times to give it more weight, to control the diffusion effect, or to correct something. However, roughly describing the overall scene and leaving the rest to DALL-E has been the most efficient approach so far.
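If you want to check how close a prompt is to that budget, here is a small sketch using the tiktoken package (the prompt text is just a made-up example):

```python
# Small sketch: count cl100k_base tokens to see how much of the ~256-token
# budget a prompt uses. Requires the "tiktoken" package.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prompt = (
    "Photo style. A quiet beach at dusk, dreamlike night atmosphere, "
    "wide-angle 18mm view, little chaos."
)

tokens = enc.encode(prompt)
print(f"{len(tokens)} of ~256 tokens used")
```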
But all this is a work in progress; if somebody knows more or better, let us know, and I will change the texts here.
Using structured JSON: @BPS_Software has found out that DALL-E can process structured JSON prompts. There is now speculation that DALL-E can process them more exactly than a text prompt, maybe being able to assign attributes more precisely to an object and to control the scattering. The discussion for this is here: A Study on Using JSON for DallE Inputs
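As an illustration only, here is a hypothetical sketch of such a JSON prompt. The field names (“scene”, “style”, “objects”, “technical”) are my own assumptions for this example, not a documented schema:

```python
# Hypothetical sketch of a JSON-structured prompt, following the idea from
# the linked study. The field names ("scene", "style", "objects", ...) are
# illustrative assumptions, not a documented schema.
import json

prompt = json.dumps(
    {
        "scene": "a quiet beach at dusk",
        "style": "photo style, low-key lighting",
        "objects": [
            {"name": "flowers", "colors": ["red", "orange", "yellow"]},
            {"name": "driftwood", "position": "foreground"},
        ],
        "technical": {"format": "landscape", "lens": "wide-angle 18mm"},
    },
    indent=2,
)

# The JSON string is sent as an ordinary text prompt; the speculation is that
# the structure helps DALL-E bind attributes to the right objects.
print(prompt)
```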