How to properly guide image-generation AIs


Dall-E, Midjourney, Stable Diffusion… Image-generation models are proliferating. Queries must be carefully worded to obtain the desired images.

Deep learning models designed to automate the creation of images have been multiplying for several months. Among these artificial intelligences are the now famous Dall-E, Stable Diffusion and Midjourney. Their results are impressive (read the article Image Generation AI: JDN Test Shows Surprising Results). The remaining challenge is knowing how to word a query to obtain the subject and the visuals you are looking for. A new technique has emerged to solve this problem: prompts.

Briefly describe the topic

In general, the query consists of one sentence that accurately and succinctly describes the target subject. “For example: ‘a kid throwing a frisbee,’” says Louis Bouchard, a doctoral student in artificial intelligence at Polytechnique Montréal and at Mila, the Quebec Artificial Intelligence Institute. “Then we add keywords to refine certain details, trying several possible synonyms for each: ‘young or young child, black or brown hair, sports or casual clothing, village or countryside scene.’” The goal is to proceed by trial and error and converge on the wording the machine understands best for the desired image.

You can also give stylistic indications, for example “oil painting” or “photo-realistic painting”. In general, the more detailed and, above all, the more semantically complex the instruction, the more likely the result is to be inconsistent and therefore disappointing.
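The trial-and-error approach described above can be sketched in a few lines of Python. This is only an illustration: the `build_prompts` helper is hypothetical and not part of any image-generation tool, and joining the fragments with commas is an assumption about how a given model prefers its prompts. The sketch produces one prompt per combination of detail synonyms and style, so that each variant can be submitted and compared.

```python
from itertools import product


def build_prompts(subject, detail_options, styles):
    """Return one prompt per combination of detail synonyms and style.

    Hypothetical helper for illustration only; comma-joined fragments
    are an assumed prompt format, not a rule of any specific model.
    """
    prompts = []
    for style in styles:
        # product() picks one synonym from each group of alternatives.
        for details in product(*detail_options):
            prompts.append(", ".join([subject, *details, style]))
    return prompts


subject = "a kid throwing a frisbee"
detail_options = [
    ["black hair", "brown hair"],
    ["sports clothing", "casual clothing"],
    ["village scene", "countryside scene"],
]
styles = ["oil painting", "photo-realistic painting"]

prompts = build_prompts(subject, detail_options, styles)
for prompt in prompts:
    print(prompt)
```

The number of variants grows quickly with each added group of synonyms, which is one practical reason to keep the base sentence short and refine only a few details at a time.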

Don’t aim for a fixed result

From there, it’s important to let the AI express itself without holding too precise an idea of the image you are after. “I recommend taking the stance of a poet, not a programmer, and marveling at the machine’s creativity. Creating images can reveal a character, a style, a composition that you didn’t have in mind at the start but that may fit the goal perfectly,” notes Steve Coulson. “To find the one that works for you, I recommend generating many images until you get a result that inspires you. Feel free to do a few dozen or even a few hundred.”

The creative director of the American transmedia storytelling agency Campfire knows what he’s talking about. He is the author of The Bestiary Chronicles, the first comic created entirely with generative AI, in this case Midjourney. Combining fantasy and science fiction, this odyssey is divided into four parts: Summer Island, Exodus, The Lesson and The Letter Home. At once realistic, precise, and coherent in narrative and style, the panels give the impression of having come from the imagination of a sensitive and experienced artist.

Study the training datasets

“To optimize the wording of queries, it is important to study the content of the training datasets of the generative AIs being used, when they are available, as is the case for Stable Diffusion (Laion, editor’s note),” Louis Bouchard emphasizes. These datasets generally take the form of large collections of images paired with captions, all collected from the Internet. Analyzing how those captions are written helps to improve prompts.
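As a rough illustration of what analyzing captions can mean in practice, the sketch below counts the most frequent words in a list of captions. Everything in it is an assumption made for the example: the three captions are invented placeholders rather than entries from Laion or any real dataset, and the `term_counts` helper and its small stopword list are hypothetical. Terms that appear often in real captions are plausible candidates for wording the model has seen many times.

```python
import re
from collections import Counter

# Minimal stopword list, chosen for this example only.
STOPWORDS = frozenset({"a", "an", "the", "of", "in", "on", "and", "with"})


def term_counts(captions, stopwords=STOPWORDS):
    """Count how often each non-stopword term appears across captions."""
    counts = Counter()
    for caption in captions:
        # Lowercase and keep alphabetic runs only, a deliberately crude tokenizer.
        words = re.findall(r"[a-z]+", caption.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts


# Invented placeholder captions, NOT taken from a real training set.
captions = [
    "Oil painting of a child in the countryside",
    "A child throwing a frisbee, photo",
    "Countryside village in summer, oil painting",
]

counts = term_counts(captions)
print(counts.most_common(5))
```

On a real dataset the same idea would be applied to a sample of the published captions, and a proper tokenizer would likely replace the regular expression used here.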

Adapt to the constant evolution of AI

Beyond these few basic rules, it is difficult to provide a more precise tutorial. Steve Coulson admits: “Image-generation AIs are constantly being improved, and prompting techniques evolve with them. When I started working on the Summer Island comic, Midjourney couldn’t carry a character over from one panel to the next. So in this comic, the main character is a photographer who is never seen, but whose shots make up the story. They are shown from page to page.”

For the Exodus comic, changes made to Midjourney in the meantime allow Steve Coulson to bring the main characters to life on the page. The only exception: faces. Faced with this flaw, the author finds a workaround: the characters’ faces are hidden behind the visors of their astronaut helmets. With The Lesson, the designer takes advantage of a new capability: using actresses from Hitchcock films consistently throughout the script. They thus become the central figures of the narrative. In the last part of The Bestiary Chronicles (The Letter Home), published in December 2022, Steve Coulson takes advantage of the latest capability: creating an original character with a face of his own, who can then be rendered in different poses and different settings.

Know how to reuse a setting or a character

How is this done? Once a setting or character has been created by the AI, the relevant image can be re-submitted to the model to be reworked, or reused and integrated into a new shot. “The same query submitted to the same generative artificial intelligence can produce different results, because the underlying models are not deterministic. Querying is therefore not a perfect reproduction tool,” Louis Bouchard points out. “The fact remains that some prompts will be more effective than others at producing quality images.”

Discuss with peers

To find the best prompts, the Promptbase marketplace offers thousands of examples with images. They cover the three most popular image-generation AIs: Dall-E, Midjourney and Stable Diffusion. At the same time, user communities are forming to share best practices on the subject, such as the Learn AI Together server, opened on Discord to encourage exchanges between experts. “The goal is to feed growing knowledge bases like Learnprompting.org, taking into account the different generative AIs and how they are optimized over time,” says Louis Bouchard.
