Visualising novels using Midjourney v6 and GPT-4 - Part 2: Locations and Key Events

January 14, 2024

See some of my favourite generations at the bottom of this post.

In part 1 we saw how we can use descriptions generated by GPT-4 as prompts to Midjourney to visualise characters from our favourite novels. In part 2, we will apply the same idea to two more aspects of a novel: 1. locations, such as buildings, houses, or cities; and 2. key events, such as major plot points and famous scenes. This should help bring our favourite novels to life even further!

The code, now updated to include the option to visualise key locations and events, is available on my GitHub: https://github.com/parsaghaffari/Melquiades

Step 1. Listing characters, locations, and events

Our first task is to list the key “things” of a given type from the book, be they characters, locations, or events. So let’s make the get_characters function a bit more generic, so that it can list any of these types of things, not just characters:

```python
import lmql

@lmql.query(model="openai/gpt-3.5-turbo-instruct")
def get_things(thing_type, book, author="", num_things=5):
    '''lmql
    """Answer the following questions about the book {book} by {author}:

    Here's a list of major {type_prompt(thing_type)} from the book: \n"""
    things = []
    for i in range(num_things):
        # Emit "-" and let the model fill THING up to the next newline
        "-[THING]" where STOPS_AT(THING, "\n")
        things.append(THING.strip())
    return things
    '''
```

The get_things function:

- Takes a thing_type, which can be one of "characters", "places", or "events"
- Takes a book and, optionally, an author name
- Takes the number of things to generate (5 by default)
- Builds a prompt for GPT-3.5 from the above attributes, submits it, and captures num_things items from the response using LMQL’s scripted prompting feature
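In get_things, LMQL emits each "-" itself and lets the model fill THING until the next newline, num_things times. To make that loop concrete, here is a minimal plain-Python sketch that applies the same rule to an already-generated completion. It is an approximation for illustration, not how LMQL works internally, and `parse_things` is a hypothetical helper name.

```python
def parse_things(completion: str, num_things: int = 5) -> list[str]:
    """Approximate the get_things loop on a raw completion (hypothetical helper).

    Each "-" line stands in for one iteration of the LMQL loop, where
    STOPS_AT(THING, "\\n") ends the item at the first newline.
    """
    things = []
    for line in completion.splitlines():
        line = line.strip()
        if not line.startswith("-"):
            continue  # skip blank lines and anything that is not a list item
        things.append(line[1:].strip())  # drop the leading "-" and whitespace
        if len(things) == num_things:
            break  # the LMQL loop runs exactly num_things times
    return things


raw = "- Frodo Baggins\n- Gandalf\n- Aragorn\n- Legolas\n- Gimli\n- Boromir\n"
print(parse_things(raw))
# ['Frodo Baggins', 'Gandalf', 'Aragorn', 'Legolas', 'Gimli']
```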

If you look at the prompt closely, you will notice that we’re also using a helper function called type_prompt to slightly tweak the prompt based on the type of entity we want to list:

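```python
def type_prompt(thing_type):
    """Returns the prompt phrase for the given type of entity"""
    if thing_type == "characters":
        return "characters"
    elif thing_type == "places":
        return "places (such as prominent buildings, landmarks, or locations)"
    elif thing_type == "events":
        return "events (or epic scenes)"
    # Fail loudly instead of silently returning None for an unknown type
    raise ValueError(f"Unknown thing type: {thing_type!r}")
```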
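To check exactly what GPT-3.5 receives for each type, it can help to render the get_things prompt outside LMQL. This is a sketch that assumes the prompt text shown above; `render_list_prompt` and `TYPE_PHRASES` are hypothetical names, and the dictionary simply mirrors type_prompt.

```python
# Hypothetical lookup table mirroring type_prompt.
TYPE_PHRASES = {
    "characters": "characters",
    "places": "places (such as prominent buildings, landmarks, or locations)",
    "events": "events (or epic scenes)",
}


def render_list_prompt(thing_type: str, book: str, author: str = "") -> str:
    """Build the text get_things sends before the first "-" item (hypothetical)."""
    if thing_type not in TYPE_PHRASES:
        raise ValueError(f"Unknown thing type: {thing_type!r}")
    # Only mention the author when one is given, to avoid a dangling "by".
    source = f"{book} by {author}" if author else book
    return (
        f"Answer the following questions about the book {source}:\n\n"
        f"Here's a list of major {TYPE_PHRASES[thing_type]} from the book: \n"
    )


for thing_type in TYPE_PHRASES:
    print(render_list_prompt(thing_type, "The Lord of the Rings", "J.R.R. Tolkien"))
```

Dropping "by" when no author is given is a small deviation from the LMQL prompt, which always includes "by {author}".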

Now if we run get_things("characters", "The Lord of the Rings") we get:

- Frodo Baggins
- Gandalf
- Aragorn
- Legolas
- Gimli

And get_things("events", "The Lord of the Rings") returns:

- The formation of the Fellowship of the Ring
- The journey through the Mines of Moria
- The Battle of Helm's Deep
- The Battle of Pelennor Fields
- The destruction of the One Ring in Mount Doom

And finally get_things("places", "The Lord of the Rings") returns:

- The Shire
- Rivendell
- Moria
- Lothlórien
- Isengard

Cool!

Step 2. Generating visually descriptive prompts

Next we need to run through our lists of characters, events, or locations, and ask GPT-4 to write a visual description for each that we can submit to Midjourney.

For instance, here’s our prompt for writing the description of a key event:

```python
import lmql

@lmql.query(model="gpt-4")
def get_event_description(event, book, author=""):
    '''lmql
    """Here's an accurate and concise visual description of the scene "{event}" from {book} by {author} which can be used as instructions for a painter to paint it with a high level of accuracy and detail. Break down the instructions into meaningful headings (such as characters, their facial expressions, prominent landmarks, etc): [DESCRIPTION]"""
    # Return the generated text itself rather than the raw query result
    return DESCRIPTION.strip()
    '''
```

Now if we run get_event_description("The formation of the Fellowship of the Ring", "The Lord of the Rings") we get:

1. Characters:
   - Gandalf: Depict him as an old, wise wizard with a long white beard, wearing a grey cloak and hat, holding a staff.
   - Frodo: A small hobbit with curly hair, wearing a waistcoat and trousers, holding the One Ring.
   - Sam, Merry, and Pippin: Three hobbits, smaller than the human characters, dressed in typical hobbit attire.
   - Aragorn: Tall, rugged, with long hair and a beard, wearing a travel-worn outfit, holding a sword.
   - Legolas: An elf with long blonde hair, sharp features, dressed in green and brown, holding a bow and arrow.
   - Gimli: A dwarf, shorter than the others, with a thick beard, wearing armor and holding an axe.
   - Boromir: A man with short dark hair, wearing armor and a cloak, holding a shield and sword.

2. Facial Expressions:
   - Gandalf: A serious, determined expression.
   - Frodo: A look of fear and determination.
   - Sam, Merry, and Pippin: Expressions of worry but also bravery.
   - Aragorn: A stern, focused expression.
   - Legolas: A calm, composed expression.
   - Gimli: A fierce, ready-for-battle expression.
   - Boromir: A conflicted, yet resolved expression.

3. Prominent Landmarks:
   - The Council of Elrond: Depict a circular stone table surrounded by various characters, set in a beautiful garden filled with exotic plants and trees.
   - Rivendell: The elven city in the background, with its unique architecture, bridges, and waterfalls.

4. Atmosphere:
   - The scene should be set in daylight, but with a sense of impending darkness.
   - The characters should be positioned as if they are ready to embark on a journey, facing towards the viewer.

5. Details:
   - The One Ring: Frodo should be holding the ring, which should be depicted as a simple gold band.
   - Weapons: Each character's weapon should be clearly visible and accurately represented.
   - Costumes: Each character's outfit should reflect their respective cultures and statuses as described in the book.

That’s quite a neat description, and it captures a lot of detail!

Similarly, we also implement get_character_description and get_place_description with slightly different prompts, which you can find in the GitHub repository.
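The listing and description steps can be tied together in one loop. The sketch below is an illustration under assumptions, not the code in the repository: the list and description functions are passed in as plain callables, so the same loop could take get_things and the three description queries, or stubs for offline testing. `build_descriptions` and the `describers` mapping are hypothetical names, and we assume each describer takes (thing, book, author) and returns a plain string.

```python
from typing import Callable

# A describer maps (thing, book, author) to a visual description string.
Describer = Callable[[str, str, str], str]


def build_descriptions(
    thing_type: str,
    book: str,
    author: str,
    list_things: Callable[[str, str, str, int], list[str]],
    describers: dict[str, Describer],
    num_things: int = 5,
) -> dict[str, str]:
    """Map each listed thing to its visual description (hypothetical helper).

    Duplicate names in the list collapse into a single dictionary entry.
    """
    describe = describers[thing_type]  # KeyError for an unsupported type
    things = list_things(thing_type, book, author, num_things)
    return {thing: describe(thing, book, author) for thing in things}


# Stubs standing in for the LMQL queries, so the loop can run offline.
def fake_list(thing_type, book, author, num_things):
    return ["The Shire", "Rivendell", "Moria"][:num_things]


def fake_place(place, book, author):
    return f"A painting of {place} from {book}"


descriptions = build_descriptions(
    "places", "The Lord of the Rings", "J.R.R. Tolkien",
    list_things=fake_list, describers={"places": fake_place}, num_things=2,
)
print(descriptions)
```

With the real queries, the call might pass `list_things=get_things` and `describers={"characters": get_character_description, "places": get_place_description, "events": get_event_description}`, assuming each query returns a plain string and accepts its arguments positionally.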

Step 3. Visualising events and locations

We now have prompts for our key locations and events, and can start visualising them with Midjourney. To do this, we reuse the functions from part 1, such as mj_imagine.
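The descriptions GPT-4 returns span many lines under numbered headings, while a Midjourney /imagine prompt is submitted as a single line. One option before handing the text to mj_imagine is to collapse it into one line and append Midjourney parameters. This is a hedged sketch: `to_midjourney_prompt` is a hypothetical helper, `--ar` (aspect ratio) and `--v 6` (model version) are standard Midjourney parameters, and whether the repository does any such cleanup is an open assumption. mj_imagine’s signature isn’t shown in this post, so the sketch stops at building the string.

```python
import re


def to_midjourney_prompt(description: str, aspect_ratio: str = "16:9", version: str = "6") -> str:
    """Collapse a multi-line description into one prompt line (hypothetical helper)."""
    parts = []
    for line in description.splitlines():
        # Drop list markers such as "1." or "-" at the start of a line.
        line = re.sub(r"^\s*(?:\d+\.|-)\s*", "", line).strip()
        if line:
            parts.append(line)
    prompt = " ".join(parts)
    # Append Midjourney parameters: aspect ratio and model version.
    return f"{prompt} --ar {aspect_ratio} --v {version}"


description = """1. Characters:
   - Gandalf: Depict him as an old, wise wizard with a long white beard.
   - Frodo: A small hobbit with curly hair, holding the One Ring.

2. Atmosphere:
   - The scene should be set in daylight."""

print(to_midjourney_prompt(description))
```

Keeping the headings (e.g. "Characters:") is a judgment call; they add words without adding visual detail, so stripping them is another reasonable choice.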

If we run this on the prompt generated for “The formation of the Fellowship of the Ring”, Midjourney draws four options, which we can upscale or create variations of as before:

Visualising “The formation of the Fellowship of the Ring”

And voilà! Just like that we can visualise the key events and locations from our favourite novels.

As a reminder, I’ve wrapped all of the above functionality in a visual Streamlit app which you can get from here: https://github.com/parsaghaffari/Melquiades

Some final thoughts:

- Midjourney’s creations are aesthetically stunning!
- …but if you zoom in, you will usually find funny errors :) For example, the oversized ring in Gandalf’s hand.
- The prompts used here can be optimised further; I’ve only taken a first stab at making this work end to end.
- If there are recent, well-known movies based on the books you’re visualising, there is a high chance that your renderings will look like them. I’m guessing this is because stills from those movies are strongly represented in Midjourney’s training data (see examples below).

I’ve listed some of my favourite generations below:

A Song of Ice and Fire (Game of Thrones)

- The Battle of Winterfell
- The Battle of the Blackwater
- Daenerys riding her dragons for the first time
- The Red Wedding
- The Night King’s attack on Hardhome
- The burning of King’s Landing
- The Battle of the Bastards

The Lord of the Rings

- The crowning of Aragorn as King of Gondor
- The formation of the Fellowship of the Ring
- The Battle of Helm’s Deep
- The destruction of the One Ring at Mount Doom
- The final defeat of Sauron and the restoration of peace in Middle-earth

Dune

- The Atreides family’s arrival on Arrakis
- Paul Atreides’ visions and training with the Bene Gesserit
- The betrayal and attack on House Atreides by the Harkonnens
- Paul and Jessica’s escape into the desert and their encounter with the Fremen
- Paul’s rise as a leader among the Fremen and his adoption of the Fremen culture

One Hundred Years of Solitude

- Macondo
- The Buendía House
- The execution of Colonel Aureliano Buendía and the end of the civil war
- The arrival of the railroad and the ice factory

The Old Man and the Sea

- Santiago’s 84-day streak without catching a fish
- Santiago’s return to shore with the marlin’s skeleton

To Kill a Mockingbird

- Maycomb, Alabama
- Radley House
- Atticus defends Tom Robinson in court
