How well can Gemini 3 make a Henry James simulator?


This quarter I have been teaching two large-ish history classes at UCSC and beginning a new book project, Ghosts of the Machine Age, about the psychology pioneer William James, his siblings Alice and Henry, and the late Victorian “prehistory of AI.”

Alongside this, I’ve been working on an NEH-funded project about the humanities and generative AI that largely grew out of reading and writing I’ve done for this newsletter (like this and this). How can we use machine intelligence in a way that doesn’t feel like it is damaging our minds and souls? How can educators adapt to and acknowledge the new reality but, at the same time, reject the idea that the only path forward is mass adoption of pre-packaged subscription services and for-profit “AI tutors”? How do we do something creative, interesting, and original with these tools rather than allow ourselves to be funneled toward the average, the expected, or the addictive — in short, toward slop?


Ethan Mollick has written about the necessity of having personal benchmarks for determining how AI models actually stack up. (After all, most of the interesting personal use cases for an LLM are very different from answering questions on the International Math Olympiad.) For this purpose, it helps to have something idiosyncratic and repeatable. Simon Willison asks every new LLM to create a picture of a pelican riding a bicycle using SVG shapes. Mollick asks for images and games related to otters, as well as T.S. Eliot-style poetry.

As for me, I have been intrigued for several years now by the world-building and roleplaying tendencies of LLMs. These are, after all, hallucination engines. They can be induced to hallucinate less — and the latest crop of them, such as Google’s Gemini 3, released today, are certainly far more accurate than the state of the art even a year ago — but it often seems to me that their accuracy is, itself, a form of roleplaying.

“You are a helpful assistant.” “Claude cares about people’s wellbeing.”

These are not innate traits of these models. They are hard-coded directives telling the model what persona it is supposed to evoke in a given interaction. (Recall the demented Claude model that was tweaked to be obsessed with the Golden Gate Bridge: twiddle the knobs a bit, and these models get weird very quickly.)

The big AI labs seem to think that the majority of people want AI that responds as a helpful assistant or — more troublingly — as a sycophantic pal. They are probably right. But from the perspective of research and teaching, I think that LLMs’ willingness to role-play and hallucinate in a structured way is probably the most interesting thing about them. We might be using these things to format spreadsheets or spam sales emails. But they are also ghost-like, faux consciousnesses built on a vast archive of human history, flitting between timelines, languages, and personae without ever inhabiting any of them, and doing so in a way that even their creators don’t quite understand.

I, for one, find that interesting!

As Andrej Karpathy wrote not too long ago:

Today’s frontier LLM research is not about building animals. It is about summoning ghosts. You can think of ghosts as a fundamentally different kind of point in the space of possible intelligences. They are muddled by humanity. Thoroughly engineered by it. They are these imperfect replicas, a kind of statistical distillation of humanity’s documents with some sprinkle on top.

To that end: what better ghost to summon than Henry James, author of more than one ghost story, popularizer (along with his brother William) of the concept of the stream of consciousness, and someone who would almost certainly have absolutely loathed LLMs with every fiber of his being?

The rest of this post details the results of four attempts by the three current leading models (GPT-5.1, Claude Sonnet 4.5, and Gemini 3) to create a playable Henry James simulator. My initial prompt was, in effect, to make a full-featured RPG where you play as Henry James wandering as a flâneur at the 1889 Universal Exposition in Paris (the world’s fair that the Eiffel Tower was built for, and which Henry did in fact visit along with his brother).

Note that you can actually play these games: I am hosting them on GitHub and have deployed them on Vercel. “I” made them all today (or rather, the LLM did, with about 10 to 20 pieces of input each from me when something broke or seemed wrong).

Google’s Gemini 3 model is pretty clearly now at the head of the pack in terms of the public benchmarks, and in testing it also seems to have shown extraordinarily good abilities in paleography and other skills useful to historical research. On that front, check out my historian colleague Mark Humphries’ post on how Gemini 3 did with his own idiosyncratic personal benchmark.

At Google’s AI Studio you can build apps or games of your own using Gemini 3 for free, using natural language prompts like the one at the bottom of this post. That was how I made the two examples below.

Gemini 3 made two attempts. The first, which it titled The Jamesian Turn, can be played here.

I had asked for “a ‘combat’ system which is like if pokemon was a Belle Epoque battle of wits based on literary allusions, jibes, snide jokes, gossip, inuendo and rumor.” Gemini’s attempt left something to be desired, but I did find it fun to re-enact the famously tense relationship between the real life Oscar Wilde and Henry James (who could, conceivably, have bumped into one another and had an awkward conversation at the 1889 Fair).

I also asked the LLM to create a “rogue-like” game, which means procedurally generated maps. In this, it did what I thought was an admirable job. You can move through various environments evocative of 1880s Paris, such as this “private salon” which has quite a few gentlemen in evening dress aimlessly milling around… as one does in a private salon at the 1889 World’s Fair, I suppose.
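
For readers wondering what "procedurally generated" means in practice, here is a minimal sketch of one common approach, a seeded random walk that carves floor tiles out of solid wall. It is not taken from the game's source; the function name `generate_map`, the tile characters, and the parameters are all assumptions made up for illustration.

```python
import random


def generate_map(width: int, height: int, floor_target: int, seed: int) -> list[str]:
    """Carve `floor_target` connected floor tiles ('.') out of solid wall ('#').

    Hypothetical helper for illustration only, not the game's actual code.
    """
    if width < 3 or height < 3:
        raise ValueError("map needs room for a one-tile wall border")
    interior = (width - 2) * (height - 2)
    if not 1 <= floor_target <= interior:
        raise ValueError("floor_target must fit inside the interior")

    rng = random.Random(seed)  # private seeded generator: same seed, same map
    grid = [["#"] * width for _ in range(height)]

    # Start the walker in the middle of the map and carve its first tile.
    x, y = width // 2, height // 2
    grid[y][x] = "."
    carved = 1

    while carved < floor_target:
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        # Clamp so the walker never touches the outer wall border.
        x = min(max(x + dx, 1), width - 2)
        y = min(max(y + dy, 1), height - 2)
        if grid[y][x] == "#":
            grid[y][x] = "."
            carved += 1

    return ["".join(row) for row in grid]


if __name__ == "__main__":
    for row in generate_map(30, 12, 120, seed=1889):
        print(row)
```

Because the walker only ever steps to a neighbouring tile, every floor tile is reachable from every other, and reusing a seed reproduces the same layout, which is handy when something breaks and you need to see the same map again.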

However, the LLM didn’t do a great job actually evoking the mental life and tone of the real-life Henry James, who is easy to parody but was in reality a deeply complex, humane, and interesting person with a great deal of wit. Gemini tended toward the most expected, surface-level evocation of Henry’s inner life, with this sort of thing being typical:

You find your mind alight with the subtle currents of human interaction here. Each whispered remark, each significant glance, forms a rich tapestry of observation. What narratives unfold behind these polite smiles? You ponder the nuances of this distinctly Parisian social scene.

I also set up an image generation system for displaying visuals of what Henry has in his personal possession and what he finds at the Fair.

Henry receives a personal invitation from the “Ambassador of the United Bingdom” to celebrate the “Birthday Birthday” of the Queen.

This kind of thing is currently buggy and costly. But I suspect that as costs go down and speed and accuracy increase, there will be interesting potential here.

It goes without saying that in a production version of a game/app like this, any AI generated content would need to be clearly labelled as such. Features like the “viewable inventory” above raise interesting possibilities for student learning and cultivating a sense of historical intuition. For instance, you could ask students playing a game like this to itemize why this document is clearly fake, and research real-life primary sources that resemble it.

A side note: if you would like to help cover my hosting and API costs for these games, please sign up for a paid subscription.

The second Gemini 3 effort, The Ambassadors, is available to play here.

The splash screen, seemingly designed to look like a 19th century calling card. Not bad!

After seeing how lost it got with the first attempt, I gave it a little bit more guidance and suggested an opening scene with Henry talking to his brother about their plans for the day. I also asked for portraits rendered using ASCII characters (which LLMs tend to have a lot of trouble figuring out) and a complete inventory system so Henry can pick up objects and even give them to other people. This felt to me like the most robust effort and certainly the most beautifully-done user interface of the bunch.
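
To make the "pick up objects and even give them to other people" idea concrete, here is a minimal sketch of how such an inventory could be modelled. It is an assumption-laden illustration, not Gemini's implementation: the `Character` class, its method names, and the sample items are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """A person in the game who can hold items (hypothetical model)."""

    name: str
    inventory: list[str] = field(default_factory=list)

    def pick_up(self, item: str, location_items: list[str]) -> bool:
        """Move an item from the current location into this inventory."""
        if item not in location_items:
            return False  # nothing by that name is lying around here
        location_items.remove(item)
        self.inventory.append(item)
        return True

    def give(self, item: str, recipient: "Character") -> bool:
        """Hand an item to another character, if we actually hold it."""
        if item not in self.inventory:
            return False
        self.inventory.remove(item)
        recipient.inventory.append(item)
        return True


if __name__ == "__main__":
    # Sample data invented for this example.
    salon_items = ["calling card", "opera glasses"]
    henry = Character("Henry James")
    william = Character("William James")

    henry.pick_up("calling card", salon_items)
    henry.give("calling card", william)
    print(henry.inventory, william.inventory, salon_items)
```

Returning `True` or `False` rather than raising an error is a design choice that suits a game loop: a failed action can simply become a line of narration ("You have no such thing about your person").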

Gemini shows itself here to be genuinely good at thinking through the steps required to execute on an idea and make them actually happen. The trees and grounds of the Palais du Trocadéro here were simply elicited by me saying “use SVG to make procedurally generated maps that accurately evoke the area of and around the 1889 World’s Fair. Research online to find more information. Make the tiles beautiful.”
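
As a rough idea of what "use SVG to make maps" involves under the hood, the sketch below turns a small grid of tile characters into an SVG document, one rectangle per tile plus a circle for each tree canopy. This is only an illustration under my own assumptions: the `render_svg` function, the tile characters, and the colour palette are hypothetical and are not what Gemini generated.

```python
# Hypothetical palette: '#' hedge, '.' gravel path, 'T' grass under a tree.
TILE_COLOURS = {"#": "#3b4a3f", ".": "#d9c9a3", "T": "#7fa35f"}


def render_svg(rows: list[str], tile: int = 16) -> str:
    """Render a grid of tile characters as an SVG string."""
    height = len(rows)
    width = max((len(row) for row in rows), default=0)
    w_px, h_px = width * tile, height * tile

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{w_px}" height="{h_px}" viewBox="0 0 {w_px} {h_px}">'
    ]
    for y, row in enumerate(rows):
        for x, ch in enumerate(row):
            fill = TILE_COLOURS.get(ch, "#000000")  # unknown tiles render black
            parts.append(
                f'<rect x="{x * tile}" y="{y * tile}" '
                f'width="{tile}" height="{tile}" fill="{fill}"/>'
            )
            if ch == "T":
                # Draw a darker canopy centred on the tile.
                cx, cy = x * tile + tile / 2, y * tile + tile / 2
                parts.append(
                    f'<circle cx="{cx}" cy="{cy}" r="{tile * 0.4}" fill="#2f5d34"/>'
                )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    garden = ["#####", "#.T.#", "#...#", "#####"]
    with open("garden.svg", "w", encoding="utf-8") as fh:
        fh.write(render_svg(garden))
```

Opening the resulting `garden.svg` in a browser shows the tiles. Making them "beautiful" is then a matter of richer shapes, gradients, and variation per tile, which is where the model's design sense comes in.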

Less successful was the LLM’s attempt to evoke the at-times-combative “battle of wits” atmosphere of Henry James’ world with a card-game-style conversational system that includes things like “Latin quip”!

In contrast to the relatively functional offerings from Gemini 3, I could not get Claude Sonnet 4.5 (using Claude Code’s new web UI) to congeal into anything usable.

You can give it a shot here, but the result is a mess, aside from the lovely splash screen which opens this post.

Nothing works: there is no interactivity and no sense of human-like organization or design sensibility. I have been successfully using Claude Code on similar projects for the past few months, and it works great. But this was a reminder that the human guide ropes around Claude Code matter. My previous work relied on careful prompting and involved actually reviewing and testing each step of what it produced. For the experiment today, I tried full vibe coding without any oversight besides a handful of prompts to steer it or say “continue on with your plan,” and the result was pretty much useless.

So, too, was GPT-5.1’s effort. This surprised me, as the model has proven genuinely smart and capable in testing I have done with debugging and reviewing code for another educational history simulation I am working on, and for a related website.

It is time for me to pick my daughter up from her preschool, so I will stop there. I promise I have much more to say about history that has absolutely nothing to do with AI — but for now, I want to leave you with a challenge. If you have never made anything with code, you are the same as me a year ago. I invite you to try! It’s an interesting mental exercise to try doing it now, when even expert coders are effectively using natural language just like humanists, and the tools available are the same for everyone.

In short, this might be a good time to find your own pelican riding a bicycle or Henry James RPG. If you have no idea where to start, you can literally copy and paste this post into an LLM of your choice, then write “walk me through how to do this” and go from there. It’s more or less what I did.

Thank you for reading, and I’d love to see what people come up with in the comments.

If you’ve made it this far, consider signing up for a paid subscription. This support makes Res Obscura possible.
