Clojure Plays Mario
blog.phronemophobic.com

Y'know, the thing I least like about these AI video game players is how unlike humans they look. I was wondering about the difference, and I think it comes down to two parts. First and foremost, human players generally prefer routes with a lot of tolerance for input error. Second, humans frequently take "mental planning breaks," stopping for a moment in safe spots before challenging areas.
I think you could juggle the heuristics to demonstrate the preference for input error. For ML training, you could just randomly vary input timing by up to 20ms or so to teach the algorithm to favor safer moves. For path finding, it's trickier, but there's probably a way to favor "wide" paths. I'm less sure how to express the second concept, pausing briefly in "safe areas," but I imagine it means noticing places where entering no inputs for a significant stretch doesn't change the outcome.
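Something like this is what I have in mind for the first part (just a sketch; `simulate` stands in for whatever forward model or emulator call you actually have, and one NES frame is ~17ms, so 20ms is roughly one frame of slop):

    ;; score a candidate input plan by its worst outcome over randomly
    ;; jittered replays, so "wide" moves beat frame-perfect ones
    (defn jitter-plan
      "Shift each [frame buttons] input by up to `max-shift` frames."
      [plan max-shift]
      (mapv (fn [[frame buttons]]
              [(+ frame (- (rand-int (inc (* 2 max-shift))) max-shift)) buttons])
            plan))

    (defn robust-score
      "Evaluate `plan` under `n` jittered replays and keep the worst result."
      [simulate plan n max-shift]
      (apply min (repeatedly n #(simulate (jitter-plan plan max-shift)))))

Then instead of picking the plan with the best score, you pick the plan with the best worst-case score.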
What you’re basically describing is bounded rationality, which has been widely studied in behavioral economics, psychology, and engineering applications (Simon and Gigerenzer are two big names to google). A common framework for formalizing it boils down to versions of the rate-distortion problem from information theory (closely related to Bayesian statistics).
The reason it’s of engineering interest is, like you observe, that bounded rationality gives you solutions that are sub-optimal but more robust and often simpler.
Moreover, finding wide path solutions emerges naturally from sampling-based motion planners. These planners are asymptotically optimal, but if you terminate them early, they are more likely to give you a solution that goes through large gaps, not smaller ones, because it’s unlikely to sample a trajectory that goes through a tight space without heavy sampling. You could probably formulate that in the rate-distortion framework but I haven’t thought about how to do it precisely.
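A toy illustration of the early-termination bias (not a real planner, just the sampling intuition): put a wide gap and a narrow gap in a wall and see which one the first feasible random sample goes through.

    ;; wall with a wide gap and a narrow gap; the first random sample that
    ;; lands in a gap is far more likely to land in the wide one
    (defn first-feasible-gap
      "Sample random crossings until one lands in a gap; return which gap."
      [wide narrow]
      (let [in? (fn [[lo hi] y] (and (<= lo y) (< y hi)))]
        (loop []
          (let [y (rand 100.0)]
            (cond
              (in? wide y)   :wide
              (in? narrow y) :narrow
              :else (recur))))))

    (frequencies (repeatedly 1000 #(first-feasible-gap [10 30] [60 62])))
    ;; => something like {:wide 910, :narrow 90}

A sampling-based planner that stops at the first feasible trajectory inherits the same bias toward the wide opening.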
Oh cool! It's always hugely useful to learn the word for the thing you're thinking about. It can be really tricky to figure out if a vague idea has a name unless you're already pretty well read in a field. Now I have some reading to do, thanks!
This is actually a big issue with academic research related to bounded rationality. Although you could model it mathematically in other ways, by far the most common is the rate-distortion approach. Rate-distortion theory basically boils down to analyzing optimization problems of the form “minimize cost + (information-theoretic) entropy”. Problems of that form arise, for different reasons, in fields including, e.g., statistical mechanics, Bayesian statistics, anything in machine learning using softmax, large deviation theory, differential privacy, and, of course, bounded rationality and information theory.
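Concretely, the generic objective in this family looks roughly like this (my notation, nothing specific to any one field):

    \min_{\pi} \; \mathbb{E}_{\pi}[\mathrm{cost}(a)] \;+\; \tfrac{1}{\beta}\, D_{\mathrm{KL}}(\pi \,\|\, \pi_0)

whose minimizer is the softmax/Boltzmann policy \pi(a) \propto \pi_0(a)\, e^{-\beta\,\mathrm{cost}(a)}. With a uniform default policy \pi_0 the KL term is just negative entropy plus a constant, which is where the “cost + entropy” phrasing comes from; crank \beta up and you recover the plain cost minimizer, turn it down and you stay close to the cheap default. The same expression shows up as a free energy in statistical mechanics and as a posterior in the Bayesian framing.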
However, since all these fields refer to the same thing by different names, tools for handling problems in one field don’t get picked up by people working in another. Either someone rediscovers them later, or someone has to know multiple fields and spot the connection. Sometimes the analysis done in one field isn’t useful in another due to different assumptions and research concerns, but that’s not obvious, because you have to peel back a lot of layers of domain-specific jargon when reading the paper. Even though the math is very similar, reading a statistical mechanics paper written by a physicist is a real pain if you’re coming from an applied math / CS background, for example: each field has its own notational conventions and refers to application scenarios that are meaningless to you, and you have to figure out whether the thing they reference matters to their development or is incidental.
It’s almost like reading House of Leaves. Here’s 30 pages with weird fonts describing the use of light in a non-existent movie and comparing it to both real movies and fake movies real people were supposedly involved in. Will it be relevant to the plot and thus require careful reading or can I skim this section? Maybe, but you won’t know unless you keep reading.
This is where ChatGPT usually shines. I gave it your comment and asked it to find the concept. I had to nudge it in the right direction though: https://chat.openai.com/share/f8855a35-7076-43e6-bf9e-19de8b...
> game players is how unlike humans they look.
I think you should watch some speed runners. They also don't look human, since they have some form of optimization in mind, compared to a casual player.
But you will see that in most games RTA (Real Time Attack, speed running by humans, live) chooses different strategies than TAS (Tool Assisted Speedrun, still humans, but using tools to record and splice together a sequence of inputs for the game).
For a TAS you can justify taking fifty one-in-ten chances in a row, because every time it doesn't come off you just throw that away and re-record, so maybe you do a few hundred re-records for that section, not bad at all. In RTA that's never going to make any sense, it kills essentially 100% of runs.
> In RTA that's never going to make any sense, it kills essentially 100% of runs.
it depends on how much you want that world record.
It really doesn't. 10 to the 50 is an unimaginably huge number. If you could take this chance, once per second, for your whole lifetime, you've essentially no meaningful chance to succeed - you should do something else.
The tolerance for error is a huge thing! I also see (or intuit) people making tradeoffs between fault-tolerance and speed, e.g. taking a tight curve on a regular block and a wide one on a piranha plant.
That's more interesting and reassuring to watch. I think it's because the player's mind comes across in the playstyle. It's almost as if their entire history with the game is revealed.
I guess what's tangentially related is the tendency of AlphaGo (and friends) to favor an extremely likely 1 point win over a still likely (but not as much) 10 point win, making the endgame seem very non-human like. They still absolutely dominate human players but not with the point difference you'd expect a much stronger player to have.
I think the second issue would be pretty hard to solve without doing something artificial like slowing down the computer a ton, and then trying to come up with an algorithm to, like, delay for processing time.
It is just the nature of computers that they do simple things very fast. Humans do complex things very slowly (but can actually do them). This is why we are friends, we complement each other.
Although I do wonder, if the paths were not made to such tight tolerances (using your input delay solution), maybe AI Mario would spend a little longer lingering in areas just to let things align nicely for less-tight jumps.
It's really not a time issue. The question is whether the AI can pause mid-game in a "safe area" to work out an optimal path, rather than just acting greedily or looking one step back.
For resting, you can build on your first idea: the random error increases with uninterrupted play time, and resting quickly lowers it.
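Something like this, maybe (made-up numbers, just to sketch the idea):

    ;; input jitter grows with frames of uninterrupted play and decays
    ;; quickly while resting
    (defn jitter-frames
      "Timing error (in frames) after `active` frames of play and `rest`
       frames of standing still."
      [active rest]
      ;; resting pays fatigue down 10x faster than play builds it up
      (let [fatigue (max 0 (- active (* 10 rest)))]
        ;; +1 frame of jitter per ~10s of fatigue (600 frames at 60fps), capped at 3
        (min 3 (quot fatigue 600))))

Then a planner that wants tight jumps is naturally pushed to stand still somewhere safe first.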
I don't think making humans comfortable is the goal with respect to AI. The goal is to actually solve a problem. Performance is second. Human comfort is a distant third or beyond.
When AI can reliably solve a problem without significant negative consequences from time to time, it's a win. How humans feel about the method is effectively irrelevant.
The near miss behavior is very much like overfitting. Mario is simple and deterministic enough that it doesn’t matter, but think about a scenario like a self driving car. A calculated near miss turns into a crash if the other car’s driver is just a little slower or their tires a little slicker than anticipated.
> I don't think making humans comfortable is the goal with respect to AI
According to whom?
AI in games has historically been all about human comfort/enjoyment. Extremely good AI that seems "unnatural" to humans is usually not the goal.
You seem to talk about AI-controlled-NPCs in games, while GP starts from the article context about AI-controlled-PCs (player characters) and proceeds to generalize about using AI to solve problems outside games.
I think the point is that different people have different goals for "AI"
I just couldn't bear all that walking and jumping into near misses in the video.
The AI doesn't care that it's one pixel away from death.
That said, I could see some highly-skilled players (like those who do speedruns) showing off their precision and adopting a similar "scare the audience" style for a new genre of competition.
Also just chilling with the Bullet Bill a few pixels away for half the run!
This is really cool. I hadn't thought of a path-finding algorithm working on something like a Mario run. Nice to see AI approaches like this aren't getting forgotten for all the neural nets and machine learning.
Nice to see this! Path finding in Mario works very well! I implemented a bunch of classical methods, including an A*-based approach in Java, back in college to play Mario for coursework, and honestly a whole bunch of methods people may not think would work well for something like this absolutely do.
https://github.com/iantbutler01/MarioAIImplementation
Notably, we were the only group to choose Mario over Pac-Man, and the framework the professor had us use was broken, so on top of the algorithm implementations I also rewrote the forward model and more!
I’m guessing because Mario Bros 1 (among others) is deterministic? I.e., the same reason it can be speedrun so well? Monsters are always in exactly the same spot if your previous frames' inputs were the same.
Huh, nice. I got a little anxious watching 6-3, seeing it take the low path. That's not a human route, but then watching 8-2 there were a few nice bits that a skilled player would pull off.
I get that it's a lot of trial and error, and it's a nice attempt. What would it take to beat the maze levels, though? I guess since the reward is always just moving forward, it's hard to tell whether you're progressing without an internal map telling you that something has changed, and that it's the change you were expecting?
SMB1, I think, is great to train on for stuff like this, but it does have some secrets and some hiccups. Like, what would it take to get this AI to find the warp zones?
The game state and the heuristic need some extra information, since (spoiler?) Mario leaves the screen during the lead-up to the warp zones, and it doesn't look like it can account for that sort of positioning.
The emulator integration is https://www.libretro.com/index.php/api/?amp=1 (warning, f’ed website) and the Clojure wrapper (by OP) is: https://github.com/phronmophobic/clj-libretro/blob/67f186e87...
really cool!
That's really cool. Does that mean libretro has gained debugging interfaces?
One thing I've always wondered about these automated game-playing applications is what they use as input.
Are they just getting images (frame by frame) as input, perceiving them in some way, and producing an output stream of controller button-pushes?
Or does it only detect "death" and start over, systematically changing its robotically timed output stream in response, with the goal of getting further along in the game?
I know the retro games were very deterministic. Pacman had "patterns" you could use. Was Mario like this? No random adversaries at all?
I have seen systems that use frames/screen captures as input (look up sentdex' GTA series on YouTube for an example) but that isn't what's happening here. In this case, it looks like the `dist` function is directly accessing key game values by reading bytes out of the emulator's memory. So the AI is taking multiple signals from the game's state as input, not just "alive/dead."
The blog post explains that part a bit under the "Distance" section.
They're reading the game's state from memory. This is also commonly done in tool assisted speedruns.
> Was Mario like this? No random adversaries at all?
Yeah. Read all about it here:
https://tasvideos.org/GameResources/NES/SuperMarioBros
https://tasvideos.org/GameResources/CommonTricks#LuckManipul...
https://tasvideos.org/GameResources/CommonTricks#ExamineGame...
Can anyone help me understand this: How does the AI "see" what's happening on the screen? Is there some sort of object/scene recognition going on?
It looks like the `dist` function is directly accessing key game values by reading bytes out of the emulator's memory in order to "see" the current game state. This is explained a bit under the "Distance" section.
A quick scan of the code (https://github.com/phronmophobic/clj-libretro/blob/67f186e87...) seems to indicate that there is at least some screenshotting going on, plus serialization of game state. What is used for detecting things, I couldn't find from just skimming the code.
I think the screenshots are just for assembling the time lapse video.
Check out `(defn dist` or read the explanation of that function in the blog post. It can't see enemies or pickups at all. It can only tell how far away it is from the end of the level, and if it's dead or not, then brute forces every button combination for every frame. The "in progress" video shows that it spends most of its time falling into pits, but eventually makes it to the end.
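The search ends up being roughly this shape, I think (my reconstruction from the description, not the post's actual code; `save-state`, `load-state!`, `step!`, `dead?`, and `dist` are stand-ins for the real emulator calls, and the real search is smarter than this one-frame greedy pick):

    ;; every on/off combination of the buttons that matter for moving right
    (def button-combos
      (for [right? [true false] a? [true false] b? [true false]]
        {:right right? :a a? :b b?}))

    (defn best-next-input
      "Try each combo for one frame from the same snapshot and keep the one
       that brings Mario closest to the end of the level."
      [emu]
      (let [snapshot (save-state emu)]
        (apply min-key
               (fn [combo]
                 (load-state! emu snapshot)
                 (step! emu combo)
                 (if (dead? emu) Long/MAX_VALUE (dist emu)))
               button-combos)))

The only signals it needs are "how far from the flagpole" and "am I dead," both read straight out of the emulator's RAM.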
A fun metric would be how many deaths it takes to complete a given level. It's got to be in the tens of thousands.
Now just imagine an AI that could help me, a blind person, play Mario, or Zelda, and such.
Here's a writing tip: delete that first sentence.
I can never look at Mario the same
Pretty interesting. Human players are grinding Mario to squeeze out milliseconds in speed runs, while the AIs still aren’t close to that, to the point that we’re happy the AI at least beats a level.