Show HN: MegaHAL in Pure SQL
github.com
I ported Jason Hutchens' 1998 Loebner Prize-winning chatbot, MegaHAL, to run entirely inside PostgreSQL, in pure SQL. The entire lifecycle -- tokenization, learning, keyword extraction, Markov chain generation, and entropy scoring -- is implemented in standard SQL using complex CTEs. There is no PL/pgSQL or any other sort of procedural escape hatch.
Learning is a single ~560-line SQL statement that splits text, interns symbols, and updates two 5th-order Markov tries (forward and backward) using depth-unrolled writable CTEs. Inference is a recursive query that generates N candidate replies in parallel. It performs bidirectional weighted random walks, evaluates them for information-theoretic surprise, and formats the winner as a sentence-cased string.
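For readers who want the shape of the algorithm before reading the SQL, here is a minimal Python sketch of the learn-then-walk idea described above. It is not the project's code: the names `learn`, `walk`, and `generate` are hypothetical, the model order is 2 rather than 5 to keep the toy corpus small, and plain dictionaries stand in for the tries.

```python
import random
from collections import defaultdict

ORDER = 2  # assumption for this sketch; the post uses 5th-order models


def learn(tokens, order=ORDER):
    """Count next-symbol frequencies for every context of length 0..order."""
    model = defaultdict(lambda: defaultdict(int))
    padded = tokens + ["<END>"]
    for i, symbol in enumerate(padded):
        for depth in range(order + 1):
            if i - depth < 0:
                break
            context = tuple(padded[i - depth:i])
            model[context][symbol] += 1
    return model


def walk(model, seed, rng, order=ORDER, max_steps=20):
    """Extend seed by a weighted random walk until <END> or max_steps."""
    out = list(seed)
    for _ in range(max_steps):
        # Back off to the longest context the model has seen.
        for depth in range(min(order, len(out)), -1, -1):
            context = tuple(out[len(out) - depth:])
            if context in model:
                break
        choices = model[context]
        symbol = rng.choices(list(choices), weights=list(choices.values()))[0]
        if symbol == "<END>":
            break
        out.append(symbol)
    return out


def generate(forward, backward, keyword, rng):
    """Walk forward from the keyword, then backward from the result."""
    tail = walk(forward, [keyword], rng)
    # The backward model was trained on reversed text, so feed it reversed.
    whole_reversed = walk(backward, tail[::-1], rng)
    return whole_reversed[::-1]


sentence = "the cat sat on the mat".split()
forward = learn(sentence)
backward = learn(sentence[::-1])
print(" ".join(generate(forward, backward, "sat", random.Random(0))))
# prints "the cat sat on the mat"
```

With a one-sentence corpus every order-2 context has a single successor, so the walk is deterministic and simply reconstructs the sentence around the keyword. With more training text the same code produces varied candidates, which is where scoring comes in.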
I provided a `docker-compose.yml` and a convenient Python driver script so you can try it out quickly, and there's also a web-based demo where I bundled it with PGlite (WASM PostgreSQL) at https://tgies.github.io/megahal-sql/. These are provided for convenience, but you can also just run the schema initialization SQL and `SELECT megahal_converse('hello from hn.');`
Nice job! I corresponded with Hutchens back in the day about MegaHAL. What made it stand out from other Loebner chatbots was that it didn't just zoom in on a couple of keywords in the user's input and then run a forward-only chain. Instead, it built both forward and backward Markov models, generated text in both directions, and calculated entropy/surprise to produce a novel response.
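To make the entropy/surprise idea concrete, here is a rough Python sketch of picking the most surprising candidate. It is a simplification under stated assumptions, not the original scoring: it uses a single forward bigram model, skips unseen transitions, and does no length normalization. The names `bigram_counts` and `surprise` are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def bigram_counts(sentences):
    """Count how often each token follows the previous one."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for prev, cur in zip(["<START>"] + tokens, tokens):
            counts[prev][cur] += 1
    return counts


def surprise(counts, reply, keywords):
    """Sum of -log2 P(token | previous token) over keyword tokens."""
    total = 0.0
    for prev, cur in zip(["<START>"] + reply, reply):
        if cur not in keywords:
            continue
        seen = counts.get(prev, Counter())
        if seen[cur] == 0:
            continue  # assumption: unseen transitions are ignored here
        total += -math.log2(seen[cur] / sum(seen.values()))
    return total


corpus = [s.split() for s in
          ["the cat sat", "the cat ran", "the dog sat", "the cat sat"]]
counts = bigram_counts(corpus)
keywords = {"cat", "dog"}
candidates = [["the", "cat", "sat"], ["the", "dog", "sat"]]

# The rarer continuation carries more information, so it wins.
best = max(candidates, key=lambda r: surprise(counts, r, keywords))
print(" ".join(best), surprise(counts, best, keywords))
# prints "the dog sat 2.0"
```

Here "dog" follows "the" one time in four, so it scores -log2(1/4) = 2 bits, while "cat" scores about 0.415 bits. Preferring the higher total is what pushes replies away from the most predictable phrasing.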