Mark V. Shaney Junior Gibberish Generator
Mark V. Shaney Junior is a minimal implementation of a Markov gibberish generator inspired by the legendary Mark V. Shaney program from the 1980s. Mark V. Shaney was a synthetic Usenet user that posted messages to newsgroups using text generated by a Markov chain program. See the Wikipedia article on Mark V. Shaney for more details.
The mvs program in this project consumes text via standard input, builds an internal Markov model, and then uses the model to generate gibberish.
Source Code
Here is the complete source code of the gibberish generator (mvs):
#!/usr/bin/env python3

import random
import sys

type Key = tuple[str, ...]
type Model = dict[Key, list[str]]


def train(text: str, n: int) -> Model:
    words = text.split()
    model: Model = {}
    for i in range(len(words) - n):
        key = tuple(words[i : i + n])
        value = words[i + n]
        model.setdefault(key, []).append(value)
    return model


def generate(model: Model, length: int, prompt: Key) -> str:
    key = prompt if prompt else random.choice(list(model.keys()))
    output = list(key)
    for _ in range(length - len(key)):
        values = model.get(key)
        if not values:
            break
        next_word = random.choice(values)
        output.append(next_word)
        key = *key[1:], next_word
    return " ".join(output)


def main(n: int, length: int, prompt: Key) -> None:
    model = train(sys.stdin.read(), n)
    print(generate(model, length, prompt[:n]))


if __name__ == "__main__":
    main(
        int(sys.argv[1]) if len(sys.argv) > 1 else 2,
        int(sys.argv[2]) if len(sys.argv) > 2 else 100,
        tuple(sys.argv[3].split()) if len(sys.argv) > 3 else (),
    )
This program implements a simple Markov text generator. It reads text from standard input and records every sequence of n consecutive words together with the word that follows it. From this data it learns which words tend to come next after a given word sequence; followers that occur more often in the input are more likely to be chosen while generating text.
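For instance, assuming the train() function from the listing above is available in a Python session, a tiny input produces a model like this:

model = train("the cat sat on the mat and the cat ran", 2)
print(model[("the", "cat")])  # ['sat', 'ran']: 'the cat' was followed by both words
print(model[("cat", "sat")])  # ['on']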
To generate text, the program starts from a random sequence or a user-provided prompt. It repeatedly selects a possible following word at random, weighted by how often that word followed the current sequence in the original text, and then slides the sequence forward by one word. The result mimics the local patterns of the source material while drifting into grammatically plausible but often meaningless gibberish.
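The frequency weighting falls out of the data structure itself: duplicate followers are kept in the lists, so random.choice() picks common followers proportionally more often. A small demonstration, again assuming the functions above are defined:

import random

model = train("a b a c a b a b a d", 1)
print(model[("a",)])  # ['b', 'c', 'b', 'b', 'd']
# 'b' fills 3 of the 5 slots, so random.choice() returns 'b'
# with probability 3/5.
print(random.choice(model[("a",)]))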
Commentary
This implementation is deliberately simple and inefficient. It keeps all observed word sequences and their followers in memory, including duplicates, which makes the model needlessly large. There is plenty of scope for improvement, such as storing follower-word frequencies instead of keeping duplicate entries or pruning rarely seen sequences.
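As a rough sketch of the first improvement, follower frequencies could be stored in a collections.Counter and sampled with random.choices(); the names train_freq and pick_next below are illustrative and not part of mvs:

import random
from collections import Counter

type Key = tuple[str, ...]
type FreqModel = dict[Key, Counter[str]]


def train_freq(text: str, n: int) -> FreqModel:
    # Count each follower once instead of storing duplicate entries.
    words = text.split()
    model: FreqModel = {}
    for i in range(len(words) - n):
        key = tuple(words[i : i + n])
        model.setdefault(key, Counter())[words[i + n]] += 1
    return model


def pick_next(model: FreqModel, key: Key) -> str | None:
    counts = model.get(key)
    if not counts:
        return None
    # random.choices() accepts weights, so sampling remains
    # frequency-weighted without the duplicated storage.
    followers, weights = zip(*counts.items())
    return random.choices(followers, weights=weights)[0]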
More sophisticated techniques can be applied to produce more plausible sounding text, including higher order n-grams, careful handling of sentence boundaries and punctuation, addition of grammatical constraints or probabilistic language models. What is provided here is intended to serve as a minimum viable Markov text generator. Any further enhancements are left as an exercise for the reader.
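To make one of those exercises concrete, here is one possible starting point for sentence-boundary handling: seed generation only from states whose first word looks like a sentence opener. The helper below is an illustrative sketch, not part of mvs, and reuses the Key and Model aliases from the listing above:

import random


def sentence_start_keys(model: Model) -> list[Key]:
    # Prefer states that begin with a capitalised word, which are
    # more likely to have been observed at the start of a sentence.
    return [key for key in model if key[0][:1].isupper()]


# When seeding generation:
# starts = sentence_start_keys(model)
# key = random.choice(starts or list(model))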
While this program is small, it is not intended to be an exercise in code golf. Clarity takes precedence over reducing the number of lines.
Given the overwhelming popularity of large language models (LLMs) in 2025, it is worth noting that this approach bears little resemblance to LLMs. LLMs are trained on vast datasets using neural networks to model language patterns across large spans of text, capturing global structure and long-range dependencies. By contrast, Markov text generators rely entirely on local word-transition statistics and have no model of global structure. Despite these limitations, the Markov text generator shared in this project can serve as a simple introduction to statistical language modelling. After all, Markov chains can be thought of as the 'hello, world' of language models.
Get Started
To get started with the mvs program in this project, clone or download the repository to a system with Python 3.12 or later installed (the program uses type alias statements introduced in Python 3.12). Then run the following command:

python3 mvs < book.txt

On most Unix or Linux systems, you can alternatively run:

./mvs < book.txt

Either command generates gibberish based on the model it builds by consuming the text in the file book.txt.
Command Line Arguments
To keep this tool as minimal as possible, it does not come with any command line options. In fact, it does not even have a --help option. However, it does support a few positional command line arguments. Since the tool produces no help output, this section describes those arguments.
Here is a synopsis of the command line arguments supported by this tool:
./mvs [N [LENGTH [PROMPT]]]
Here is a description of each argument:
- N: The order of the Markov model. This value specifies how many consecutive words are used as the state when training the model. For example, a value of 2 means the model uses two previous words to predict the next one, which corresponds to a trigram model in standard n-gram terminology. A value of 3 would use three previous words (a 4-gram model) and so on. If not specified, this defaults to 2.
- LENGTH: The maximum number of words to generate. Generation may stop earlier if the model reaches a state for which no continuation exists. If not specified, this defaults to 100.
- PROMPT: An optional starting prompt used to seed text generation. This should be a single command line argument containing one or more words separated by spaces, so it must be quoted when invoking the program. If provided, only the first N words of the prompt are used. If omitted, generation starts from a random state in the model.
Here are some usage examples of these command line arguments:
- Generate gibberish using a trigram model:

  ./mvs 2 < book.txt

- Generate gibberish up to 250 words long:

  ./mvs 2 250 < book.txt

- Use the words 'There is' to start the gibberish:

  ./mvs 2 100 'There is' < book.txt
Gibberish
Unprompted Gibberish
Here is an example of gibberish produced by the program when no prompt was supplied:
$ ./mvs < book.txt
Ghost again stood side by side in the stables; and the bedpost was his
own. The bed was warm, and tender; and the Ghost had entered. It was a
Turkey! He never could have listened to it can cheer and comfort you
in a voice that seldom rose above the warehouse door: Scrooge and
Marley. Sometimes people new to the postboy, who answered that a
bachelor was a genial shadowing forth of all her silken rustlings, and
her rapid flutterings past him, he seized the ruler with such severity
as the figure-head of an old gentleman in a little and
Prompted Gibberish
Here is an example of text generated from the initial prompt 'At last':
$ ./mvs 2 100 'At last' < book.txt
At last she said, amazed, "there is! Nothing is past hope, if such a
man whose name he had an expectation that the singer fled in terror,
leaving the keyhole to regale him with such favour, that he turned his
steps towards his door. "It's humbug still!" said Scrooge. "I am very
happy," said little Bob, the father, who came upon his knee; for in
the fire, and deep red curtains, ready to our calling, we're well
matched. Come into the most extravagant contortions: Scrooge's niece,
indignantly. Bless those women; they never do anything by halves. They
are all indescribable
Personal Gibberish
Finally, I also ran the program on all posts I have written so far on my website at https://susam.net/. Here is what it generated:
$ make filter-website && ./mvs < susam.txt
while a query replace operation is approved by the user. The above
variable defines the build job. It can be incredibly useful while
working on assembly language and machine code. In fact, all internal
resources like the result to refine the search prompt changes from
bck-i-search: to fwd-i-search:. Now type C-SPC (i.e. ctrl+space) to
set a mark causes Emacs to use 32-bit registers like EBP, ESP,
etc. Thus the behaviour is undefined. Such code may behave differently
when compiled with the readily available GNU tools like the shape
of 8. Flipping "P" horizontally makes it a proper quine: cat $0
Apparently, this is what I would sound like if I ever took up speaking gibberish!
Licence
This is free and open source software. You can use, copy, modify, merge, publish, distribute, sublicence and/or sell copies of it, under the terms of the MIT Licence. See LICENSE.md for details.
This software is provided "AS IS", WITHOUT WARRANTY OF ANY KIND, express or implied. See LICENSE.md for details.