A tiny probabilistic programming language in Gleam


12 November 2025

<tl;dr> I’ve built the package tinypp for the Gleam programming language that lets you write simple probabilistic programs. </tl;dr>

I like it when programming languages bring something really new to the table. Not just another combination of parentheses and braces but something that enables a whole new use case. That’s why I like the language Gleam and its use syntax.

Here is how it works: Suppose we have some higher order function

fn foo(x: Int, f: fn(Int) -> Int) -> Int {
  x + f(x)
}

that we want to call like this

foo(5, fn(z) {
  2 * z
})

then we get the same effect by writing

use z <- foo(5)
2 * z

This might look silly at first but its great power is the ability to avoid nested functions:

foo(5, fn(z) {
  foo(z, fn(w) {
    w + 1
  })
})

becomes

use z <- foo(5)
use w <- foo(z)
w + 1

In this article, I would like to present how you can build a tiny probabilistic programming language out of this. As a teaser, we will be able to write:

let die = uniform([1, 2, 3, 4, 5, 6])
use first <- sample(die)
use second <- sample(die)
use <- condition(first > second)
query(first == 2)

to find the probability that a die shows a two if we already know that its value is greater than that of another die.

But let’s start gently.

What is probabilistic programming?

Well, it’s programming, but probabilistic. 🤷

More seriously, I would describe it as using programming to define a probability distribution. It usually works by adding two key ingredients to “conventional” programming:

  1. Besides assigning a fixed value to a variable, you can also declare that a variable is supposed to follow a certain probability distribution (sampling).
  2. You can state that a certain fact should hold, “confining” the resulting distribution (conditioning).

It is a powerful tool to build statistical models in a very natural way.

There are different ways to implement a probabilistic programming language (PPL) and I am in no way an expert in this. In fact, all I did was read this fun paper about probabilistic programs that learn themselves and take a look at the PPL Church that was used there. I then did my best to convert the ideas from complicated Scheme to a bit less complicated Gleam.

How to sample…

How can we express “in this piece of code, please assume that x has been sampled from distribution”? Well, this smells like we should use a callback function. That is, we tell the user “Please put your piece of code that works with x in a function accepting x as an argument. I will figure out how to create this funny x you want and will give it to you.”

Together with the fact that sampling should depend on some distribution, this already roughly gives us the function signature of sample:

fn sample(
  distribution: Distribution(a),
  f: fn(a) -> ???,
) -> ???

Here, a is a generic type parameter. If distribution is a distribution over a then sampling from it will give us a value of type a and we want to plug this into the callback f, so f needs to accept an a. So far, so good.

Two questions remain: What should we put in place of the ??? and how can we fill sample with life? For the first question, remember that our aim is to program a distribution. It will be produced inside f, say of type Distribution(b) with another generic type parameter b. So that is what f will have as a return type. The fact that we are sampling over something should not change that, in the end, we want a Distribution(b), so this is also the return type of sample:

fn sample(
  distribution: Distribution(a),
  f: fn(a) -> Distribution(b),
) -> Distribution(b)

Signature done, but the harder task is the function body of sample. Think about it like this: For any sampled value x, f(x) gives us some distribution where different values of type b have different probabilities. But x itself also has a certain probability, defined by distribution. So all the probabilities in f(x) must be scaled by the probability of x. And this is true for all possible values of x.

Mathematically, you can think of it as creating a mixture distribution where you have one mixture component for every element in the support of distribution (i.e. all the values with nonzero probability) that is weighted by the probability of that element.

So, if we assume that we can get hold of a list of values and their probabilities for a distribution (tinypp.to_list) and also have a function that takes a list of weights and a list of distributions to make a mixture distribution out of them (tinypp.mixture), we can simply write:

fn sample(
  distribution: Distribution(a),
  f: fn(a) -> Distribution(b),
) -> Distribution(b) {
  let #(support, probabilities) = list.unzip(to_list(distribution))
  mixture(probabilities, list.map(support, f))
}
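To make the mixture construction concrete, here is a small Python sketch. The list-of-pairs representation and the function names are assumptions of this sketch, not tinypp’s actual internals:

```python
# Illustrative model: a distribution is a list of (value, probability) pairs.

def sample(distribution, f):
    # Weight each distribution f(x) by the probability of x and merge
    # everything into one big mixture.
    return [(y, p * q) for x, p in distribution for y, q in f(x)]

coin = [("H", 0.5), ("T", 0.5)]
# Joint distribution of two independent flips: each ordered pair gets
# mass 0.5 * 0.5 = 0.25.
joint = sample(coin, lambda a: sample(coin, lambda b: [((a, b), 1.0)]))
```

Every value in the support of the outer distribution spawns one mixture component, exactly as described above.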

… and how to condition

Luckily, this is a bit simpler. We already figured out that callbacks are the way to express “in the following, assume that this holds”. So the user provides a function f and we make sure to only call it if a certain predicate holds. That’s just programming 101 – use an if! Ah, did I mention that Gleam is super into minimalism and there is no if? They figured that if you have case (like match or switch in other languages) you can get rid of a whole language construct.

So anyways, we can write:

pub fn condition(
  predicate: Bool,
  f: fn() -> Distribution(a),
) -> Distribution(a) {
  case predicate {
    True -> f()
    False -> ???
  }
}

What should happen if the predicate is not fulfilled, though? Just by looking at the types, we know that we must also produce a Distribution(a). In “normal” programming, we would just “do nothing” in the False/else branch. So what value of type Distribution(a) best resembles this nothing? It is a distribution with empty support, provided by tinypp.fail (of course this is not really a distribution in the mathematical sense because you could not normalize it to have its probabilities sum up to one):

pub fn condition(
  predicate: Bool,
  f: fn() -> Distribution(a),
) -> Distribution(a) {
  case predicate {
    True -> f()
    False -> fail()
  }
}

Think about what happens if condition returns fail() and this is then used inside sample as a mixture component. It would not add anything to the mixture and thus have no effect. Exactly what we wanted!
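The same pair of functions can be sketched in Python; the list-of-pairs representation is an assumption of this sketch, not tinypp’s actual implementation:

```python
# Illustrative model: a distribution is a list of (value, probability) pairs.

def fail():
    # The distribution with empty support: no value gets any mass.
    return []

def condition(predicate, f):
    # Only run the callback when the predicate holds.
    return f() if predicate else fail()

kept = condition(True, lambda: [(1, 1.0)])     # callback runs
dropped = condition(False, lambda: [(1, 1.0)])  # empty support
```

Because `dropped` carries no mass, it contributes nothing when it later ends up as a mixture component.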

Finishing touches

Both sample and condition produce a distribution only if you hand them a function that already itself produces a distribution. So how do we actually create one in the end?

Here, we have the opportunity to state what we want the distribution of. Say, we have sampled x and y. Maybe we want the distribution of x, the joint distribution of #(x, y), the distribution of 2 * x + 13, or something entirely different. For fixed values of these, the distribution is trivial: It has probability one for exactly the value of the expression and zero otherwise. tinypp provides singleton that produces a distribution exactly like that (you could also call this a discrete Dirac distribution).

Since the sampling and conditioning happen only conceptually, we actually work with fixed values inside the callbacks passed to them. So singleton is exactly what we need here. To make the intent clearer, tinypp defines the alias query for singleton.
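In the illustrative Python model used in this article’s sketches (a distribution as a list of (value, probability) pairs, which is an assumption, not tinypp’s representation), singleton and its alias are one-liners:

```python
def singleton(value):
    # A discrete Dirac distribution: all mass on exactly one value.
    return [(value, 1.0)]

# Alias to signal intent at the end of a probabilistic program,
# mirroring tinypp's naming.
query = singleton

q = query(True)  # all mass on the value True
```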

Putting it all together

It’s time to use the functions we developed above to do probabilistic programming! As alluded to in the beginning, let us find the probability that a die shows a two if we know that its value is greater than that of another die.

First, we need the distribution of a fair die. tinypp provides some rudimentary distributions in the distributions module, including uniform to create a uniform distribution over a given list.

let die = uniform([1, 2, 3, 4, 5, 6])

Next, we want to sample the first die…

let die = uniform([1, 2, 3, 4, 5, 6])
sample(die, fn(first) {
  // ...
})

… and the second die.

let die = uniform([1, 2, 3, 4, 5, 6])
sample(die, fn(first) {
  sample(die, fn(second) {
    // ...
  })
})

We know that the first die shows a greater value than the second one, so let us condition on that:

let die = uniform([1, 2, 3, 4, 5, 6])
sample(die, fn(first) {
  sample(die, fn(second) {
    condition(first > second, fn() {
      // ...
    })
  })
})

And we are interested in the probability that the first die shows a two, so that is our query:

let die = uniform([1, 2, 3, 4, 5, 6])
sample(die, fn(first) {
  sample(die, fn(second) {
    condition(first > second, fn() {
      query(first == 2)
    })
  })
})

This does what we want! It produces the correct distribution. But… it looks a bit tedious and convoluted. This is where we finally exploit the use-syntax:

let die = uniform([1, 2, 3, 4, 5, 6])
use first <- sample(die)
use second <- sample(die)
use <- condition(first > second)
query(first == 2)

Tadaa! A simple probabilistic program in Gleam.

Since I promised a probability and not only a distribution, let’s add that as well. tinypp provides the functions normalize and pmf that we can use for that. Normalization is required here because we want a concrete probability; often these actual numbers are irrelevant and you are only interested in, say, the most likely hypothesis. pmf stands for probability mass function and lets us ask a distribution what probability it assigns to a specific hypothesis.

So here is your probability:

let distribution = {
  let die = uniform([1, 2, 3, 4, 5, 6])
  use first <- sample(die)
  use second <- sample(die)
  use <- condition(first > second)
  query(first == 2)
}
pmf(normalize(distribution), True) // -> 0.06666666666666667 (= 1/15)
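We can sanity-check this number with an end-to-end Python sketch of the same program. Everything here (the list-of-pairs representation and all function bodies) is an illustrative reimplementation, not tinypp itself. Of the 15 equally likely ordered pairs with first > second, only (2, 1) has first == 2, so the answer is 1/15:

```python
# Illustrative model: a distribution is a list of (value, probability) pairs.

def uniform(values):
    p = 1.0 / len(values)
    return [(v, p) for v in values]

def sample(distribution, f):
    # Mixture over the support, weighted by each value's probability.
    return [(y, p * q) for x, p in distribution for y, q in f(x)]

def condition(predicate, f):
    # Empty support ("fail") when the predicate does not hold.
    return f() if predicate else []

def query(value):
    # Singleton / discrete Dirac distribution.
    return [(value, 1.0)]

def normalize(distribution):
    total = sum(p for _, p in distribution)
    return [(v, p / total) for v, p in distribution]

def pmf(distribution, value):
    return sum(p for v, p in distribution if v == value)

die = uniform([1, 2, 3, 4, 5, 6])
distribution = sample(die, lambda first:
    sample(die, lambda second:
        condition(first > second, lambda:
            query(first == 2))))

print(pmf(normalize(distribution), True))  # -> 1/15, about 0.0667
```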

So what?

As you can see in the examples folder, you can actually do some interesting probabilistic inference with this (including Bayesian linear regression!). On the other hand, the restriction to discrete distributions and the exhaustive marginalization performed by every call to sample make tinypp practically unusable for “real world” inference problems.

Consider tinypp a fun demonstration, a learning opportunity to dive into probabilistic programming languages, and yet another surprising use case for Gleam’s (in my opinion) killer feature, the use-syntax.