A verified algorithm for determining the intersection point of two lists in O(1) space

Okay, this post has been a long time coming. Initially I meant it to cover a generalisation of the the algorithm we developed last time. The plan was to do a hand wavy proof of correctness, and take hints from the proof process to come up with a more generic version of the algorithm.

That’s still the plan, but I wasn’t happy with the hand wavy part. So I decided to do a proper proof. But if something’s worth doing it’s definitely worth overdoing. Right? And that’s how I ended up writing a verified version of the algorithm in Dafny.

On to the proofy bit!

A short aside on mutability

The algorithm works on singly linked lists implemented using C++ pointers. Working with mutable, dynamically allocated data structures complicates the proof significantly. Since I don’t want to go down the Separation Logic rabbit hole we’ll make the following simplification: we treat the linked lists as finite sequences.

It turns out Dafny uses a similar approach with an extra twist to ensure the lists aren’t circular.

The linked list data structure

The Node class is what you’d expect–it has a value and a next pointer–with added machinery to aid proofs and a convenience method for constructing lists which I’ve omitted. I’ll get to them shortly.

class Node<T(0)>
{
  var value: T;
  var next: Node?<T>;

  // Proof aides

  static method Cons(x: T, xs: Node?<T>) returns (n: Node<T>)
    // Pre- and postcondition
  {
    // Implementation
  }
}

If you’re wondering about the (0) bit after the type parameter, that tells Dafny that whatever type we supply it must be “default constructible” in C++ parlance.

We’ll get to the implementation of Cons in a bit, but first let’s look at how Dafny deals with mutable linked lists. Since linked lists are a recursive data structures our algorithms will have to use either loops or recursion. In either case Dafny will have trouble proving that the process terminates because we might be dealing with a circular list. The basic approach to solving this difficulty is to keep track of what nodes go where, and prove that a node can’t appear in the list starting at its next pointer. The idiom in Dafny seems to be to define a predicate called Valid together with some ghost fields to track the proof state for the linked list. Unlike methods and regular fields, predicates and ghost fields are only used for verification¹.

  // Proof aides

  ghost var Elems: seq<T>;
  ghost var Repr: set<Node<T>>;

  predicate Valid()
    reads this, Repr
  {
    this in Repr &&
    |Elems| > 0 && Elems[0] == value &&
    (next == null ==> |Elems| == 1) &&
    (next != null ==>
      next in Repr && next.Repr <= Repr && this !in next.Repr &&
      next.Valid() && next.Elems == Elems[1..])
  }

Elems tracks the elements of the list. Repr is more interesting, and it’s the key ingredient for allowing us to work with linked lists. It’s a set tracking the nodes in the representation of the list². Valid uses it to check that a node doesn’t appear in the list that starts from it, that is, that it’s not circular. It turns out this is enough to allow Dafny to work out termination for loops involving linked lists made of Nodes.

Now we can finally look at Cons which is pretty much what you’d expect plus some proof state bookkeeping.

  static method Cons(x: T, xs: Node?<T>) returns (n: Node<T>)
    requires xs == null || xs.Valid()
    ensures n.Valid()
    ensures if xs == null then n.Elems == [x] else n.Elems == [x] + xs.Elems
  {
    n := new Node;
    n.value, n.next := x, xs;
    if xs == null
    {
      n.Elems := [x];
      n.Repr := {n};
    } else {
      n.Elems := [x] + xs.Elems;
      n.Repr := {n} + xs.Repr;
    }
  }

The precondition–introduced by requires–asks for a valid node, and the postconditions–introduced by ensures–make sure we leave the node in a valid state and that we’ve actually prepended x to xs. The actual logic is the first two lines. The if keeps the proof state valid.

We’re now have all the ingredients to write and prove the intersection algorithm. The algorithm is similar to the one last time, but I’ve had to make a few adjustments to make Dafny happy.

Let’s start by looking at the pre- and post conditions.

method Intersects<T(0)>(a: Node?<T>, b: Node?<T>) returns (r: Node?<T>)
  requires a == null || a.Valid()
  requires b == null || b.Valid()
  ensures a == null || b == null ==> r == null
  ensures r == null || (r in ListRepr(a) && r in ListRepr(b))
{
  // "Synchronise" lists heads

  // Find the intersection point
}

The preconditions are fairly light, we want the lists to be valid. As mentioned above, this is key allowing Dafny to derive termination proofs for the inner loops.

By the way, the ? after Node in the type signatures means it’s a “nullable” type. That means we have to deal with null all over the place, but it models the original algorithm better.

The postconditions say if either of the inputs is null so will be the reult–fair enough, if any or both lists are empty they have no intersection point–and that if the result is not null–we found an intersection point–it’s present in both lists.

Now these may sound like reasonable postconditions, but the last one is not as strong as it cold be. It would be better if we made sure the intersection point is the first common node between the two instead of a common node. It looks like it would take a bit more work to convince Dafny to verify that postcondition so I decided to publish the proof as it is. I might revisit this at some point in the future.

OK, it’s time to have a look at the first part of the algorithm where we “synchronise” the list heads so we can then walk them in lockstep.

  // "Synchronise" lists heads

  var m: nat := Length(a);
  var n: nat := Length(b);
  var pa, pb := a, b;
  
  if m > n
  {
    while m > n
      invariant pa != null ==> pa.Valid()
      invariant m == ElemCount(pa) && n == ElemCount(pb)
      invariant m >= n
      decreases ListRepr(pa)
    {
      pa := pa.next;
      m := m - 1;
    }

    assert m <= n;
    assert pa != null ==> pa in ListRepr(a);
    assert pb != null ==> pb in ListRepr(b);
  } else if n > m {
    while n > m
      invariant pb != null ==> pb.Valid()
      invariant m == ElemCount(pa) && n == ElemCount(pb)
      invariant n >= m
      decreases ListRepr(pb)
    {
      pb := pb.next;
      n := n - 1;
    }

    assert n <= m;
    assert pa != null ==> pa in ListRepr(a);
    assert pb != null ==> pb in ListRepr(b);
  }

We keep the lengths of the lists in m and n respectively, and use pa and pb to iterate over them. I won’t show the definition of Length here, but it’s what you’d expect.

I’ve had to add an explicit if around the two loops to convince Dafny that the loop invariants hold on entry in the loops. To be honest I think that’s reasonable, and it makes the algorithm easier to understand at the cost of an extra check.

The loop invariants make sure m and n stay in sync with the lengths of the lists as pa and pb. The important point here is that call to Valid together with the decreases ListRepr(...) clause allow Dafny to prove that the list makes progress at each step, and eventually terminates. I’m don’t know about you but I find this pretty cool.

The assertions are not strictly necessary, but I like to have sanity checks thrown in.

On to the main loop. Before we enter the loop we have m ≤ n ∧ n ≤ m . Therefore m = n . They’re also both in sync with length of the lists starting in pa and pb.

  // Find the intersection point

  assert m == ElemCount(pa) && n == ElemCount(pb) && m == n;
  assert pa != null ==> pa in ListRepr(a);
  assert pb != null ==> pb in ListRepr(b);

  while pa != pb
    invariant pa != null ==> pa.Valid()
    invariant pb != null ==> pb.Valid()
    invariant m == ElemCount(pa) && n == ElemCount(pb) && m == n
    decreases ListRepr(pa), ListRepr(pb)
  {
    pa := pa.next;
    pb := pb.next;
    m := m - 1;
    n := n - 1;
  }

  assert pa == pb;

  r := pa;

The loop invariants make sure that m and n stay in sync with the lists as well as with each other. This is why we don’t have to explicitly check pa and pb for null in the loop test.

And with that we’re done. I’m still not 100% happy with the postcondition on the algorithm. I think I would come up with a stronger postcondition if I kept track of the pointers as sequences instead of sets, like in the case of Elems, and check that there’s no earlier node that the sequences have in common before the result. Anyway, this post is long enough as it is.

I’ve been meaning to play with theorem proving for a while, and this was a nice excuse to do so. It was a mostly enjoyable exercise, and Dafny seems like a nice language to start exploring the field.

In logic expressions the operators have the same meaning as in C. |...| stands for the length of a collection of things, and ==> for logical implication.↩︎
Actually each node has its own set that is a subset of the one in the node pointing to it. Kind of like Russian nesting dolls.↩︎