Okay, this post has been a long time coming. Initially I meant it to cover a generalisation of the the algorithm we developed last time. The plan was to do a hand wavy proof of correctness, and take hints from the proof process to come up with a more generic version of the algorithm.
That’s still the plan, but I wasn’t happy with the hand wavy part. So I decided to do a proper proof. But if something’s worth doing it’s definitely worth overdoing. Right? And that’s how I ended up writing a verified version of the algorithm in Dafny.
On to the proofy bit!
A short aside on mutability
The algorithm works on singly linked lists implemented using C++ pointers. Working with mutable, dynamically allocated data structures complicates the proof significantly. Since I don’t want to go down the Separation Logic rabbit hole we’ll make the following simplification: we treat the linked lists as finite sequences.
It turns out Dafny uses a similar approach with an extra twist to ensure the lists aren’t circular.
The linked list data structure
The Node class is what you’d expect–it has a value and a
next pointer–with added machinery to aid proofs and a convenience method
for constructing lists which I’ve omitted. I’ll get to them shortly.
class Node<T(0)>
{
var value: T;
var next: Node?<T>;
// Proof aides
static method Cons(x: T, xs: Node?<T>) returns (n: Node<T>)
// Pre- and postcondition
{
// Implementation
}
}
If you’re wondering about the (0) bit after the type
parameter, that tells Dafny that whatever type we supply it must be
“default constructible” in C++ parlance.
We’ll get to the implementation of Cons in a bit, but
first let’s look at how Dafny deals with mutable linked lists. Since
linked lists are a recursive data structures our algorithms will have to
use either loops or recursion. In either case Dafny will have trouble
proving that the process terminates because we might be dealing with a
circular list. The basic approach to solving this difficulty is to keep
track of what nodes go where, and prove that a node can’t appear in the
list starting at its next pointer. The idiom in Dafny seems
to be to define a predicate called Valid
together with some ghost fields to track the proof state for
the linked list. Unlike methods and regular fields, predicates and ghost
fields are only used for verification1.
// Proof aides
ghost var Elems: seq<T>;
ghost var Repr: set<Node<T>>;
predicate Valid()
reads this, Repr
{
this in Repr &&
|Elems| > 0 && Elems[0] == value &&
(next == null ==> |Elems| == 1) &&
(next != null ==>
next in Repr && next.Repr <= Repr && this !in next.Repr &&
next.Valid() && next.Elems == Elems[1..])
}
Elems tracks the elements of the list. Repr
is more interesting, and it’s the key ingredient for allowing us to work
with linked lists. It’s a set tracking the nodes in the representation
of the list2. Valid uses it to check
that a node doesn’t appear in the list that starts from it, that is,
that it’s not circular. It turns out this is enough to allow Dafny to
work out termination for loops involving linked lists made of
Nodes.
Now we can finally look at Cons which is pretty much
what you’d expect plus some proof state bookkeeping.
static method Cons(x: T, xs: Node?<T>) returns (n: Node<T>)
requires xs == null || xs.Valid()
ensures n.Valid()
ensures if xs == null then n.Elems == [x] else n.Elems == [x] + xs.Elems
{
n := new Node;
n.value, n.next := x, xs;
if xs == null
{
n.Elems := [x];
n.Repr := {n};
} else {
n.Elems := [x] + xs.Elems;
n.Repr := {n} + xs.Repr;
}
}
The precondition–introduced by requires–asks for a valid
node, and the postconditions–introduced by ensures–make
sure we leave the node in a valid state and that we’ve actually
prepended x to xs. The actual logic is the
first two lines. The if keeps the proof state valid.
We’re now have all the ingredients to write and prove the intersection algorithm. The algorithm is similar to the one last time, but I’ve had to make a few adjustments to make Dafny happy.
Let’s start by looking at the pre- and post conditions.
method Intersects<T(0)>(a: Node?<T>, b: Node?<T>) returns (r: Node?<T>)
requires a == null || a.Valid()
requires b == null || b.Valid()
ensures a == null || b == null ==> r == null
ensures r == null || (r in ListRepr(a) && r in ListRepr(b))
{
// "Synchronise" lists heads
// Find the intersection point
}
The preconditions are fairly light, we want the lists to be valid. As mentioned above, this is key allowing Dafny to derive termination proofs for the inner loops.
By the way, the ? after Node in the type
signatures means it’s a “nullable” type. That means we have to deal with
null all over the place, but it models the original
algorithm better.
The postconditions say if either of the inputs is null
so will be the reult–fair enough, if any or both lists are empty they
have no intersection point–and that if the result is not
null–we found an intersection point–it’s present in both
lists.
Now these may sound like reasonable postconditions, but the last one is not as strong as it cold be. It would be better if we made sure the intersection point is the first common node between the two instead of a common node. It looks like it would take a bit more work to convince Dafny to verify that postcondition so I decided to publish the proof as it is. I might revisit this at some point in the future.
OK, it’s time to have a look at the first part of the algorithm where we “synchronise” the list heads so we can then walk them in lockstep.
// "Synchronise" lists heads
var m: nat := Length(a);
var n: nat := Length(b);
var pa, pb := a, b;
if m > n
{
while m > n
invariant pa != null ==> pa.Valid()
invariant m == ElemCount(pa) && n == ElemCount(pb)
invariant m >= n
decreases ListRepr(pa)
{
pa := pa.next;
m := m - 1;
}
assert m <= n;
assert pa != null ==> pa in ListRepr(a);
assert pb != null ==> pb in ListRepr(b);
} else if n > m {
while n > m
invariant pb != null ==> pb.Valid()
invariant m == ElemCount(pa) && n == ElemCount(pb)
invariant n >= m
decreases ListRepr(pb)
{
pb := pb.next;
n := n - 1;
}
assert n <= m;
assert pa != null ==> pa in ListRepr(a);
assert pb != null ==> pb in ListRepr(b);
}
We keep the lengths of the lists in m and n
respectively, and use pa and pb to iterate
over them. I won’t show the definition of Length here, but
it’s what you’d expect.
I’ve had to add an explicit if around the two loops to
convince Dafny that the loop invariants hold on entry in the loops. To
be honest I think that’s reasonable, and it makes the algorithm easier
to understand at the cost of an extra check.
The loop invariants make sure m and n stay
in sync with the lengths of the lists as pa and
pb. The important point here is that call to
Valid together with the
decreases ListRepr(...) clause allow Dafny to prove that
the list makes progress at each step, and eventually terminates. I’m
don’t know about you but I find this pretty cool.
The assertions are not strictly necessary, but I like to have sanity checks thrown in.
On to the main loop. Before we enter the loop we have m ≤ n ∧ n ≤ m
. Therefore m = n .
They’re also both in sync with length of the lists starting in
pa and pb.
// Find the intersection point
assert m == ElemCount(pa) && n == ElemCount(pb) && m == n;
assert pa != null ==> pa in ListRepr(a);
assert pb != null ==> pb in ListRepr(b);
while pa != pb
invariant pa != null ==> pa.Valid()
invariant pb != null ==> pb.Valid()
invariant m == ElemCount(pa) && n == ElemCount(pb) && m == n
decreases ListRepr(pa), ListRepr(pb)
{
pa := pa.next;
pb := pb.next;
m := m - 1;
n := n - 1;
}
assert pa == pb;
r := pa;
The loop invariants make sure that m and n
stay in sync with the lists as well as with each other. This is why we
don’t have to explicitly check pa and pb for
null in the loop test.
And with that we’re done. I’m still not 100% happy with the
postcondition on the algorithm. I think I would come up with a stronger
postcondition if I kept track of the pointers as sequences instead of
sets, like in the case of Elems, and check that there’s no
earlier node that the sequences have in common before the result.
Anyway, this post is long enough as it is.
I’ve been meaning to play with theorem proving for a while, and this was a nice excuse to do so. It was a mostly enjoyable exercise, and Dafny seems like a nice language to start exploring the field.