In Haskell, lists are ubiquitous. In real-world Haskell, monads are ubiquitous. So it should be an easy task to fill a list with values obtained in a monad, e.g. a list of n random numbers (in the Rand monad), or a list of values read from the user until they choose to stop entering input.
The task
For now, assume a function getInput :: IO Int; we want to run it 10000 times and then have a list of the values returned. This sounds simple and straightforward, but it is actually surprisingly hard to get the run-time behaviour that we naively and optimistically expect.
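To make the snippets below actually runnable, here is one possible stand-in for getInput; the task only fixes its type, so the use of System.Random here is purely an assumption for illustration:

import System.Random (randomRIO)

-- Hypothetical stand-in for getInput; any action of type IO Int would do.
getInput :: IO Int
getInput = randomRIO (0, 1000000)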
The expectation
In an imperative setting, we would have a simple counting loop. In the loop, we run getInput, allocate a new cell for the linked list, and append it to the list. Clearly, this
- consumes no stack space and
- at the end of the loop, the list is already fully evaluated in memory, ready to be used.
Attempt 1: The accumulator
I’ll start with what is probably the worst attempt. Unfortunately, from what I have seen when grading Haskell exams, this is what beginners tend to write, especially if they have the above imperative algorithm in mind and know about the “convert loops to recursion with accumulators” approach to solving Haskell problems:
accumAppend :: Int -> IO [Int]
accumAppend n = go n []
  where
    go 0 l = return l
    go n l = do {x <- getInput; go (n-1) (l++[x])}
To check the space usage of this, I pass -Kn to the GHC runtime system (i.e. run the program with +RTS -Kn -RTS) and look for the smallest limit that is just enough to let this program calculate accumAppend 10000 and fully evaluate the resulting list. It turns out that we need 328336 bytes of stack space.
So what about the second requirement, that the list is fully evaluated? Experienced Haskellers see it immediately: the accumulator is never demanded inside the loop, so each iteration merely wraps the previous value in a thunk that will append the next entry to the end. Only when the list is actually used are these thunks evaluated, each one traversing the whole list built so far, causing quadratic cost. We can verify this using ghc-vis:
Clearly (and not surprisingly) this is not the best solution.
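As an illustration (mine, not part of the original measurement), here is the shape the accumulator takes after three iterations, written out by hand; every ++ has to walk the entire prefix built so far, which is where the quadratic cost comes from:

-- Conceptual shape of the accumulator after three iterations of go,
-- with hypothetical inputs 1, 2 and 3. Each (++) traverses the whole
-- prefix built so far, so a list of length n costs O(n^2) to build.
threeSteps :: [Int]
threeSteps = (([] ++ [1]) ++ [2]) ++ [3]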
Attempt 2: The accumulator (forced)
Let’s try to fix the second issue by making sure that the list is always fully evaluated. We import Control.DeepSeq and write
accumAppendSeq :: Int -> IO [Int]
accumAppendSeq n = go n []
  where
    go 0 l = return l
    go n l = do {x <- getInput; go (n-1) $!! (l++[x])}
As the accumulator gets fully evaluated before being passed to the next invocation of go, the result will not contain thunks. But note that this has also fixed the stack consumption: the code now runs with the smallest possible setting for -K, 8 bytes. This works because both go and deepseq are tail-recursive.
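Why does forcing with deepseq not itself eat into the stack? The NFData instance for lists walks the spine in what is effectively a tail-recursive loop; roughly (a simplified sketch, not the actual code from Control.DeepSeq):

import Control.DeepSeq (NFData, rnf)

-- Simplified sketch of how a list is forced: the spine is walked
-- tail-recursively, forcing each element along the way.
rnfList :: NFData a => [a] -> ()
rnfList []     = ()
rnfList (x:xs) = rnf x `seq` rnfList xs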
But this is still not the solution we are looking for, as the quadratic cost caused by using ++ is still there.
Attempt 3: The accumulator (reversed)
Since we know that l++[x] is expensive but x:l is cheap, we can simply use the latter to update the accumulator. This has the slightly annoying consequence that the entries of the list end up in reverse order, but we can fix that in the base case:
accumReverse :: Int -> IO [Int]
accumReverse n = go n []
  where
    go 0 l = return (reverse l)
    go n l = do {x <- getInput; go (n-1) (x:l)}
And indeed, we get noticeably faster code that still runs in 8 bytes of stack. Unfortunately, we still don’t construct the final list directly.
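The culprit is the final reverse, which is itself an accumulating loop, so the data is traversed a second time after the input loop has finished. Roughly (a simplified sketch, not the actual library code):

-- reverse is, in essence, another accumulating loop, so Attempt 3 pays
-- for a second pass over the data once all the input has been read.
myReverse :: [a] -> [a]
myReverse = go []
  where
    go acc []     = acc
    go acc (x:xs) = go (x:acc) xs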
Attempt 4: Naive recursion
Maybe we can do without an accumulator. The most naive such code would be
listRecurse :: Int -> IO [Int]
listRecurse n = go n
  where
    go 0 = return []
    go n = do {x <- getInput; r <- go (n-1); return (x:r)}
and I’d expect that most experienced Haskellers would write this code (if asked not to use a library function). And indeed, this code does create the final list directly. But it requires a large stack of 164616 bytes. The reason is that this is no longer tail recursive: after the recursive call to go returns, we still need to take the result and prepend x to it.
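The non-tail call is easier to see with the do-notation desugared; the following is just an equivalent rewriting of the recursive case (the name is mine):

-- The same recursion with the do-block written as explicit binds. The
-- continuation \r -> return (x:r) must be kept around for every level of
-- the recursion, which is what consumes the stack.
goDesugared :: Int -> IO [Int]
goDesugared 0 = return []
goDesugared n =
  getInput >>= \x ->
    goDesugared (n-1) >>= \r ->
      return (x:r)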
Attempt 5: Library functions
Maybe we should simply not worry about the implementation and use library functions? Surely the authors of such libraries have found the best solution. So how does
listReplicateM :: Int -> IO [Int]
listReplicateM n = Control.Monad.replicateM n getInput
fare? Unfortunately, no better than the naive recursion. In fact, the stack space requirement is the same: 164616 bytes. The same holds when using specialized libraries for generating and consuming lists, e.g. the Streams module from the vector package:
listStreams :: Int -> IO [Int]
listStreams n = MStream.toList $ MStream.replicateM n getInput
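Neither result is too surprising: replicateM is, in essence, the naive recursion from Attempt 4. A rough sketch of its shape (simplified; the actual definition in Control.Monad differs in the details):

-- Essentially what replicateM does: the recursive call sits under a bind,
-- so it is not tail recursive, just like listRecurse.
myReplicateM :: Monad m => Int -> m a -> m [a]
myReplicateM n act
  | n <= 0    = return []
  | otherwise = do {x <- act; xs <- myReplicateM (n-1) act; return (x:xs)}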
Attempt 6: Difference lists
It is time to dig deeper into our box of tricks. What if we had lists with constant-time snoc (i.e. appending a new element at the end)? Then Attempt 3 would not require reversing the list. Such lists are provided by difference lists; there are libraries for that, but we can simply implement them using lambdas:
accumDList :: Int -> IO [Int]
accumDList n = go n id
  where
    go 0 l = return (l [])
    go n l = do {x <- getInput; go (n-1) (l . (x:))}
Indeed, no stack is required (8 bytes are enough). But we are cheating here: we are still not constructing the final list; rather, we are composing functions that append elements to lists. So when accumDList returns, what we have in memory is a thunk and a sequence of partial function applications in the wrong order, and evaluating the thunk does basically the same thing as the reverse in Attempt 3. It might be a bit faster, but it is still not satisfying:
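As mentioned above, there are also ready-made libraries for difference lists. For comparison, here is the same loop written against the dlist package (assuming its Data.DList module; this variant is mine, not from the original measurements):

import qualified Data.DList as DL

-- Same idea as accumDList, but using the dlist package instead of raw
-- function composition; snoc is constant time, toList materialises the list.
accumDList' :: Int -> IO [Int]
accumDList' n = go n DL.empty
  where
    go 0 l = return (DL.toList l)
    go k l = do {x <- getInput; go (k-1) (DL.snoc l x)}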
Attempt 7: Unsafe functions
Even deeper in the box of tricks, somewhere in the grit and dust where one usually should not reach, there is a function that can help us out here: unsafeInterleaveIO. This allows us to modify the naive recursion in a way that first creates the cons cell and then continues the loop:
listUnsafeInterleave :: Int -> IO [Int]
listUnsafeInterleave n = do {l <- go n; return $!! l}
  where
    go 0 = return []
    go n = do x <- getInput
              r <- unsafeInterleaveIO (go (n-1))
              return (x:r)
The stack consumption is 8 bytes. It is worth noting that the call to deepseq (in $!!) is what actually drives the evaluation of go. It also guarantees that the unsafe effects of unsafeInterleaveIO are confined to the execution of listUnsafeInterleave, i.e. they are actually safe.
Run time statistics
I ran criterion over the functions (generating lists of length 1000), and found these results:
- accumAppend: 6.5 ms
- accumAppendSeq: 8.2 ms
- accumReverse: 14.0 µs
- listRecurse: 8.0 µs (same as listReplicateM and listStreams)
- accumDList: 12.0 µs
- listUnsafeInterleave: 174.0 µs
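For reference, a criterion harness of roughly this shape could produce such numbers (this sketch is mine, not the original benchmark code; nfIO forces the resulting list so that the full construction is measured):

import Criterion.Main

main :: IO ()
main = defaultMain
  [ bench "accumAppend"          $ nfIO (accumAppend 1000)
  , bench "accumAppendSeq"       $ nfIO (accumAppendSeq 1000)
  , bench "accumReverse"         $ nfIO (accumReverse 1000)
  , bench "listRecurse"          $ nfIO (listRecurse 1000)
  , bench "accumDList"           $ nfIO (accumDList 1000)
  , bench "listUnsafeInterleave" $ nfIO (listUnsafeInterleave 1000)
  ]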
Conclusion
There are many ways to write a list-constructing loop in Haskell, and none of them is perfect: they either fill the stack, do not construct the list directly (usually requiring a second pass through it), or rely on IO-specific unsafe functions. The naive recursion performs best, but may cause a stack overflow (note, though, that from GHC 7.8 on, the stack will be unlimited by default). The next best option seems to be the difference lists.
I’m still wondering: What would be required from Haskell, GHC or the monads in question to have a fully satisfactory solution here, i.e. one that is as fast as the naive recursion, as stack efficient as the difference list solution, and allocates no thunks, only list cells?