An Extensible Iteration Facility

3 min read Original article ↗

Consider a stream that has been partially run length encoded (RLE), such that

  • a sequence of characters ('~', '\0') becomes a single tilde
  • a sequence of characters ('~', N, D) becomes a run of D characters of length ord(N)
  • other characters are passed through without modification

So, "abc~\0def" expands to "abc~def", while "abc~\x04/def" expands to "abc////def".

This problem is perfectly doable with the tools we already have on hand:

(defun rle-decode (src)
  (let ((src (iter src)))
    (infinicat
     (lambda ()
       (block next
	 (let ((c (funcall src)))
	   (unless (eq c #\~)
	     (return-from next (list c)))
	   (let ((n (char-code (funcall src))))
	     (when (zerop n)
	       (return-from next (list #\~)))
	     (let ((d (funcall src)))
	       (mapiter (lambda (i)
			  (declare (ignore i))
			  d)
			(range n)))))))))))
CL-USER> (let ((s (format nil "abc~~~a/def" (code-char 40))))
	   (for (c (rle-decode s)) (format t "~a" c))
	   (format t "~%"))
abc////////////////////////////////////////def
NIL

However, we could implement this in Python like so:

def rle_decode(src):
    src = iter(src)
    for c in src:
	if c != '~':
	    yield c
	elif not (n := ord(next(src))):
	    yield '~'
	else:
	    d = next(src)
	    for i in range(n + 1):
		yield d
>>> ''.join(rle_decode('acd~\x28/def'))
'acd/////////////////////////////////////////def'

Invoking rle_decode produces an iterator backed by a function that generates values. Every time we call next on that iterator, execution of the generating function picks up at the beginning (on the very first call) or just after the most recent yield (for every subsequent call), and it continues until either the next encountered yield (in which case it produces a value for the caller) or the end of the function (implicitly raising StopIteration). The generator may yield values from any number of places in its body, yielding us considerable flexibility in how we write it.

In the case of rle_decode, that flexibility allows us to express the process of RLE decoding much more clearly. Meanwhile, the Lisp version operates by creating a series of sub-sequences, each corresponding to either single characters or expanded runs, and then concatenating them. Where the Python version permitted a very direct style, the Lisp version uses an invented intermediate form. While that intermediate view might be interesting for some application, here it is just a means to the end of expanding a stream of compressed data. It was necessary, though, because the function generating values for the iterator always starts at the beginning on each call; we cannot directly use progress through that function to mark generator state.

Creating the flexibility of expression that generators provide will be a task for another day. Until then, here's a silly version of Fizz Buzz.

(defun fizzbuzz (&optional n)
  (mapiter (lambda (&rest xs)
	     (some #'identity xs))
	   (circular '(nil nil nil nil nil
		       nil nil nil nil nil
		       nil nil nil nil fizzbuzz))
	   (circular '(nil nil fizz))
	   (circular '(nil nil nil nil buzz))
	   (range 1 n)))
CL-USER> (for (n (fizzbuzz 20)) (print n))

1 
2 
FIZZ 
4 
BUZZ 
FIZZ 
7 
8 
FIZZ 
BUZZ 
11 
FIZZ 
13 
14 
FIZZBUZZ 
16 
17 
FIZZ 
19 
NIL