Settings

Theme

The Lone Lisp Heap

matheusmoreira.com

94 points by stevekemp 17 days ago · 32 comments

Reader

PaulHoule 17 days ago

You can learn a lot developing a language and runtime but you will reach a point when you'll realize you can go back and do it all better.

  • matheusmoreira 17 days ago

    No doubt. The second attempt is always better. My initial plan was to write a complete first implementation in C so that it's always possible to bootstrap the language, then write a compiler inside the lisp itself, or write a Rust version. Hope I somehow manage to do it all before I die of old age.

    • exe34 17 days ago

      Aha I've been semi-vibe-coding a scheme for esp32c3 and linux at the same time, and forgoing the libc too, so baremetal c. I went with a slightly awkward approach for allocation, with heap being seen as pages, and within pages, fixed size objects of size 8, 16 or 32 bytes. A pair is two words, and either bitpacked or pointer to another object. I'm not far in, vaguely following the Peter Michaux approach.

    • zabzonk 17 days ago

      > The second attempt is always better.

      Um, see The Second System Effect.

      https://en.wikipedia.org/wiki/Second-system_effect

      • matheusmoreira 17 days ago

        Lone itself is a second system: it's the spiritual successor of liblinux. I suppose the scope did increase... I'll try to be careful.

      • mpyne 17 days ago

        In fairness, with the waterfall methodology that pervaded back then, the "first" system you shipped was actually the second. "Build one to throwaway; you will, anyhow".

        • db48x 17 days ago

          Actual waterfall development was far more iterative than most people realize. If you want a primary source, I recommend Sunbust and Luminary by Don Eyles. It recounts the development of the software for the Apollo lunar lander.

          • mpyne 15 days ago

            Then it wasn't "actual waterfall development". The paper that defined waterfall literally tells you that you will build the system twice. Ideally only twice. You can refer to Dr. Royce's paper as the primary source on that.

            It is heartening to know that iterative development has been commonplace since long before the agile manifesto was written though, to make it clear that it has long been used and long been successful.

            • db48x 9 days ago

              Royce’s paper has a critical flaw. It assumes that a released system can be put into operation once and then run forever. There are cases where this is true, but real life offers more variation than that. In the case of the Luminary, the initial release was for Apollo 10. That was the mission that did the dress rehearsal for landing, sending an unmanned Lunar Module most of the way through the landing sequence before aborting and returning to orbit. But there were seven additional missions after that, Apollo 11 through Apollo 17, and they took the opportunity to update Luminary for each of them based on feedback from the prior missions.

              Don Eyles even produced an updated version _after_ the Apollo 17 version was released, but before Apollo 17 flew, just to test a theory about how to make the software more responsive during the final landing when a human was actually selecting a landing site. Of course that version required changing the requirements, which in true Waterfall style were part of the contract his company had signed with NASA years before!

              And even inside of a single waterfall cycle there is feedback from both automated and manual testing. Luminary was run in a simulator running on a supercomputer that simulated various landing conditions. It had to pass those simulations before it ever had a hope of being used in space. Eyles calls that “unfortunate”, but the reality is that it is simply necessary.

              And then there was the manual testing, which NASA did on grand scale for the Apollo missions. They made an accurate 3D model of the terrain near the landing sites on a table. This table hung upside down in a hangar in Houston. A television camera was mounted on a six–axis mount below the terrain model. The mount was controlled by signals from a Lunar Module simulator used by the astronauts, and the image from the camera was fed to the screens outside the windows of the LM simulator. As the astronauts flew the simulator the camera moved along the same path, providing the astronauts with accurate real–time visuals. The LM simulator had a real AGC and ran the most up–to–date version of Luminary as the astronauts practiced their landings. Several improvements to Luminary resulted from feedback provided by each crew as they practiced.

              Fundamentally Waterfall discourages iteration because it sees it as a waste of time and money. Ultimately this is because back then releasing software was expensive. Luminary had to be sewn, by hand, into a core memory rope before it could be installed in the spacecraft. Iteration could only happen on that scale because NASA had already planned multiple moon landings. Iteration in the simulators was much easier and cheaper though because they could load the software from disk drives. Those drives were the size of a washing machine though, and could never be flown in space.

              The assumption that it is cheaper to fix bugs in the specification before you ever write any software was based on numbers gathered from software development done for the US Air Force. It turns out that if you detect a bug in your missile’s flight software _after_ you have manufactured hundreds of thousands of missiles for the Air Force then yes, fixing that defect is very expensive. You really would rather have found the problem earlier in the development process. But today we can release a new version of our software to hundreds of millions of users around the world with the press of a button. The assumption simply no longer holds so iteration should not be seen as “unfortunate” any more.

              • mpyne 8 days ago

                Yes, and this is much of my lament. The DoD mindset pervaded a lot of places building significant software systems in early computing, as compared to NASA and others.

                I don't think iteration should have been considered unfortunate even back then, but it was. And by the time the DoD started to realize iteration was not only required, but helpful (no later than the mid-1980s with a Defense Science Board report on software acquisition methods initiated by Dr. Fred Brooks), the waterfall methods had more or less infested all of DoD software acquisition.

                Even worse (for DoD), this was occurring even as DoD's organic ability to develop software-based systems was atrophying, so they needed their contracting/acquisition process to lead to successful software delivery more than ever before. Had DoD had a solid internal iterative software development process they could fall back on they'd have been in much less of a pickle.

  • cultofmetatron 17 days ago

    this is my favorite thing about agentic coding. its super easy to build a v1 and get the system to a point where it does what you want which means you learn what you need for the v2 much faster

  • mhb 17 days ago

    Also building furniture.

NetMageSCW 17 days ago

Flashbacks to when I implemented a new storage system for a Lisp/Prolog system while porting the core code from Pascal to C. We stuck with linked list of pages so we could keep pointers and gain some speed over array indexing for every object access.

  • matheusmoreira 17 days ago

    That's cool! Can you post more details? In my benchmarks the contiguous heap dramatically improved cache utilization. How did you obtain higher performance with linked lists?

throwaway58670 16 days ago

Such a delight to read a technical article, and about something outside the hype cycle.

I also liked the color scheme.

Someone 16 days ago

FTA: “Lone is a lisp interpreter written in freestanding C. There is no dynamic memory allocation in freestanding C. There is no such thing as malloc. There is no libc. There is only me and the code. If I wanted malloc, I would have to write it myself.

[…]

mmap is awesome but Linux's got something even more awesome: mremap, which makes it almost trivial to manage the page-based heap.“*

⇒ they are using “freestanding” in a non-standard way. Usually, it means running on bare metal without an OS (https://en.wikipedia.org/wiki/Standalone_program: “A standalone program, also known as a freestanding program, is a computer program that does not load any external module, library function or program and that is designed to boot with the bootstrap procedure of the target processor – it runs on bare metal”

  • matheusmoreira 16 days ago

    Freestanding as in freestanding C. Lone passes the -ffreestanding -nostdlib flags. No external modules, libraries, functions or programs are loaded by the interpreter.

    It's true that lone doesn't run on bare metal. I chose Linux for pragmatic and philosophical reasons. I'm not sure lone would ever have printed hello world if I hadn't chosen to build on top of Linux, which is also the only kernel with a stable binary interface. No other system lets you avoid the libc.

    • Someone 15 days ago

      Of course you’re free to make your own choices, but I find it weird to accept relying on many lines of Linux kernel code and then not wanting to reuse a relatively small number of lines of an existing malloc.

      The reverse (running on bare metal and tweaking an existing malloc to run on it) looks way more logical to me.

      > No other system lets you avoid the libc.

      On most OSes [1] it’s relatively easy to write a libOS that just wraps the system calls. Your only dependency would be on the mapping to syscall numbers.

      [1] OpenBSD is/may be (I don’t know the status of these) an exception. See https://man.openbsd.org/pinsyscalls.2, https://lwn.net/Articles/949078/

  • shakna 16 days ago

    They're using "freestanding" as in "-ffreestanding" [0].

    "C built to compile with freestanding mode", rather than "C built to run bare metal".

    [0] https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#in...

adelmotsjr 16 days ago

As a Brazilian, i'll let my nationalism escape a little bit to say that it is awesome to see such an incredible work done by a fellow brasileiro. This gives me the same feeling of awesomeness that I get when I see José Valim's work on Elixir.

I only wish I could be as diligent and focused to do a thing like this, because I think it would be so much fun. I bought the Crafting Interpreters book a while ago, but still haven't got time to actually go through with it.

Well, maybe one day...

  • matheusmoreira 16 days ago

    Thanks, that's high praise! I highly recommend Crafting Interpreters. Bob Nystrom is among my favorite authors. You should read his blog too.

  • quibono 16 days ago

    If I'm not mistaken there's also the author of Starlette / FastAPI

dullcrisp 17 days ago

I don’t know too much about this, but strictly based on the name I assumed that you should use a heap. Is that not right?

  • matheusmoreira 17 days ago

    Do you mean heap, the data structure?

    https://en.wikipedia.org/wiki/Heap_(data_structure)

    That is indeed nowhere to be found in the article. Heap is also used to refer to dynamically allocated memory, which used to be implemented via a heap segment.

    https://en.wikipedia.org/wiki/C_dynamic_memory_allocation

    https://en.wikipedia.org/wiki/Data_segment#Heap

    • dullcrisp 17 days ago

      Yeah upon looking it up it appears to have nothing to do with the data structure.

      When they were doing a linear scan to find a large enough free block I was waiting for a heap to come into the picture, but apparently it’s not done that way.

      • matheusmoreira 17 days ago

        I think the term "heap" is used in the sense of "pile" which evokes the image of a somewhat disorganized heap of things you can grab in any order. Contrasts with the disciplined LIFO stack.

        Maybe actual heaps were used to implement this at some point. To be perfectly honest, I'm not sure.

        • adrian_b 16 days ago

          The first use in computing of the word "heap", which refers to a data structure used for priority queues or for sorting, i.e. a kind of binary tree, was in an article published in June 1964, about the algorithm "Heapsort" (by John William Joseph Williams).

          The second use in computing of the word "heap", which has no relationship whatsoever with the other meaning, occurred first in the report about the programming language ALGOL 68, which was published in December 1968 by Adriaan van Wijngaarden et al.

          In ALGOL 68, "heap" is used as a keyword, to indicate that a new variable must be allocated in the heap, instead of being allocated in the stack, which is the default.

          So "heap" in the second sense, which is used by you and traditionally in UNIX, was coined to oppose "stack", as not being constrained by a LIFO allocate/free policy.

          The word stack had been popularized by another Dutch, Edsger W. Dijkstra, in May 1960, who explained how to use a stack and a stack pointer for implementing the blocks of ALGOL 60. So both "stack" and "heap", in the senses related to memory management, come from Dutch computer scientists, and they have been used first in connection with the languages ALGOL 60 and, respectively, ALGOL 68.

          Before ALGOL 68, the language IBM PL/I had used since December 1964 the terms "automatic storage" instead of "stack" (the source of the C keyword "auto") and "controlled storage" instead of "heap".

          John McCarthy published the description of the garbage collector used by LISP in April 1960, but he did not use any special word for the heap, because in LISP that was the only kind of memory used for data, so unlike in PL/I or ALGOL 68 there was no need to distinguish it from memory areas that were managed in a different way.

        • dullcrisp 17 days ago

          That’s the answer I found too. With pictures even https://stackoverflow.com/a/662285

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection