Implementing DOES> in Forth, the entire reason I started this mess
boston.conman.org
112 points by todsacerdoti 21 hours ago
CREATE makes a dictionary entry for a word named by the string you supply in the input stream, whose execution semantics are to push onto the stack the address of the dictionary space following the entry that was just created. At the moment CREATE finishes, that address is the value of HERE. Execution semantics for a word means some code invoked when you execute the word. That code in turn is pointed to by an address living in a cell that is part of the dictionary entry.
DOES> overwrites that address so that executing the word, instead of doing the default thing, now runs some different code, namely the code that you supply after the DOES>.
This is something of a kludge because the usual implementation stores something (the default semantics) in that cell when you run CREATE, then later overwrites it when you run DOES>. Since lots of Forth targets today are microcontrollers whose code storage is in flash memory, overwriting individual already-written cells in code space is not nice.
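Here is a rough sketch of that mechanism, modelled in JavaScript rather than in real Forth. The names `Dictionary`, `create`, `does`, and `codeField` are made up for illustration, and a real Forth keeps a code address in a cell instead of a function object, so take it as a picture of the idea and not of any particular implementation.

```javascript
// Toy model of a Forth dictionary: each entry has a code field (what runs
// when the word executes) and a data field address (where its data lives).
// All names here are illustrative assumptions, not a real Forth's internals.
class Dictionary {
  constructor() {
    this.memory = [];      // data space; `here` is the next free index
    this.entries = {};     // name -> entry
    this.latest = null;    // most recently created entry
    this.stack = [];       // data stack
  }
  get here() { return this.memory.length; }
  comma(value) { this.memory.push(value); }   // Forth's `,`

  // CREATE: new entry whose default behaviour pushes its data address.
  create(name) {
    const entry = { name, dataAddr: this.here, codeField: null };
    entry.codeField = () => this.stack.push(entry.dataAddr);
    this.entries[name] = entry;
    this.latest = entry;
  }

  // DOES>: overwrite the code field of the latest entry. The new behaviour
  // still pushes the data address first, then runs the supplied code.
  does(body) {
    const entry = this.latest;
    entry.codeField = () => { this.stack.push(entry.dataAddr); body(this); };
  }

  execute(name) { this.entries[name].codeField(); }
}

const d = new Dictionary();
d.create("FOO");        // CREATE FOO
d.comma(42);            // 42 ,
d.execute("FOO");       // default semantics: push the data address
const addrOnly = d.stack.pop();

d.does(f => f.stack.push(f.memory[f.stack.pop()]));  // DOES> @ ;
d.execute("FOO");       // now fetches the stored value instead
const value = d.stack.pop();
console.log(addrOnly, value);   // prints: 0 42
```

The second write to `codeField` inside `does` is the overwrite that causes trouble when the cell lives in flash.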
Early Forths had <BUILDS ... DOES> instead of CREATE ... DOES> . You can see how the angle brackets originally looked symmetrical but after things changed, the bracket only appeared on DOES> and that may be part of why people find it confusing.
<BUILDS didn't install any default action into the newly created word. It left it uninitialized so it would get filled in when DOES> came along. CREATE DOES> was sort of an optimization since CREATE already existed too, making <BUILDS unnecessary. So they got rid of <BUILDS during standardization back in the minicomputer era where this stuff always generated code in RAM (or maybe magnetic core) rather than flash. That optimization in turn has bitten some implementers in the butt. So <BUILDS DOES> has come back into usage among some MCU implementations like FlashForth. FlashForth is pretty nice by the way.
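To make the flash problem concrete, here is a toy model in JavaScript. It assumes a cell can be programmed exactly once after an erase, which is a simplification of how real flash behaves, and `FlashCell`, `createStyle`, `buildsStyle`, and `does` are invented names, not FlashForth's actual internals.

```javascript
// Toy model of write-once flash: a cell starts erased and can be programmed
// exactly once. This is a simplifying assumption; real flash has different
// rules (bits can typically only be cleared until the next block erase).
const ERASED = Symbol("erased");

class FlashCell {
  constructor() { this.value = ERASED; }
  program(v) {
    if (this.value !== ERASED) throw new Error("cell already programmed");
    this.value = v;
  }
}

// CREATE-style: the code field is programmed with the default action at once.
function createStyle() {
  const codeField = new FlashCell();
  codeField.program("push-data-address");
  return codeField;
}

// <BUILDS-style: the code field is left erased for DOES> to fill in later.
function buildsStyle() {
  return new FlashCell();
}

// DOES>: program the code field with the new behaviour.
function does(codeField) {
  codeField.program("run-does-body");
}

function tryDoes(codeField) {
  try { does(codeField); return "ok"; }
  catch (e) { return e.message; }
}

const afterCreate = tryDoes(createStyle());
const afterBuilds = tryDoes(buildsStyle());
console.log(afterCreate);   // prints: cell already programmed
console.log(afterBuilds);   // prints: ok
```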
Well I didn't mean to type that much, but I hope it helps.
It would seem like less of a kludge if DOES> precipitated into a deferred context terminated by its own semicolon. So you could just type
CREATE FOO 42 , DOES> @ ;
into an interpreter to create the constant. Then if placed inside a definition, there would be two semicolons:
: CONSTANT CREATE , DOES> @ ; ;
It's an extra nesting which makes it clear you have a definition that makes a definition. You could even put words between the semicolons which just become part of the definition of CONSTANT.
It feels as if this DOES> thing is a kludge that activates within definitions, and kind of "hijacks" the rest of their instructions. Without DOES>, the material after it would be part of the definition of CONSTANT and not part of the definition of the word produced by CREATE. The switcheroo feels hacky.
DOES> is separate from the definition. The compiler just remembers what the most recently defined word was, and DOES> modifies a cell in that word's dictionary entry.
Similar (but NOT identical) concept in RetroForth I really enjoyed learning about years ago: https://rickcarlino.com/2021/til-how-retroforth-implements-d...
It’s nice to see Forth internal deep dives hitting the front page, great article.
Amazing article!
I hate DOES>. I was implementing it well after 1am last night and I hate it. I have this feeling that when something gets harder to implement it means it's not right, but I know DOES> is right, so it's me; I just couldn't implement it well. It was super frustrating. But now I feel better :)
I am new to Forth but it feels like `create does>` has to be replaced with some new construct, I just want word code to operate on its data, but I need to gain more experience to find out, for now `create does>` will do.
You can replace it; there are two (nearly 3) Forths that replace it:
https://github.com/dan4thewin/FreeForth2/blob/master/ff.asm
which uses a double loop to lookup first macro words, and then immediate words
https://github.com/ablevm/able-forth/blob/current/forth.scr
Ableforth implements a defer/expand operation \ to effectively quote words. The basic loop is then simply parse a text word, look it up and execute it.
Both make use of macros (code generators) to implement deferred behaviour, as well as code inlining. Ultimately all these operations implement defer by manipulating the execution flow, something that algebraic effects also do.
I have a feeling that algebraic effects can be used in a Forth to implement DOES>.
Also Chuck Moore's ColorForth doesn't have "create does>" at all. Probably he considered the complexity was not worth it. I don't have it in my interpreter either, and I don't miss it; for the few near-misses I just use regular literals and calls. For instance, for the textbook example of "self-indexing arrays":
: index-array swap cells + ;
create a1 10 cells allot
create a2 20 cells allot
: array1 a1 index-array ;
: array2 a2 index-array ;
I used to use the "implementation-dependent" trick of popping the return address (in e.g. index-array) to get the data. Less verbose, a bit more efficient. But my implementation doesn't permit it anymore.
Recently I've found that implementing a "self/this" pseudo-value and pseudo-method calls is much more useful. The relation between this and "create does>" is that the latter can be seen as a poor man's closure, or a poor man's object [1].
[1] https://stackoverflow.com/questions/2497801/closures-are-poo...
In Dusk OS, I implement does> but with one level of indirection removed. Rather than using create, you bind any number to a behavior. Example:
: foo 1+ does> . ;
42 foo bar
bar \ prints 43
If you want, you can add the "traditional" indirection in the initialization part, for a similar effect.
So, not quite the same, but almost, and I think it echoes the intuition you have, which is also mine.
Long, long ago I wrote a Forth for OS/2 in assembler (mostly out of spite, because I was told you couldn't write OS/2 programs in assembler, you had to use C++)
I still don't know what DOES> really does... ;-)
It’s the delimiter between the code that will run when the word is run and the code that gets compiled to be run later.
Typical usage is for the “code that will run immediately” to store some data, and for the “code that gets compiled to be run later” to use that data.
Perhaps the simplest example is CONSTANT, which can be defined like this:
: CONSTANT ( w "name" -- )
CREATE ,
DOES> ( -- w )
@ ;
Here, the “code that will run immediately” is CREATE ,
which reads a name from the input stream and creates a word with that name, and then takes the top of the stack and stores that directly after the word’s definition.
The “code that gets compiled to be run later” is
@
which fetches the formerly stored value (taking the address of the formerly created word from the stack).
DOES> has to do some shenanigans to make that work, but that’s an implementation detail, and will be dependent on the particular FORTH being used.
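One way to picture those shenanigans, as a sketch in JavaScript and not any particular Forth: treat the body of the defining word as a list of steps. When the defining word runs and reaches DOES>, it stops there and hands the remaining steps to the word CREATE just made. Every name below (`makeForth`, `runDefining`, `runCreated`, the `DOES` marker) is invented for illustration.

```javascript
// A defining word is modelled as an array of steps. Running it executes the
// steps in order; the DOES marker stops execution there and attaches the
// remaining steps to the most recently CREATEd word.
// Everything here (names, step encoding) is an illustrative assumption.
const DOES = Symbol("DOES>");

function makeForth() {
  const f = { stack: [], memory: [], words: {}, latest: null, input: [] };

  f.create = () => {
    const name = f.input.shift();             // parse the next name
    const word = { dataAddr: f.memory.length, body: [] };
    f.words[name] = word;
    f.latest = word;
  };
  f.comma = () => f.memory.push(f.stack.pop());          // ,
  f.fetch = () => f.stack.push(f.memory[f.stack.pop()]); // @

  // Run a defining word such as CONSTANT.
  f.runDefining = steps => {
    for (let i = 0; i < steps.length; i++) {
      if (steps[i] === DOES) {
        f.latest.body = steps.slice(i + 1);   // the rest runs later
        return;                               // and the defining word exits
      }
      steps[i]();
    }
  };

  // Run a word made by CREATE: push its data address, then run its body.
  f.runCreated = name => {
    const word = f.words[name];
    f.stack.push(word.dataAddr);
    word.body.forEach(step => step());
  };
  return f;
}

const f = makeForth();
// : CONSTANT CREATE , DOES> @ ;
const CONSTANT = [f.create, f.comma, DOES, f.fetch];

f.stack.push(42);
f.input.push("ANSWER");
f.runDefining(CONSTANT);     // 42 CONSTANT ANSWER
f.runCreated("ANSWER");      // ANSWER
const answer = f.stack.pop();
console.log(answer);         // prints: 42
```

Note that nothing special happens when CONSTANT itself is compiled; the split only takes effect each time CONSTANT is executed.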
OK, so if we have
: NAME alpha beta ... psi omega ;
what happens is that at compile time, a dictionary entry NAME is created, and then the alpha ... omega words are compiled to be run later.
When DOES> is introduced:
: NAME alpha beta ... DOES> ... psi omega ;
all of the above still holds. We still have a dictionary entry NAME, which denotes all of the words up to the semicolon, including DOES>.
Then, when we execute NAME in a compilation context, because the word sequence contains DOES>, everything to the left of DOES> is specially treated: it is executed immediately in the compilation context and is removed. But that's not all; DOES> doesn't just execute everything to the left and disappear; it leaves something behind: some word which is then combined with the material to the right of DOES> to form the run-time sequence.
In your example, when we run CONSTANT, the part to the left of DOES> fetches a name from the input stream, and creates a word, and then makes the value on the stack the definition.
the accumulation of to-be-run later words is interrupted, and everything before DOES> is done now, at definition time, and removed from the definition.
The CREATE material, when executed, leaves behind a reference to the word denoting the constant. Then DOES> creates a definition for that word, using the remaining material.
Is that more or less it?
OS/2: seems an odd thing to say. I used to write in C and MASM, and there were no particular barriers.
Jonesforth doesn't implement it because it's complicated and not really necessary to understand the basics. Also it's, erm, an exercise for the reader, and this reader solved it very well ;-)
I tried to do it in a 16-bit JonesForth-based implementation, and it required renaming JonesForth's CREATE to CREATEHEAD, implementing a primitive DODOES, and then defining these two words:
: CREATE WORD CREATEHEAD DODOES , 0 , ;
: DOES> IMMEDIATE ['] LIT , HERE @ 6 CELLS + , ['] LATEST , ['] @ , ['] >DFA , ['] ! , ['] EXIT , ;
i've never looked closely at any of this, and it's been a long time since i looked at all.
reading colorforth code and especially commentary (https://github.com/Howerd/colorForth) it seemed that it refines the concept of staging into colours (does> might correspond to cyan?).
hopefully someone more knowledgeable will chime in here!
Okay, so what's the significance of it and what's the boon?
Surprised so few public Forths implement it.
Decades ago, a closure in Forth was especially innovative with DOES>:
: COUNTER
CREATE ,
DOES> DUP 1 SWAP +! @ ;
0 COUNTER PK
PK . \ => 1
PK . \ => 2
A semi-equivalent in JavaScript is:
const counter = init => {
let x = init;
return () => { x += 1; return x; };
};
const pk = counter(0);
console.log(pk()); // => 1
console.log(pk()); // => 2
All the most common Forths I know implement CREATE DOES>.
What’s funny is that I used to know how it works; now any time I come across these kinds of articles I get more and more confused and further away from understanding. It’s like reading those convoluted explanations of what a monad is.
It implements defining words, i.e. additional compiler words.
It does this by doing something now, and later (you could read create does> as now later>)
So for
: CONSTANT CREATE , DOES> @ ;
This makes the defining word CONSTANT, which when run (now) compiles the next word.
So CONSTANT myvar will compile myvar. Myvar, when run (later), will get its value and push it to the data stack.
It only briefly goes into what it does, this article goes into how it's done for a particular implementation.