Igneous Linearizer: semi-structured source code

75 points by seagreen 2 years ago · 41 comments

This is an interesting direction.

One thought is that obsidian can execute web assembly and a parser / sema checker written in something that turns into wasm can therefore be run on the source files. Can probably tie that to a syntax highlighter style thing for in-ide feedback.

The other is that markdown is a tempting format for literate programming. I do have some notes in obsidian that are fed to cmark to produce html. With some conventions, splitting a literate program into executable code embedded in a html document is probably doable as an XML pipeline.

In a much simpler vein, I'm experimenting with machine configuration from within obsidian. The local DNS server sets itself up using a markdown file so editing an IP or adding a new machine can be done by changing that markdown.

I hope the author continues down this path and writes more about the experience.

seagreenOP 2 years ago

> I hope the author continues down this path and writes more about the experience.
I appreciate it=) I definitely want to write some more stuff up, in particular how code organization changes when you can tag and add attributes to definitions.
nbbaier 2 years ago

> In a much simpler vein, I'm experimenting with machine configuration from within obsidian. The local DNS server sets itself up using a markdown file so editing an IP or adding a new machine can be done by changing that markdown.
This sounds really interesting, any code to look at anywhere or write up about it?
- JonChesterfield 2 years ago
  Simple enough to write inline.
  There's a file called DNS.md which contains lines like `192.168.1.15 milan` in an obsidian vault. Obsidian sync copies it around. DNS is by pihole which uses a plain text file in that sort of format for the entries.
  Then superuser's crontab -l
  0 * * * * cmp /home/jon/Documents/Obsidian/SystemControl/DNS.md /etc/pihole/custom.list >/dev/null 2>&1 || cat /home/jon/Documents/Obsidian/SystemControl/DNS.md > /etc/pihole/custom.list && sudo -u jon pihole restartdns
  Cron has rules about relative paths that I don't remember so it's literally written as above.
  It seems likely that the idea generalises. I'm considering managing public keys for ssh / wireguard in similar fashion but haven't done so yet.
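A minimal Python sketch of the same copy-if-changed idea, for anyone who would rather not rely on shell operator grouping. The function name `sync_if_changed` and the paths you pass in are illustrative assumptions, not part of the setup described above; the restart step (`pihole restartdns` in the comment) is left to the caller.

```python
import shutil
from pathlib import Path


def sync_if_changed(source: Path, target: Path) -> bool:
    """Copy source over target only when their contents differ.

    Returns True when a copy happened, so the caller knows a reload is needed.
    """
    # The files are tiny host lists, so comparing whole contents is fine.
    if target.exists() and source.read_bytes() == target.read_bytes():
        return False  # identical: nothing to copy, nothing to restart
    shutil.copyfile(source, target)
    return True
```

One thing worth noting about the crontab line: in POSIX shells `a || b && c` groups as `(a || b) && c`, so the restart appears to run every hour even when the files already match. Returning a flag like this makes it easy to restart only after a real change.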
nohat 2 years ago

Yeah, I definitely see using this for literate programming. Not quite sure the best way to organize it. Maybe use a static site compiler to auto host documentation version.
- seagreenOP 2 years ago
  
  I wish you could just use Obsidian Publish to host sites, but due to the indentation issue you have to control the rendering, which is a bummer.
  Obsidian Digital Garden[1] is FOSS, so it might be modifiable to parse and output the code pages correctly.
  [1] https://github.com/oleeskild/obsidian-digital-garden
  - WorldMaker 2 years ago
    
    The typical Markdown answer to needing indentation preserved is the "code fence" (triple backquotes ```), though I imagine the problem with that is that Obsidian by default stops dealing with Wikilinks inside fenced code. I don't know Obsidian that well, but maybe there's a way to use a code fence and have it support Wikilinks inside?
    A different direction to explore might be to explore proportional font coding techniques that rely less on whitespace. Lisp can be a good language to play with those ideas given whitespace isn't syntactic. Though idiomatic Lisp has certainly relied on semantic whitespace in coding styles for a very long time.
    
    seagreenOP 2 years ago
    
    > I imagine the problem with that is that Obsidian by default stops dealing with Wikilinks inside fenced code
    Exactly. Interestingly enough autocomplete is still triggered by [[ inside of a code block which is kind of funny. So writing code blocks works fine, it's just that they won't display with links.
    > A different direction to explore might be to explore proportional font coding techniques that rely less on whitespace.
    I'm definitely open to proportional font coding techniques being interesting, but in this case with all leading indentation unusable I doubt they'd be enough to get a normal experience. Unless you only write assembly so you can stick to the left margin <taps forehead>.
- JonChesterfield 2 years ago
  
  markdown -> xml -> hack around -> html is essentially a static site generator. https://github.com/commonmark/cmark works really well.
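To make the "hack around" step a bit more concrete, here is a rough Python sketch of pulling the code out of a literate note once it is in XML form. The sample document is hand-written to resemble CommonMark's XML output (an assumption: in a real pipeline it would come from running cmark with XML output on a note), and `tangle` is a made-up name for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the CommonMark XML format.
CMARK_NS = "{http://commonmark.org/xml/1.0}"

# Hand-written stand-in for what cmark would emit for a small literate note.
SAMPLE_XML = """<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph><text>Greet the reader.</text></paragraph>
  <code_block info="python" xml:space="preserve">print("hello")
</code_block>
  <paragraph><text>Then say goodbye.</text></paragraph>
  <code_block info="python" xml:space="preserve">print("bye")
</code_block>
</document>
"""


def tangle(xml_text: str, language: str) -> str:
    """Concatenate, in document order, every code block tagged with `language`."""
    root = ET.fromstring(xml_text)
    chunks = [
        block.text or ""
        for block in root.iter(CMARK_NS + "code_block")
        # The info string is whatever followed the opening fence; take its first word.
        if block.get("info", "").split()[:1] == [language]
    ]
    return "".join(chunks)
```

Calling `tangle(SAMPLE_XML, "python")` returns the two `print` lines as one program, while the prose paragraphs stay behind for the HTML side of the pipeline.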

arnsholt 2 years ago

This is neat, but it does seem like a lot of work to get part of the way to what a Smalltalk already gives you.

seagreenOP 2 years ago

I love Smalltalk, and have done a reasonable amount of messing around with Cuis (which is awesome and everyone should try it).
However this gives you two things that Smalltalk doesn't:
1. It's language agnostic (boring I know)
2. It promotes keeping your code and written texts in the same system where they're both first class. That way they can link between each other, transclude each other, be published together, be organized the same way, etc. I really think this is the most interesting thing about the project, it really feels important to me.
Caveat: right now my written documents can link to/transclude code, but it doesn't work the other way yet. This is because the linearizer will see a link from code to documents as another definition and try to jam it in the source file. This would be an interesting use case for typed links, but Obsidian doesn't have them AFAIK. Kind of cool since I haven't seen many other use cases for typed links in the wild.
EDIT: It occurs to me that I've never used a Smalltalk notetaking or word processing program. Are there any that are integrated with the System Browser, so that they can link to (or even better embed) code? If anyone has more info please let me know!
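For readers who want to see the caveat in miniature, here is a toy Python sketch. It is not the actual Igneous Linearizer; the `kinds` mapping is a hypothetical stand-in for typed links (it could just as well be read from frontmatter), and the dependencies-first ordering is an assumption made for the example.

```python
import re

# Matches [[target]] and [[target|alias]], capturing only the target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def linearize(notes: dict, root: str, kinds: dict) -> str:
    """Emit `root` and the code notes it links to, dependencies first.

    notes maps note name -> body; kinds maps note name -> "code" or "doc".
    Links to "doc" notes are flattened to plain text but never followed.
    """
    emitted = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        body = notes[name]
        for target in LINK.findall(body):
            if kinds.get(target) == "code":
                visit(target)  # only code definitions get pulled into the output
        emitted.append(LINK.sub(lambda m: m.group(1), body))

    visit(root)
    return "\n".join(emitted)


notes = {
    "main": "def main():\n    return [[helper]]()  # see [[Design notes]]",
    "helper": "def helper():\n    return 42",
    "Design notes": "Prose about the design.",
}
kinds = {"main": "code", "helper": "code", "Design notes": "doc"}
source_file = linearize(notes, "main", kinds)
```

With the type check in place, `helper` is emitted ahead of `main` and the prose note stays out of the source file; without it, the prose would be treated as one more definition.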
- couchbed 2 years ago
  
  Lepiter is a Pharo-based notetaking app within the Glamorous Toolkit. I'm not sure it's mature enough to compete with Obsidian/etc., but it does allow linked and embedded code like you were thinking.
  https://lepiter.io/feenk/introducing-lepiter--knowledge-mana...
  - seagreenOP 2 years ago
    
    Of course! I should have just guessed they'd already have something like this.
    We either need to port ALL of Glamorous Toolkit to mainstream langs or we need to convince all our employers to switch to Smalltalk. I am not certain which of those is possible or easier.
    
    skeledrew 2 years ago
    
    I'd say the porting is better. I've gone a ways into using GToolkit, and it has some very nice things. I especially like the infinite nesting of editors, which is like transclusion on steroids, and the driller. And it's already language agnostic and very extendable.
    But when it comes to actually extending it, that's where the sharp edges start to cut. I was able to do a fairly quick, enhanced Python support extension, but one of the issues is getting proper class equivalence, and the back-and-forth passage of data is a major performance suck. Also things will grind to a halt if you push and try to process too much data at once on the Smalltalk side. Maybe having a fairly beefy machine will help with that though.
ralferoo 2 years ago

I remember reading an article on Source Code In Database back in the early 2000s, and it's been knocking around my brain ever since as something I ponder every couple of years. I just can't shake the feeling that there's the gem of a future paradigm where everyone wonders "why we didn't always do it that way?", but then every time I try to follow those thoughts through to a conclusion, it always feels like it'd just be re-implementing Smalltalk, and then the question is "why isn't Smalltalk more popular?"
That said, there's a lot to be said for revisiting old ideas. There was so much interesting research done in the 60s and 70s in all sorts of random directions, maybe because at that time there were no precedents or expectations for how things should be done. There are so many untapped resources here, it's crazy. Every now and then I re-watch "The Mother of All Demos" [1] from 1968 where Douglas Engelbart demonstrates some of the research at Stanford or the Sketchpad Demo [2] from 1963 where Ivan Sutherland is presenting a GUI-based CAD system.
Fortunately, these ideas have now been picked up again, but to me it's interesting to note just how long a time elapsed between these ideas first appearing and their becoming mainstream. Some of it is obviously the cost, as the state-of-the-art research machines were massively more powerful than home computers even 2 decades later, but I'm sure there were a lot of great ideas that have just been forgotten.
Part of the problem, I think, is that we have found solutions to some of the easy problems and optimised it to such a degree that it's then hard to ever go back and revisit the alternative approaches because you'd need to regress so far from the current levels of expectations.
[1] https://www.youtube.com/watch?v=yJDv-zdhzMY
[2] https://www.youtube.com/watch?v=6orsmFndx_o
- igouy 2 years ago
  
  > Source Code In Database
  ?
  https://www.google.com/books/edition/Mastering_ENVY_Develope...
  https://gemtalksystems.com/products/gs64/
  https://en.wikipedia.org/wiki/MUMPS
  - specialist 2 years ago
    
    InterSystems Caché has some MUMPS like qualities.
    It is easily the worst developer experience conceivable. Easily tied for last place in the pantheon of turrible ideas realized thru turrible implementations.
    Sure, you can bork a Smalltalk env with some ill-advised changes to the runtime.
    Caché is so brittle, you can bork your env with a compiler error. And there's no feedback. And since the env is a blob, there's no version control.
    https://en.wikipedia.org/wiki/InterSystems_Caché
  - ralferoo 2 years ago
    
    > > Source Code In Database > ?
    Rather than think about specific closed environments (which may be an unavoidable consequence of SCID), I was thinking more generically about the issues. At the time, I was firmly in the Java large webapp space, so I was mostly thinking about how you could target a JVM.
    In terms of the actual article, there's a bunch of links on Wikipedia [1] but I specifically was referring to an older version of this [2] article (I think, I don't remember it being as garish colours) and I think I found it via c2 [3].
    None of these quite match up with what I thought I remembered which informed a lot of my thoughts back then, but I was mostly thinking about what the UI for such a system might look like because a nice friendly GUI isn't necessarily optimal for an experienced programmer who's probably happiest writing and seeing their code as big chunks of text, and also how sometimes you want code that the linter would hate because you've deliberately formatted something to make it easier for humans to understand.
    I was then thinking about how you could abstract and generalise statements and sub-expressions into small mini-functions that weren't complete functions per se, but more like templates. I spent a long time thinking how one might do code de-duplication by copying a graph of code, and then changing some of the nodes in a copy-on-write style thing, but decided there was no easy way of programmatically deciding which parts of the tree were being fixed due to a bug and needed to be shared with all copies, and which were just modified inputs or local changes. In terms of code, it's not that hard to do, but presenting in an intuitive way in a UI is much harder, especially if one of the goals is to make things easier for a novice programmer.
    [1] https://en.wikipedia.org/wiki/Source_Code_in_Database
    [2] https://www.mindprod.com/project/scid.html
    [3] https://wiki.c2.com/?SourceCodeInDatabase
    
    igouy 2 years ago
    
    > experienced programmer who's probably happiest writing and seeing their code as a big chunks of text
    Not if you're a Smalltalk programmer.
    "ENVY/Manager augments this model by providing configuration management and version control facilities. All code is stored in a central database rather than in files associated with a particular image. Developers are continuously connected to this database; therefore changes are immediately visible to all developers."
    https://www.google.com/books/edition/Mastering_ENVY_Develope...
    
    ralferoo 2 years ago
    
    I kind of feel that we're talking at cross-purposes here. To me, whether the code is in a database, in-memory, serialised to a file isn't all that important as they're all just representations of the same AST. For me, the label "source code in database" is in comparison to "source code in linear files that need not have any inherent structure".
    Also, perhaps my use of the phrase "experienced programmer" is being interpreted negatively. I'm not trying to imply that someone who has a lot of experience using a graphical programming language is less experienced, I'm using it as a shorthand for "experienced programmer of a traditional text-based language". I'll continue to do so for this reply too, because adding that caveat every time I use that shorthand makes the actual meaning I was trying to convey much harder to see.
    As regards to the writing and seeing code as text comment, I meant that experienced programmers will probably find writing something like
    sin(angle) * radius + offset
    in a textual form the quickest way of expressing that idea, especially if their IDE supports auto-completion of variables. Most programmers generally also prefer to visualise their code the same way they wrote it, so presenting it back to them as text makes sense.
    A novice programmer might prefer to see that as a graph of operator nodes because it guides them through the process. Even better if they can organise the nodes in the layout they want, as some people remember visually and can use the distinctive look of each area of "code" to navigate when zoomed out.
    Certainly, in game development, I've seen fairly non-techy people create massive Blueprint graphs for Unreal this way, but they wouldn't consider themselves programmers and would be scared by the prospect of a screenful of code that does the same thing. On the other hand, as someone who's used text to code for decades, I find Blueprints to be horrendously slow for me to understand when presented with someone else's "code" because there are far fewer "social code norms" being obeyed and people just do whatever makes sense to them.
    I actually think in the above example of a short expression, the best solution for a novice programmer probably isn't even a graph, but actually closer to the text for an experienced programmer. Most people will have had exposure to formulas at school, so they might well prefer to see something closer to the traditional text form of the source code, but with tools to guide them to entering that like the "Insert Equation" that's been in Word for decades - so for instance, you might insert a divide symbol and then fill in the two boxes, etc.
    The point is that code as text and code as a graph can both be used to express the same AST at the end of the day, and people should be able to use whichever makes most sense to them or makes them most productive. The tricky part then comes when people have chosen to format their code or lay out their graph in a particular way that adds meaning and aids understanding of the graph / source code without actually affecting the AST. I'm specifically talking about formatting here rather than comments, which is a similar problem of how you would bind a code comment to a specific part of the tree in a graphical view, and likewise graphical views might have "visual sections" of the graph that don't necessarily map neatly to a linear sequence of source lines.
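As a small illustration of the text/graph/AST point, here is a sketch using Python's standard `ast` module on the example expression (Python 3.9+ for `ast.unparse`). The helper names `tree`, `label` and `graph_edges` are made up for this example, and the edge list is only one of many ways a graph view could be derived.

```python
import ast

compact = "sin(angle)*radius+offset"
spaced = "sin( angle ) * radius   +   offset"


def tree(source: str) -> str:
    """Textual dump of the expression's AST, without position information."""
    return ast.dump(ast.parse(source, mode="eval"))


# Two different layouts of the text, one and the same tree.
same_tree = tree(compact) == tree(spaced)


def label(node: ast.AST) -> str:
    """Short display label for a node in the graph view."""
    if isinstance(node, ast.BinOp):
        return type(node.op).__name__  # "Add", "Mult", ...
    if isinstance(node, ast.Call):
        return "Call " + ast.unparse(node.func)
    if isinstance(node, ast.Name):
        return node.id
    return type(node).__name__


def graph_edges(node: ast.AST) -> list:
    """Flatten the tree into (parent, child) edges, as a node editor might draw it."""
    if isinstance(node, ast.BinOp):
        children = [node.left, node.right]
    elif isinstance(node, ast.Call):
        children = list(node.args)
    else:
        children = []
    edges = []
    for child in children:
        edges.append((label(node), label(child)))
        edges.extend(graph_edges(child))
    return edges


edges = graph_edges(ast.parse(compact, mode="eval").body)
```

The comparison comes out equal precisely because the AST throws the spacing away, which is the "formatting that adds meaning" problem in a nutshell: anything the author expressed through layout has to be stored somewhere other than the tree.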
    
    igouy 2 years ago
    
    > I'm using it as a shorthand for "experienced programmer of a traditional text-based language".
    And I was telling you that experienced Smalltalk programmers don't work with "big chunks of text".
    Small snippets of text, presented in context.
    Sorry I'm not seeing enough clarity to wish to continue.
    
    igouy 2 years ago
    
    fwiw
    > big chunks of text
    "Lines of Code" / "Total Methods" = 7
    https://dl.acm.org/doi/pdf/10.1145/74878.74904
- ralferoo 2 years ago
  
  Decided to split the off-topic part off into a reply so that it didn't distract from the answer!
  In terms of over-optimisation forcing certain technologies to be developed and others to be ignored, one example I'm very familiar with is computer graphics. I'd written a TON of stuff here, but decided to simplify it as it was labouring too specific a point.
  But our computer graphics state-of-the-art was roughly along these lines: drawing all edges of polygons, hidden-line removal (Sutherland), clipping intersecting polygons (Hodgman), filling polygons with a single colour, *, Gouraud shading, Gouraud shading with smaller triangles, Phong shading with bigger triangles, texturing, fixed texture and lighting pipelines, pixel shaders, vertex shaders. I'll also add compute shaders too, but that was more of a generalisation of what people were starting to do with pixel shaders operating on data that wasn't really pixel data.
  Now, you'll notice my * around the time of single colour filled polygons... this might not be the correct place to put the *, but around this point some people started experimenting with ray-tracing and got amazing results, just incredibly slowly. These were seen as the "gold standard", but because drawing triangles was much faster, this is where the money continued to be poured into, optimising and optimising this special case, discovering more techniques to "approximate" the right image, but trying to avoid the hard work of actually rendering it. Over time, things have got closer and closer to ray tracing, except transparent and shiny objects have always been the achilles heel.
  Fortunately industry's interest in ray-tracing has resumed, and now compute shaders are general enough that they can be used, but they're still orders of magnitude slower because the renderer needs to consider the entire scene not just a triangle at a time, so you need to store the scene in some kind of tree that's paged in on demand and different latencies for different pixels causes problems for the SIMD architectures. We're starting to see more and more consumer-level hardware with decent ray-tracing performance now, but it's been a decade of lost time in terms of optimisation from where it could have been if the entire market hadn't been competing only in making triangles rasterise more quickly.
  In the ray-tracing space, we still see that it's too slow to create perfect images (for very complicated scenes with lots of shiny surfaces and few lights, you might need thousands of rays per pixel to just get a handful that actually reach a light source), so we've invented all sorts of approaches to cover it up - whether it's training an ML model to guess the real colour for black pixels from neighbouring ones, or re-projecting pixels from a previous frame to fill in the gaps, etc.
  Personally, I can't help but think the real breakthrough in performant raytracing will come from tracing light from the light sources instead. This wasn't done traditionally because potentially it's even more expensive than tracing backwards from the pixel, but should be more accurate when there are multiple light sources.
  But even the latest batch of hardware is all focused on raytracing, which I think is missing the biggest trick of all - they could be using cone-tracing as a first approximation and then subdividing the cone into smaller and smaller chunks until they're approximately pixel sized. None of this is new, it's just not what the larger industry is doing right now, because it's cheaper and easier for them to do rays instead.
groby_b 2 years ago

Sure, but not all of us work in Smalltalk, and "but Smalltalk already does it" doesn't move legacy code bases either.
skulk 2 years ago

or emacs/org-mode
DannyBee 2 years ago

I mean, it doesn't even get you to where VisualAge was with Java, C++, and Smalltalk decades ago.

zacgarby 2 years ago

this is great! i’ve been thinking about exactly this (though styled after Logseq rather than Obsidian) but not gotten as far as implementing anything.

that being said, the thing i haven’t been able to convince myself of yet is why these are different to just normal (in-line) functions? as in, why should i have to write [[foo]]: would it not be better to have all identifiers automatically linked?

fwip 2 years ago

As a language-agnostic thing, I suppose you need to prevent the machinery from pulling in keywords and variable names accidentally (or the insides of strings or comments).
I like the idea (also a fan of Unison's approach to code-in-the-db), but I worry about the potential issues that come from effectively having a single global namespace. Could be that I just don't have the discipline for it, though.
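A rough sketch of why auto-linking needs some language awareness, using Python's own tokenizer as the "language-specific" part. The function `auto_link` and the set of known definitions are assumptions for illustration; a real tool would also need scope information, since this naive version would happily link a local variable that shadows a definition.

```python
import io
import keyword
import tokenize


def auto_link(source: str, definitions: set) -> str:
    """Wrap identifiers that name known definitions in [[...]].

    Strings and comments arrive as their own token types, so anything
    inside them is left alone, and keywords are never linked.
    """
    lines = source.splitlines(keepends=True)
    hits = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME
        and tok.string in definitions
        and not keyword.iskeyword(tok.string)
    ]
    # Rewrite from the last hit backwards so earlier column offsets stay valid.
    for tok in reversed(hits):
        row, start = tok.start
        _, end = tok.end
        line = lines[row - 1]
        lines[row - 1] = line[:start] + "[[" + tok.string + "]]" + line[end:]
    return "".join(lines)


sample = 'total = add(1, 2)  # add is defined elsewhere\nprint("add", total)\n'
linked = auto_link(sample, {"add"})
```

Only the call to `add` gets brackets here; the occurrences in the comment and in the string literal are untouched, which a plain search-and-replace could not guarantee.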
- seagreenOP 2 years ago
  
  > As a language-agnostic thing, I suppose you need to prevent the machinery from pulling in keywords and variable names accidentally (or the insides of strings or comments).
  Exactly. But zacgarby's right that you would want some auto-linking, so this is where language-specific plugins come in.
  The difference from today's world would be that those plugins would leave their results explicitly serialized in the source medium, so they wouldn't have to keep being reconstructed by every other tool.
  > I like the idea (also a fan of Unison's approach to code-in-the-db), but I worry about the potential issues that come from effectively having a single global namespace. Could be that I just don't have the discipline for it, though.
  I have lots of thoughts on this. I was initially disappointed that Unison kept a unique hierarchy to organize their code-- that seems so filesystem-ey and 1990s.
  However, I'm now a convert. The result of combining a unique hierarchy with explicit links between nodes is a 'compound graph' (or a 'cluster graph', depending, getting the language from https://rtsys.informatik.uni-kiel.de/~biblio/downloads/these...). These are very respectable data structures! One thing they're good for is being able to always give a canonical title to a node, but varying what that title is depending on the situation.
  I think that for serious work the linearizer would want to copy this strategy as well. Right now it's flat because that's all I need for my website, but if you were doing big projects in it you'd want to follow Unison and have a hierarchy. In the `HashMap` folder you'd display `HashMap.get` with a link alias that shows plain `get`, but if that function is being called from some other folder it would appear as the full `HashMap.get`.
  You could still do all the other cool stuff like organize by tags and attributes using frontmatter, but for the particular purpose of display names having a global hierarchy is useful.
  EDIT: What matters more than what the linearizer does is what Obsidian displays, so it's there that the "take relative hierarchical position into account when showing links" logic would have to occur. That could be a plugin or maybe Obsidian's relative link feature, I haven't used the latter.
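The display rule described above is small enough to sketch. This assumes dotted names stand in for the folder hierarchy and uses the `[[target|alias]]` link-alias form; `link_for` is a hypothetical helper, not something the linearizer or Obsidian provides.

```python
def link_for(target: str, current_folder: str) -> str:
    """Render a link to `target` as it should appear from `current_folder`."""
    prefix = current_folder + "."
    if target.startswith(prefix):
        # Same branch of the hierarchy: keep the full target, show the short name.
        short = target[len(prefix):]
        return f"[[{target}|{short}]]"
    # Somewhere else in the hierarchy: show the canonical full name.
    return f"[[{target}]]"
```

So `link_for("HashMap.get", "HashMap")` gives `[[HashMap.get|get]]`, while the same call from a `Parser` folder gives the full `[[HashMap.get]]`. The link target never changes, only what the reader sees.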
seagreenOP 2 years ago

> this is great! i’ve been thinking about exactly this (though styled after Logseq rather than Obsidian) but not gotten as far as implementing anything.
Thank you! I think [[links]] will work out of the box with Logseq since they're the same as Obsidian. Transclusions will be in the wrong format since Obsidian transclusions look like `![[this]]`, but it would be quick to modify the linearizer to handle them.
You may not want transclusions though since transcluding code into other code is... very weird. I'm curious what use cases people come up with for it though.
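A hedged sketch of what handling both formats might look like. The Obsidian pattern follows the `![[this]]` form mentioned above; the Logseq pattern assumes page embeds are written as `{{embed [[page]]}}`, which is my understanding of Logseq rather than anything the linearizer currently supports. `transcluded_targets` is a made-up name.

```python
import re

# Obsidian-style transclusion: ![[target]]
OBSIDIAN_EMBED = re.compile(r"!\[\[([^\]]+)\]\]")
# Logseq-style page embed (assumed syntax): {{embed [[target]]}}
LOGSEQ_EMBED = re.compile(r"\{\{embed \[\[([^\]]+)\]\]\}\}")


def transcluded_targets(text: str) -> list:
    """Names of all transcluded notes, in the order they appear in the text."""
    hits = [
        (match.start(), match.group(1))
        for pattern in (OBSIDIAN_EMBED, LOGSEQ_EMBED)
        for match in pattern.finditer(text)
    ]
    return [name for _, name in sorted(hits)]
```

Plain `[[links]]` match neither pattern, so they would keep going through whatever link handling already exists.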

binary132 2 years ago

I really appreciate the unordinary direction you went with these articles and your site in general. An enjoyable read!

seagreenOP 2 years ago

All credit to Conor White-Sullivan for making links cool again.
> unordinary direction
Hahaha, this is a very polite way to put that=) But I do appreciate it.
- binary132 2 years ago
  
  Don’t take it as a backhanded compliment! We should all strive to surpass and excel the ordinary.

nbbaier 2 years ago

> The solution I've been waiting for is source-code-in-the-database. I'm cheering on multiple projects attempting this.

What are the projects you're especially bullish on?

seagreenOP 2 years ago

Hazel and Unison are two of the big ones. I'm friends with some of the Unison folks so I'm biased, but I really like how few features there are in the language. In general I'm just a huge sucker for subtractive improvement: if you can have a small number of awesome things (eg abilities) instead of a bunch of special case things (exception handling, monad trickery, dependency injection machinery) sign me up.
I know less about Hazel, my understanding is that it's source-code-in-CRDTs, which is definitely structured source code though may not technically be in a database.
- thyrsus 2 years ago
  
  Is this the unison you mean? https://www.unison-lang.org/
  Unrelated: what has your experience been using igneous-linearizer to help understand other people's code?
  - seagreenOP 2 years ago
    
    That's it. And the linearizer is only one way-- you write text with [[links]] and turn that into plaintext.
    
    thyrsus 2 years ago
    
    You may have encountered leo [ https://leo-editor.github.io/leo-editor/ ]. I use it to pull in or write code and break it up into comprehensible pieces. It works, but it feels overcomplex. I'm open to something simpler, and will give this a try.
    
    seagreenOP 2 years ago
    
    I hadn't seen it! The "clones" feature is cool, it sounds like they're way ahead in figuring out how to use transclusions effectively when coding.
    Previously I'd thought of using transclusions for things like long-lived documentation. Now reading about Leo it seems like they'd be just as useful for creating short-term views into one's code. Eg start a refactor PR by transcluding all the relevant definitions into a doc. Now you can start writing PR comments in that doc before you even begin coding, and when you do all the relevant code is right there.
DannyBee 2 years ago

VisualAge C++, Java, and Smalltalk.
