JSSE: A JavaScript Engine Built by an Agent

p.ocmatos.com

27 points by tilt 23 days ago · 18 comments

ivankra 23 days ago

They really did manage to benchmaxx test262 and beat everyone at it. In my testing (all engines with experimental flags, under the same conditions, on the full test262 suite):

  99.5 jsse
  99.1 v8
  99.0 spidermonkey
  98.1 libjs
  97.4 escargot
  97.3 jint
  96.4 boa
  95.0 graaljs
  93.2 kiesel
  92.1 jsc
  82.8 quickjs
  82.5 quickjs-ng
  82.1 xs
  80.1 brimstone
  77.7 nova
  74.6 jerryscript
  66.5 sobek
  65.5 goja
It ain't fast (~10x slower than boa), but very compliant.
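
For what it's worth, scores like these are just the fraction of test262 test files an engine passes. A minimal sketch of the tally (the engine names and counts below are made-up sample data, not the measurements above):

```javascript
// Sketch: turning raw test262 run counts into a pass-rate table like
// the one above. The engine names and counts are illustrative only.
const results = [
  { engine: "engine-b", passed: 49550, total: 50000 },
  { engine: "engine-a", passed: 49750, total: 50000 },
];

// One line per engine, best pass rate first, e.g. "99.5 engine-a".
const table = results
  .slice()
  .sort((a, b) => b.passed / b.total - a.passed / a.total)
  .map(
    ({ engine, passed, total }) =>
      `${((passed / total) * 100).toFixed(1)} ${engine}`
  );

console.log(table.join("\n"));
```
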
Imustaskforhelp 23 days ago

Great. The one-agent-one-human repository by embedding-shapes is certainly quite nice. I had tried to re-create the results (within Golang) and failed, but even in failing I feel like I learned a lot (also got to talk with emsh on bsky!)

It will also be very interesting to read simonw's comments on all of this (https://news.ycombinator.com/item?id=46779522#46786824), where he said:

No, I'm still waiting to see concrete evidence that the "swarms of parallel agents" thing is worthwhile. I use sub-agents in Claude Code occasionally - for problems that are easily divided - and that works fine as a speed-up, but I'm still holding out for an example of a swarm of agents that's really compelling.

The reason I got excited about the Cursor FastRender example was that it seemed like the first genuine example of thousands of agents achieving something that couldn't be achieved in another way... and then embedding-shapes went and undermined it with 20,000 lines of single-agent Rust!

(I wonder what Simon thinks of this now, because from my skim of the article they do seem to mention some tidbits about parallelism. Either way, I think these projects are really adventurous and there is still something novel in all of this.)

(Edit: I have tried reading the blog post many times now but I am unable to understand why this is [working?] while something like Cursor's project turned to waste. Initially people were optimistic about Cursor's project until emsh showed it wasn't so good. I hope that won't be the case here, but I am still a bit confused as to why this in particular seems to work; kind of waiting for Simon's post about it now :] )

pseudosavant 23 days ago

It is pretty incredible to me that in the pre-LLM/agent coding world, creating a new high-quality JS engine or browser seemed like it would likely never happen again. But now, any large company could build one if they wanted to. In a single digit number of months no less.

  • spoiler 23 days ago

    There are many JS implementations out there. Quality kind of depends on what you need, and engines vary in how complete they are and which quirks they support.

    And v8, for example, doesn't make much sense in embedded contexts.

    • pseudosavant 23 days ago

      There are definitely plenty of other JS engines, but they aren't always up to date on newer JS features. I'm pretty sure this is the 3rd JS engine to fully support the Temporal API (even JSC hasn't shipped it yet).

      • ivankra 23 days ago

        More like 8th. These pass nearly all Temporal tests as well: v8, spidermonkey, libjs, boa, escargot, kiesel, jint. Almost there: graaljs, yavashark.

Waterluvian 23 days ago

This is really cool to see and study. It’s a great experiment.

I think it doesn’t really say a lot though. The hard part, in my opinion, is not making a new engine, it’s making one that’s worth using and will remain so for a long time.

What I’d love to see next is how well (or poorly) this approach does at making the performance not terrible.

TheRealPomax 23 days ago

Pretty neat as a real-sized-project experiment to see what a programming program can actually do.

dmitrygr 23 days ago

Now do it without those pre-written tests. Spec only. Else, the writers of those tests deserve a LOT of credit.

  • pseudosavant 23 days ago

    If there is one thing that agents/LLMs have highlighted, it is how much credit those test writers deserve. Teams that were already following a TDD-style approach seem to be able to realize value from agents most easily because of their tests.

    The tests are what enable: building a brand new JS runtime that works, rewriting a complex piece of code in a different language (e.g. Golang instead of TypeScript) that is more performant for that task, or even migrating off of an old stack (.NET WebForms) to something newer.

  • ivankra 23 days ago

    You can prompt an LLM to generate tests from the spec and I'd bet it would easily get most of the way there, especially if you give it a reference implementation to test against. I did just that, though on a small scale - just for feature tests. The last few percent would be the real challenge, you probably don't want it to just imitate another implementation's bugs.

    • dmitrygr 23 days ago

      Reference implementation that someone else (a human) wrote? Hm… so one way or another, some humans’ labour is laundered…

      • ivankra 23 days ago

        Don't we all stand on the shoulders of giants?

        • dmitrygr 23 days ago

          I have never attempted to take credit for someone's work, nor ever put serious effort into hiding someone's contribution. LLMs are purpose-designed for that.

  • UncleEntity 23 days ago

    > Now do it without those pre-written tests

    That's probably the most important thing, actually. I've tried my hardest to get Claude to build an APL VM using only the spec and it's virtually impossible to get full compliance as it takes too many shortcuts and makes too many assumptions. That's part of the challenge though, to see how far the daffy robots have come.

    • vrighter 23 days ago

      Hehe, I tried giving it a minesweeper CSP I've been working on and asked it to develop the feature I was working on at the moment, just to compare. I was working on adding non-chronological backtracking to the search engine.

      I gave it the proper compile flags, test cases with their expected output, and everything it would have needed. The test cases were specifically hand-picked to be hard on the search algorithm. The base program was correct and gave the correct results (I was only adding an optimization), and was what I was using as a baseline for testing my implementation. You know, with a debugger and breakpoints, printfs and all that.

      In the end it couldn't get the thing to work (I asked it to compile and verify), then it proudly declared that in all of the test cases I gave it, everything was solved through constraint propagation and the search didn't even trigger, so it hadn't introduced any bugs. It tried to gaslight me, even though it got a segfault in the new code it added (which obviously would not have been triggered if the search didn't actually execute).
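
      For readers unfamiliar with the feature being discussed: non-chronological backtracking (backjumping) means that on a dead end the search jumps back to a variable actually involved in the conflict, rather than just the previous one. A toy sketch on graph coloring, not the commenter's minesweeper solver (the graph and color count are made-up example data):

```javascript
// Sketch of conflict-directed backjumping on a toy graph-coloring CSP.
// On a dead end we return the set of earlier variables that caused the
// conflict; ancestors not in that set are skipped over entirely.
const neighbors = {
  a: ["b", "c"],
  b: ["a", "c"],
  c: ["a", "b", "d"],
  d: ["c"],
};
const vars = ["a", "b", "c", "d"];
const colors = [0, 1, 2];

function solve(i, assignment) {
  if (i === vars.length) return { ok: true, assignment };
  const v = vars[i];
  const conflictSet = new Set(); // indices of vars blocking every color
  for (const color of colors) {
    // Assigned neighbors that rule this color out.
    const blockers = neighbors[v]
      .map((n) => vars.indexOf(n))
      .filter((j) => j < i && assignment[vars[j]] === color);
    if (blockers.length === 0) {
      assignment[v] = color;
      const result = solve(i + 1, assignment);
      if (result.ok) return result;
      delete assignment[v];
      // If the failure below doesn't involve v, jump past this level.
      if (!result.conflictSet.has(i)) return result;
      result.conflictSet.delete(i);
      for (const j of result.conflictSet) conflictSet.add(j);
    } else {
      for (const j of blockers) conflictSet.add(j);
    }
  }
  return { ok: false, conflictSet }; // backjump via the conflict set
}

const result = solve(0, {});
console.log(result.ok ? result.assignment : "unsatisfiable");
```
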
