JSSE: A JavaScript Engine Built by an Agent

p.ocmatos.com

27 points by tilt 23 days ago · 18 comments

ivankra 23 days ago

They really did manage to benchmaxx test262 and beat everyone at it. In my testing (all engines with experimental flags, under the same conditions, on the full test262 suite):

  99.5 jsse
  99.1 v8
  99.0 spidermonkey
  98.1 libjs
  97.4 escargot
  97.3 jint
  96.4 boa
  95.0 graaljs
  93.2 kiesel
  92.1 jsc
  82.8 quickjs
  82.5 quickjs-ng
  82.1 xs
  80.1 brimstone
  77.7 nova
  74.6 jerryscript
  66.5 sobek
  65.5 goja
It ain't fast (~10x slower than boa), but very compliant.
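
For what it's worth, scores like these are just the fraction of test262 test files an engine passes. A minimal sketch of the tally (the engine names and counts below are made-up sample data, not the measurements above):

```javascript
// Sketch: turning raw test262 run counts into a pass-rate table like
// the one above. The engine names and counts are illustrative only.
const results = [
  { engine: "engine-b", passed: 49550, total: 50000 },
  { engine: "engine-a", passed: 49750, total: 50000 },
];

// One line per engine, best pass rate first, e.g. "99.5 engine-a".
const table = results
  .slice()
  .sort((a, b) => b.passed / b.total - a.passed / a.total)
  .map(
    ({ engine, passed, total }) =>
      `${((passed / total) * 100).toFixed(1)} ${engine}`
  );

console.log(table.join("\n"));
```
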
Imustaskforhelp 23 days ago

Great. The one-agent-one-human repository by embedding-shapes is certainly quite nice. I had tried to re-create the results (within Golang) and failed, but even in failing I feel like I learned a lot (also got to talk with emsh on bsky!)

It will also be very interesting to read simonw's comments on all of this (https://news.ycombinator.com/item?id=46779522#46786824), where he said:

No, I'm still waiting to see concrete evidence that the "swarms of parallel agents" thing is worthwhile. I use sub-agents in Claude Code occasionally - for problems that are easily divided - and that works fine as a speed-up, but I'm still holding out for an example of a swarm of agents that's really compelling.

The reason I got excited about the Cursor FastRender example was that it seemed like the first genuine example of thousands of agents achieving something that couldn't be achieved in another way... and then embedding-shapes went and undermined it with 20,000 lines of single-agent Rust!

(I wonder what Simon thinks of this now, because from my skim of the article they do seem to mention some tidbits about parallelism. Either way, I think these projects are really adventurous and there is still something novel in all of this.)

(Edit: I have tried reading the blog post many times now but I am unable to understand why this is [working?] while something like Cursor's project turned to waste. Initially people were optimistic about Cursor's project until emsh showed it wasn't so good. I hope that won't be the case here, but I am still a bit confused as to why this in particular seems to work; kind of waiting for Simon's post about it now :] )

pseudosavant 23 days ago

It is pretty incredible to me that in the pre-LLM/agent coding world, creating a new high-quality JS engine or browser seemed like it would likely never happen again. But now, any large company could build one if they wanted to. In a single digit number of months no less.

  • spoiler 23 days ago

    There are many JS implementations out there. Quality kind of depends on what you need, and engines vary in how complete they are and which quirks they support.

    And v8, for example, doesn't make much sense in embedded contexts.

    • pseudosavant 23 days ago

      There are definitely plenty of other JS engines, but they aren't always up to date on newer JS features. I'm pretty sure this is the 3rd JS engine to fully support the Temporal API (even JSC hasn't shipped it yet).

      • ivankra 23 days ago

        More like 8th. These pass nearly all Temporal tests as well: v8, spidermonkey, libjs, boa, escargot, kiesel, jint. Almost there: graaljs, yavashark.

Waterluvian 23 days ago

This is really cool to see and study. It’s a great experiment.

I think it doesn’t really say a lot though. The hard part, in my opinion, is not making a new engine, it’s making one that’s worth using and will remain so for a long time.

What I’d love to see next is how well (or poorly) this approach does at making the performance not terrible.

TheRealPomax 23 days ago

Pretty neat as a real-sized-project experiment to see what a programming program can actually do.

dmitrygr 23 days ago

Now do it without those pre-written tests. Spec only. Else, the writers of those tests deserve a LOT of credit.

  • pseudosavant 23 days ago

    If there is one thing that agents/LLMs have highlighted, it is how much credit those test writers deserve. Teams that were already following a TDD-style approach seem to be able to realize value from agents most easily because of their tests.

    The tests are what enable: building a brand new JS runtime that works, rewriting a complex piece of code in a different language (e.g. Golang instead of TypeScript) that is more performant for that task, or even migrating off of an old stack (.NET WebForms) to something newer.

  • ivankra 23 days ago

    You can prompt an LLM to generate tests from the spec and I'd bet it would easily get most of the way there, especially if you give it a reference implementation to test against. I did just that, though on a small scale - just for feature tests. The last few percent would be the real challenge, you probably don't want it to just imitate another implementation's bugs.

    • dmitrygr 23 days ago

      Reference implementation that someone else (a human) wrote? Hm… so one way or another, some humans’ labour is laundered…

      • ivankra 23 days ago

        Don't we all stand on the shoulders of giants?

        • dmitrygr 23 days ago

          I have never attempted to take credit for someone's work, nor ever put serious effort into hiding someone's contribution. LLMs are purpose-designed for that.

  • UncleEntity 23 days ago

    > Now do it without those pre-written tests

    That's probably the most important thing, actually. I've tried my hardest to get Claude to build an APL VM using only the spec and it's virtually impossible to get full compliance as it takes too many shortcuts and makes too many assumptions. That's part of the challenge though, to see how far the daffy robots have come.

    • vrighter 23 days ago

      Hehe, I tried giving it a minesweeper CSP I've been working on and asked it to develop the feature I was working on at the moment, just to compare. I was working on adding non-chronological backtracking to the search engine.

      I gave it the proper compile flags, test cases with their expected output, and everything it would have needed. The test cases were specifically hand-picked to be hard on the search algorithm. The base program was correct and gave the correct results (I was only adding an optimization), and was what I was using as a baseline for testing my implementation. You know, with a debugger and breakpoints, printfs and all that.

      In the end it couldn't get the thing to work (I asked it to compile and verify), then it proudly declared that in all of the test cases I gave it, everything was solved through constraint propagation and the search didn't even trigger, so it hadn't introduced any bugs. It tried to gaslight me, even though it got a segfault in the new code it added (which obviously would not have been triggered if the search didn't actually execute).
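
      For readers unfamiliar with the feature being discussed: non-chronological backtracking (backjumping) means that on a dead end the search jumps back to a variable actually involved in the conflict, rather than just the previous one. A toy sketch on graph coloring, not the commenter's minesweeper solver (the graph and color count are made-up example data):

```javascript
// Sketch of conflict-directed backjumping on a toy graph-coloring CSP.
// On a dead end we return the set of earlier variables that caused the
// conflict; ancestors not in that set are skipped over entirely.
const neighbors = {
  a: ["b", "c"],
  b: ["a", "c"],
  c: ["a", "b", "d"],
  d: ["c"],
};
const vars = ["a", "b", "c", "d"];
const colors = [0, 1, 2];

function solve(i, assignment) {
  if (i === vars.length) return { ok: true, assignment };
  const v = vars[i];
  const conflictSet = new Set(); // indices of vars blocking every color
  for (const color of colors) {
    // Assigned neighbors that rule this color out.
    const blockers = neighbors[v]
      .map((n) => vars.indexOf(n))
      .filter((j) => j < i && assignment[vars[j]] === color);
    if (blockers.length === 0) {
      assignment[v] = color;
      const result = solve(i + 1, assignment);
      if (result.ok) return result;
      delete assignment[v];
      // If the failure below doesn't involve v, jump past this level.
      if (!result.conflictSet.has(i)) return result;
      result.conflictSet.delete(i);
      for (const j of result.conflictSet) conflictSet.add(j);
    } else {
      for (const j of blockers) conflictSet.add(j);
    }
  }
  return { ok: false, conflictSet }; // backjump via the conflict set
}

const result = solve(0, {});
console.log(result.ok ? result.assignment : "unsatisfiable");
```
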
