We Built a Video Rendering Engine by Lying to the Browser About What Time It Is

blog.replit.com

176 points by darshkpatel 4 days ago · 67 comments

yetihehe a day ago

About 20 years ago there was a similar problem with demoscene creations. It was hard to capture demos in realtime in all their glory. So one guy created a tool[1] that waited for a frame render and presented the proper time to the demo so that frames would be paced properly. "All popular ways of getting time into the program are wrapped aswell - timeGetTime, QueryPerformanceCounter, you name it. This is necessary so .kkapture can make the program think it runs at a fixed framerate (whatever you specified)."

[1] https://www.farbrausch.de/~fg/kkapture/

  • fnordian_slip a day ago

    It's rather off-topic, but the linked blog is by the guy who made .kkrieger, the tiny first-person shooter (only 96kB) in the early 2000s. Though the website for it is now gone, as .theprodukkt doesn't exist anymore, apparently. Nice to see his other stuff, didn't think to look at the time.

  • future_crew_fan a day ago

    > "one guy" sir, that's no way to refer to farbrausch

here is their Breakpoint 2007 demo, a 177 kB executable including 3D assets and textures. https://www.youtube.com/watch?v=wqu_IpkOYBg

    • yetihehe a day ago

I've known them since fr08. But AFAIK kkapture was started and maintained mostly by ryg. And the demo you linked is my favorite from them.

  • whynotmaybe a day ago

It's still used in the gaming industry, but for the opposite: you can go faster than time.

Some enterprise software also has it, mainly for testing, and they have lint tools that check that you never use Date.now().
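For what it's worth, the usual shape of that pattern (my sketch, not any particular codebase's API) is an injectable clock: production code takes a clock dependency, tests swap in a controllable one, and the lint rule bans bare Date.now() everywhere else.

```javascript
// Production default: real time behind the one allowed Date.now() call.
const systemClock = { now: () => Date.now() };

// Test clock: time moves only when the test says so.
function makeFixedClock(startMs) {
  let t = startMs;
  return {
    now: () => t,
    advance: (ms) => { t += ms; }, // tests "travel" forward in time
  };
}

// Code under test takes the clock as a dependency instead of reading
// the wall clock directly:
function isExpired(token, clock) {
  return clock.now() >= token.expiresAt;
}

// In a test, expiry is fully deterministic:
const clock = makeFixedClock(1000);
const token = { expiresAt: 1500 };
console.log(isExpired(token, clock)); // false
clock.advance(600);
console.log(isExpired(token, clock)); // true
```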

  • jezzamon a day ago

    Ha, I independently set that up for my own coding animations. Not as crazy as faking the whole system time though, that's cool!

  • calvinmorrison a day ago

Don't forget .gif webcam streams! Just keep sending new frames!

    • jacquesm a day ago

      Indeed. The problem with that was that the browser would cache the whole bloody stream and that quickly led to issues. That's why we switched to JPEG, which also greatly improved the image quality over the GIF format, which really wasn't designed for dealing with camera generated images.

  • Aardwolf a day ago

    Isn't another solution to capture the video signal to the monitor?

    • AndriyKunitsyn a day ago

      What this capturing software also does is it lies to the demo program about the time that passed between the frames, so the demo makers don't even care about running in realtime, because for them, it's like running on a PC that's almost infinitely powerful.

chmod775 a day ago

I've done similar shenanigans before. That main loop is probably simplified? It won't work well with anything that uses timing primitives for debouncing (massively slowing such code down, only progressing with each frame). Also a setInterval with, say, 5ms may not "look" the same when it's always 1000/fps milliseconds later instead (if you're capturing at 24fps/30fps, that would be a huge difference).

What you should do is put everything that was scheduled on a timeline (every setTimeout, setInterval, requestAnimationFrame), then "play" through it until you arrive at the next frame, rather than calling each setTimeout/setInterval callback only for each frame.
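A rough sketch of that timeline idea (names are my own invention): every scheduled callback is recorded with its due time, and advancing to the next frame replays them in order, so a 5ms interval really observes 5ms steps instead of one big 1000/fps jump.

```javascript
// Timeline-based fake scheduler: record everything that gets scheduled,
// then "play" through the timeline up to each frame boundary.
let virtualTime = 0;
let nextId = 1;
const timeline = []; // entries: { id, dueAt, callback, interval? }

function fakeSetTimeout(cb, delay) {
  timeline.push({ id: nextId, dueAt: virtualTime + delay, callback: cb });
  return nextId++;
}

function fakeSetInterval(cb, delay) {
  timeline.push({ id: nextId, dueAt: virtualTime + delay, callback: cb, interval: delay });
  return nextId++;
}

// Advance virtual time to the next frame, firing each timer at the
// moment it was actually due, in order.
function playUntil(frameTime) {
  for (;;) {
    timeline.sort((a, b) => a.dueAt - b.dueAt);
    const next = timeline[0];
    if (!next || next.dueAt > frameTime) break;
    virtualTime = next.dueAt; // a 5ms interval really sees 5ms steps
    if (next.interval != null) next.dueAt += next.interval;
    else timeline.shift();
    next.callback();
  }
  virtualTime = frameTime;
}

// A 5ms interval fires 6 times inside one ~33.3ms frame at 30fps,
// rather than once per captured frame:
let fired = 0;
fakeSetInterval(() => fired++, 5);
playUntil(1000 / 30);
console.log(fired); // 6
```

Callbacks that schedule further timers also land at the right virtual moment, since virtualTime is updated before each callback runs.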

Also their main loop will let async code "escape" their control. You want to make sure the microtask queue is drained before actually capturing anything. If you don't care about performance, you can use something like await new Promise(resolve => setTimeout(resolve, 0)) for this (using the real setTimeout) before you capture your frame. Use the MessageChannel trick if you want to avoid the delay this causes.

For correctness you should also make sure to drain the queue before calling each of the setTimeout/setInterval callbacks.
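Both drain strategies can be sketched like so (hypothetical helper names; the key detail is capturing the real setTimeout before any monkey-patching happens):

```javascript
// Capture the *real* setTimeout before the time APIs get patched.
const realSetTimeout = setTimeout;

// Simple version: a real macrotask boundary, so all pending microtasks
// (resolved promises, queueMicrotask callbacks) run before we continue.
function drainViaTimeout() {
  return new Promise((resolve) => realSetTimeout(resolve, 0));
}

// MessageChannel version: also yields a full macrotask, but without the
// ~4ms clamping browsers apply to nested setTimeout(0) calls.
function drainViaMessageChannel() {
  return new Promise((resolve) => {
    const { port1, port2 } = new MessageChannel();
    port1.onmessage = () => { port1.close(); resolve(); };
    port2.postMessage(null);
  });
}

// Any async work the page kicked off settles before the capture point:
let settled = false;
Promise.resolve().then(() => { settled = true; });
drainViaMessageChannel().then(() => {
  console.log(settled); // true: microtasks ran before "capturing"
});
```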

I'm leaning towards that code being simplified, since they'd probably have noticed the breakage this causes. Or maybe, given that this is their business, their whole solution is vibe-coded and they have no idea why it's sometimes acting strange. Anyone taking bets?

xnx a day ago

Mentioned at the very end that this is based on https://github.com/Vinlic/WebVideoCreator

echoangle a day ago

Crazy that this approach seems to be the preferred way to do it. How hard would it be to implement the recording in the browser engine? There you could do it perfectly, right?

  • szmarczak a day ago

This is the correct solution. However you'd need someone who knows C++ well, knows Chrome internals, is familiar with video and audio stuff, knows the Chromium rendering pipeline, and possibly some GPU APIs as well. That person would cost huge amounts of money due to the required knowledge and complexity.

    And then you'd need to maintain the code so it works with future Chrome versions.

    • andrewstuart a day ago

I did all that. It was hard. I’m not an expert but fought my way through to make all that work. I didn’t get it perfect but I got it pretty good. There are some weird challenges when you get that deep.

Just as I got it working, AI came along, which made it pointless. Then I realised, once I got to the bottom of it, that you cannot do it perfectly because it’s not a deterministic renderer. So I called the project to an end.

      • Ajedi32 a day ago

        > AI came along which made it pointless

        What? What does AI have to do with anything here?

  • whazor a day ago

    Don’t forget the requirement of not dropping frames under load. The browser engine might have assumed that requirement throughout the entire code base.

  • medi8r a day ago

You can screen share from the browser, so surely that API?

    • SiempreViernes a day ago

      The purpose seems to be flashy demo videos to sell web-based tools, so rendering unrealistically smooth interactions is sort of the point.

      • johnpaulkiser a day ago

Oh, I thought the purpose was to build a "copy this SaaS" app?

You give the agent a URL, it records itself going through UX flows, you give that video to a coding agent, and you have quite a feature.

zeta0134 a day ago

Ha, my first thought is that I'd likely break this system. My page synchronizes its animation playback rate to an audio worklet, because I need to do both anyway, and some experimentation determined that syncing to audio resulted in smooth frame pacing across most browsers. This means that requestAnimationFrame has the very simple job of presenting the most recently rendered frame. It ignores the system time and, if there isn't a new frame to present yet, does nothing.
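The presentation side of that pattern can be sketched as follows (a reconstruction for illustration, not the commenter's actual code): rendering runs on its own audio-paced cadence, and the requestAnimationFrame callback only presents the latest completed frame, doing nothing when none is ready.

```javascript
// Renderer and presenter are decoupled: the renderer (paced by an
// audio worklet in the original description) hands off frames, and the
// rAF callback just presents whatever is newest.
let latestFrame = null;
let presented = 0;

function onFrameRendered(frame) { // called from the audio-paced renderer
  latestFrame = frame;
}

function present() { // the requestAnimationFrame callback
  if (latestFrame !== null) {
    presented++;        // in a real page: blit/draw latestFrame here
    latestFrame = null; // nothing new next tick means a no-op
  }
}

// Two rAF ticks but only one newly rendered frame: one present happens.
onFrameRendered({ id: 1 });
present();
present();
console.log(presented); // 1
```

Since present() never reads the system clock, a capture harness that fakes time would simply see it idle until the renderer produces the next frame.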

eviks 14 hours ago

Not a single before/after video?

NoahZuniga a day ago

This wouldn't work for CSS/SVG animations?

  • geon 10 hours ago

    That's what I thought, but the article says it does:

    > The page behind that URL might use framer-motion, plain CSS animations [...]

    And the code example does something with css:

    `await seekCSSAnimations(currentTime); // sync CSS`

G_o_D a day ago

https://chromewebstore.google.com/detail/scroll-capture/egmh...

amelius a day ago

> The core issue is that browsers are real-time systems. They render frames when they can, skip frames under load, and tie animations to wall-clock time. If your screenshot takes 200ms but your animation expects 16ms frames, you get a stuttery, unwatchable mess.

But by faking the performance of your webpage, maybe you are lying to your potential users too?

  • ErroneousBosh a day ago

    > But by faking the performance of your webpage, maybe you are lying to your potential users too?

I think you're missing the point of it a little. The "user" is someone who wants to watch a rendered video of the browser's display, but if it takes longer than one frame (where you read the word frame in this comment, think of a frame of video or film, not a browser "frame" like people used to make broken menus with) to actually draw the visual the browser will skip it.

    Instead this appears to just tell the browser it's got plenty of time, keep drawing, and then capture the output when it's done.

    It's not too different to how you'd do for example stop motion animation - you'd take a few minutes to pose each figure and set up the scene, trip the shutter, take a few more minutes to pose each figure for the next part of each movement, trip the shutter again, and so on. Say it took five minutes to set up and shoot each frame then one second of film would take an hour of solid work (assuming 12 frames per second, or "shooting on twos").

    It's just saying "take all the time you want, show me it when it's done" and then worrying about making it into smooth video after the work is done.

    • SiempreViernes a day ago

> The "user" is someone who wants to watch a rendered video of the browser's display

      While such a person might indeed exist, I think the more common situation is a vendor showing a demo of how a website might work. In that situation the consumer wants a realistic depiction of someone interacting with the site. Though of course for the user of the video service it might be very useful if the video hides all manner of performance issues.

      • actionfromafar a day ago

        If the rendering machine is an anemic, cheap and overloaded VPS, it may also show performance issues which don't exist.

tosti a day ago

What a waste of time. Just hook up an HDMI recorder.

brcmthrowaway 21 hours ago

Does anyone remember FRAPS?

soulofmischief a day ago

This post smells of LLM throughout. Not just the structure (many headings, bullet lists), but the phrasing as well. A few obvious examples:

- no special framework. No library buy-in. Just a URL

- Advance clock. Fire callbacks. Capture. Repeat. Every frame is deterministic, every time.

- We render dozens of frames that nobody will ever see, just to keep Chrome's compositor from going stale.

- The fundamental insight that you could monkey-patch browser time APIs ... is genuinely clever

- Where we diverged

The whole post is like this, but these examples stand out immediately. We haven't quite collectively put a name on this style of writing yet, but anyone who uses these tools daily knows how to spot it immediately.

I'm okay with using LLMs as editors and even drafters, but it's a sign of laziness and carelessness when your entire post feels written by an LLM and the voice isn't your own.

It feels inauthentic, and companies like Replit should consider the impact on their brand before just letting people write this kind of phoned-in blog post. Especially after the catastrophe that was the Cloudflare Matrix incident (which they later "edited" and never owned up to).

And the lede is buried at the very end: This is just a vibe-coded modification of https://github.com/Vinlic/WebVideoCreator, and instead of making their changes open source since they're "standing on the shoulders of giants", the modifications are now proprietary.

In the end, being an AI company is no excuse for bad writing.

  • lccerina a day ago

Their whole product is about vibe-coding unmaintainable "apps", so I'm not surprised they put the same level of (in)attention into their blog too.

    Also yikes for the proprietary modifications. AI companies: "what's yours is mine, and what's mine is mine only"

  • roywiggins a day ago

Unfortunately, people seem to organically love this sort of writing, since at least one or two of these reach the top half of the front page here every day.

    I'm not even against using AI per se, but when something is obviously written in ChatGPTese I'm not going to read it if I don't have to.

  • geonic a day ago

    Yes, this kind of writing is rampant on X. Once you know it's coming from an LLM (mostly ChatGPT in my opinion as it uses this style often) you can't unsee it. And that immediately makes me skip it.

  • zem a day ago

    > - We render dozens of frames that nobody will ever see, just to keep Chrome's compositor from going stale

    what's the issue with this one? it sounds like something I might write, tbh.

    • soulofmischief 15 hours ago

      I don't have an issue with any particular form of writing, it's just that the current generation of LLMs often write this way and it's an indicator of possible LLM use.

      "We X, just to keep Y from Z" and its variations are a pattern I've seen come up a lot.

  • truetraveller a day ago

You forgot the first part, the famous X, Y, and Z: "by virtualizing time itself, patching key browser audio APIs, and waging war against headless Chrome's quirks."

marxisttemp a day ago

The prose here reads like it was LLM-generated.

Short sentences. Plenty of newlines. Enumerate everything. Always.

  • macinjosh a day ago

    The posts pointing out in every comments section that people now use AI tools for writing are getting really tiresome.

    You are not clever for noticing and you are just filling up the comments section with useless noise.

    Not every post needs to be a hand crafted literary masterpiece.

    • dinkleberg 20 hours ago

      Similarly, the comments complaining about comments complaining about AI are growing quite tiresome.

    • marxisttemp a day ago

      The posts made by AI tools for writing are getting really tiresome.

      You are not clever for using them and you are just filling up the submissions section with useless noise.

      Not every post needs to be.

andrewstuart a day ago

I did this a few years ago. The approach these guys are taking is kinda hacky compared to other better ways - and I've tried most of them.

It works, but only in a limited way; there are lots of problems and caveats that come up.

I dropped it in the end partly because of all the problems and edge cases, and partly because it's a solution looking for a problem: AI essentially wipes out any demand for generating video in browsers.

I ended up writing code that modified Chromium and grabbed the frames directly from deep in the heart of the rendering system.

It was a big technical challenge and a lot of fun but as I say, fairly pointless.

And there are other solutions that are arguably better - like recording video with OBS / the GPU NVENC engine / a hardware video capture dongle - and there are other purely software approaches on Linux that work extremely well.

You can see some of the results I got from my work here:

https://www.youtube.com/watch?v=1Tac2EvogjE

https://www.youtube.com/watch?v=ZwqMdi-oMoo

https://www.youtube.com/watch?v=6GXts_yNl6s

https://www.youtube.com/watch?v=KzFngReJ4ZI

https://www.youtube.com/watch?v=LA6VWZcDANk

In the end, if you want to capture browser video, use OBS or ffmpeg with NVENC or something - all the fancy footwork isn’t needed.

  • KeplerBoy a day ago

Using OBS won’t make your sluggish animation seem buttery smooth, though. This seems to be the point of Replit's attempt here: perfect frame pacing.

On top of that, you could use this technique to record at frame rates higher than native. There's no reason why you shouldn't be able to redraw a basic page with some animations at a few hundred fps.

  • virtualritz a day ago

> I dropped it in the end partly because of all the problems and edge cases, and partly because it's a solution looking for a problem: AI essentially wipes out any demand for generating video in browsers.

    That is only because your view omits some other problems this solves/products this enables.

There is an incredible ecosystem of tools out in browser land for creating animation.

If you can capture frames from the browser, you can render these animations as videos with motion blur (render 2500 frames for a second of video, blend each run of 100 frames with a shutter function) to get 25fps with 100 motion blur samples (a number After Effects can't do, for example).
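The blending step described here can be sketched as follows (illustrative code, not the poster's pipeline): average each run of oversampled frames into one output frame, weighting the samples by a shutter function. Frames are arrays of grayscale pixel values for simplicity.

```javascript
// Blend N oversampled frames into one motion-blurred output frame.
// `shutter` maps normalized time within the frame (0..1) to a weight,
// modelling how long the virtual shutter is open at that instant.
function blendFrames(samples, shutter) {
  const weights = samples.map((_, i) => shutter(i / samples.length));
  const total = weights.reduce((a, b) => a + b, 0);
  const out = new Array(samples[0].length).fill(0);
  for (let s = 0; s < samples.length; s++) {
    for (let p = 0; p < out.length; p++) {
      out[p] += samples[s][p] * (weights[s] / total);
    }
  }
  return out;
}

// A box shutter (always open) plainly averages the 100 samples that
// make up each 1/25s output frame:
const boxShutter = () => 1;
const samples = Array.from({ length: 100 }, (_, i) => [i]); // one pixel ramping 0..99
console.log(blendFrames(samples, boxShutter)); // [49.5]
```

Swapping in a different shutter function (e.g. one open only for the first half of the frame interval) changes the look of the blur without touching the renderer.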

    • andrewstuart a day ago

      There’s a tiny, tiny market for people who would pay for this.

Also, you must understand that Chrome is not a deterministic renderer. You cannot get per-frame control because it is fundamentally designed to get frames in front of the user fast.

      They did some work around the concept of virtual time a few years ago with this sort of thing in mind and eventually dropped it.

      • virtualritz a day ago

        > There’s a tiny, tiny market for people who would pay for this.

        Not sure what market you are talking about.

        What I was talking about: people pay for motion graphics. LLMs are excellent at creating motion graphics from/around browser technology ...

        Advertising is a huge market and motion graphics is everywhere in video/film-based advertising.

        > Also you must understand that chrome is not a deterministic renderer. You cannot get the per frame control because it is fundamentally designed to get frames in front of the user fast.

It's absolutely deterministic if you control the input. There is no "add random number to X" in Chrome. The non-determinism comes from user input and time.

        I know this because the company I work for did extensive tests around this last year. I was one of the people working on that part.

We looked into the same approach as Replit. The only reason we gave up on it was product-related: our needs changed. Not because it is impossible (which, I guess, their blog post proves).

        • andrewstuart a day ago

          Not when you can say to nano banana “make a video showing a thousand monkeys running down a road all wearing suits, with cinema quality credits rolling over listing the ingredients of corn flakes”, and it spits out something amazing.

          • virtualritz 5 hours ago

You are sold on the AI snake oil. There are no models that can generate art-directed VFX or motion graphics for multi-second clips without breaking consistency, missing the mark, or fucking up the timing.

            An exact color from a customer's brand book? Complex animated typography? Forget it.

Check out my GH profile and who I work for. We're ex-blockbuster VFX professionals. We use AI everywhere. We know what is possible and what isn't.

            The market we serve is huge. And every ad we create is bespoke. And the product the ads are for is unique.

A scratch on the rim of the left front wheel? When we do a turntable, that scratch needs to be in every frame, and the rim cannot change its number of spokes or the like (that's what latest-gen models like Nano Banana v2 still do).

Show me a model that can do this level of detail. They barely manage a few seconds now for special cases like humans. Even there you may get subtle changes of eye or hair color. Anything that is not a human and needs to stay exactly the same each frame: good luck.

            But let's assume the models were there today.

            Still a dead end because: cost.

You know how much it costs to create a 10-15 second clip with a state-of-the-art/somewhat useful model on a high-VRAM GPU instance vs. such a clip rendered by a headless Chrome browser on a cheap, low-RAM CPU spot instance?

We don't use Chrome (see previous post). We use a bespoke 2D/3D renderer with a custom AI/human-fed pipeline. But crucially, no final frames ever come from AI for now, for the reasons I mentioned at the top.

We're talking multiple orders of magnitude difference in cost. As of this writing, a 15-second clip with Veo 2 costs about 7.50 USD. This needs to be two orders of magnitude cheaper to become viable.

tl;dr: if we relied on AI for this, our business would not exist.

  • pjc50 a day ago

    "Use OBS" is one approach that definitely works. If you run the browser inside OBS it also disables hardware acceleration, which may cause some issues but has the advantage of turning DRM support off.

    • andrewstuart a day ago

      No it doesn’t disable acceleration.

Just use NVENC or Intel or AMD hardware video capture.

  • andrewstuart a day ago

Here you go, capture browser video for $100…

    https://www.amazon.com.au/AVerMedia-Streaming-Passthrough-Re...

Or use ffmpeg with NVENC; it allows simultaneous capture of 12 sessions.

Toss away all the hard work futzing with the browser; just put in one ffmpeg command.

d--b a day ago

This is super smart but doesn't seem very future-proof...
