Self-updating screenshots

489 points by bjhess 24 days ago · 79 comments

Same, I've added a .#screenshots derivation. High up-front effort but almost zero maintenance afterwards.

Bonus: since you're generating screenshots programmatically anyway, you can generate a pair of each with your app's light/dark theme, and swap them in/out depending on prefers-color-scheme: dark. <picture> elements work in GitHub READMEs, too: https://github.com/CyberShadow/CyDo#readme
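
A minimal sketch of the light/dark swap, in Python: it generates the `<picture>` markup that a screenshot script could write into a README. The file names and the `picture_markup` helper are hypothetical. Only the `<picture>`/`<source media>` mechanism comes from the comment above.

```python
from html import escape


def picture_markup(light_src: str, dark_src: str, alt: str) -> str:
    """Build a <picture> element that serves dark_src to dark-theme readers."""
    return (
        "<picture>\n"
        # Browsers use the first <source> whose media query matches.
        f'  <source media="(prefers-color-scheme: dark)" srcset="{escape(dark_src)}">\n'
        # The <img> is the fallback, used for light theme and older renderers.
        f'  <img alt="{escape(alt)}" src="{escape(light_src)}">\n'
        "</picture>"
    )


# Hypothetical paths for a generated light/dark screenshot pair.
markup = picture_markup("docs/main-light.png", "docs/main-dark.png", "Main window")
print(markup)
```

Since both images come from the same generator run, the pair should stay in sync without any manual step.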

neobrain 23 days ago

+1 for this approach. For a mobile app, I made Nix spawn an ephemeral Android emulator instance for generating up-to-date screenshots, requiring no prior setup and leaving no lingering data around after running. Setting it up wasn't that high-effort in my case either; coming up with the idea was the hard part, the Nix code was one-shot by your favorite LLM.

Granted, manually updating the screenshots isn't the most laborious task in the world, but the "upload-apk + take-screenshot + transfer-back-to-PC + edit" process is usually just annoying enough that you end up almost never doing it otherwise (similar to the OP's experience in the closing paragraph).

Landing7610 21 days ago

That sounds so cool! Is the repo available anywhere?

neobrain 16 days ago

Nothing public yet, but this is the Nix output for taking the screenshot, to be executed via `nix run .#screenshot`:

        outputs.apps.x86_64.screenshot = {
          type = "app";
          program = toString (pkgs.writeShellScript "screenshot-script" ''
            set -euo pipefail

            EMU_SDK="${androidEmulatorComposition.androidsdk}/libexec/android-sdk"
            ADB="$EMU_SDK/platform-tools/adb"
            EMULATOR="$EMU_SDK/emulator/emulator"
            APK="${self.packages.${system}.debug}/myapp-debug.apk"

            SRC_DIR="$(${pkgs.git}/bin/git rev-parse --show-toplevel)"
            AVD_HOME="$(mktemp -d)"
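            # Initialize before the trap so cleanup still works under `set -u`
            # if the script exits before the emulator has been started
            EMU_PID=""
            trap 'kill "$EMU_PID" 2>/dev/null; wait "$EMU_PID" 2>/dev/null; rm -rf "$AVD_HOME"' EXIT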

            # Create AVD
            AVD_DIR="$AVD_HOME/screenshot.avd"
            mkdir -p "$AVD_DIR"
            cat > "$AVD_HOME/screenshot.ini" <<EOF
            avd.ini.encoding=UTF-8
            path=$AVD_DIR
            target=android-${platformVersion}
            EOF
            cat > "$AVD_DIR/config.ini" <<EOF
            AvdId=screenshot
            PlayStore.enabled=false
            abi.type=x86_64
            avd.ini.encoding=UTF-8
            hw.cpu.arch=x86_64
            hw.gpu.enabled=yes
            hw.gpu.mode=swiftshader_indirect
            hw.lcd.density=420
            hw.lcd.height=2400
            hw.lcd.width=1080
            hw.ramSize=2048
            image.sysdir.1=system-images/android-${platformVersion}/google_apis/x86_64/
            skin.dynamic=yes
            tag.display=Google APIs
            tag.id=google_apis
            disk.dataPartition.size=2G
            EOF

            echo "==> Starting emulator..."
            ANDROID_AVD_HOME="$AVD_HOME" ANDROID_HOME="$EMU_SDK" \
              "$EMULATOR" -avd screenshot -no-window -no-audio -no-boot-anim \
              -gpu swiftshader_indirect -no-snapshot 2>&1 &
            EMU_PID=$!

            echo "==> Waiting for boot..."
            for i in $(seq 1 90); do
              BOOT=$("$ADB" shell getprop sys.boot_completed 2>/dev/null | tr -d '\r') || true
              if [ "$BOOT" = "1" ]; then
                echo "    Booted after ~$((i * 2))s"
                break
              fi
              sleep 2
            done
            if [ "$BOOT" != "1" ]; then
              echo "ERROR: Emulator failed to boot" >&2
              exit 1
            fi

            # Enable dark mode
            "$ADB" shell cmd uimode night yes

            # Install and launch
            echo "==> Installing APK..."
            "$ADB" install -r "$APK"
            "$ADB" shell pm grant com.me.myapp android.permission.WRITE_SECURE_SETTINGS
            "$ADB" shell am start -n com.me.myapp/.MainActivity
            sleep 3

            # Navigate to settings screen by tapping "Notification Filters" button
            # This uses uiautomator to find the button by text for robustness
            "$ADB" shell uiautomator dump /sdcard/ui.xml
            BOUNDS=$("$ADB" shell cat /sdcard/ui.xml \
              | ${pkgs.gnugrep}/bin/grep -oP 'text="Notification Filters"[^>]*bounds="\K[^"]+' \
              || true)
            if [ -z "$BOUNDS" ]; then
              echo "ERROR: Could not find Notification Filters button" >&2
              exit 1
            fi
            # Parse bounds "[x1,y1][x2,y2]" to compute center tap coordinates
            X1=$(echo "$BOUNDS" | ${pkgs.gnused}/bin/sed 's/\[\([0-9]*\),\([0-9]*\)\]\[\([0-9]*\),\([0-9]*\)\]/\1/')
            Y1=$(echo "$BOUNDS" | ${pkgs.gnused}/bin/sed 's/\[\([0-9]*\),\([0-9]*\)\]\[\([0-9]*\),\([0-9]*\)\]/\2/')
            X2=$(echo "$BOUNDS" | ${pkgs.gnused}/bin/sed 's/\[\([0-9]*\),\([0-9]*\)\]\[\([0-9]*\),\([0-9]*\)\]/\3/')
            Y2=$(echo "$BOUNDS" | ${pkgs.gnused}/bin/sed 's/\[\([0-9]*\),\([0-9]*\)\]\[\([0-9]*\),\([0-9]*\)\]/\4/')
            TAP_X=$(( (X1 + X2) / 2 ))
            TAP_Y=$(( (Y1 + Y2) / 2 ))
            "$ADB" shell input tap "$TAP_X" "$TAP_Y"
            sleep 2

            # Capture and process screenshot
            echo "==> Capturing screenshot..."
            "$ADB" shell screencap -p /sdcard/screenshot.png
            "$ADB" pull /sdcard/screenshot.png "$AVD_HOME/raw.png"

            # Crop to content: remove status bar (top 128px) and empty space below
            # Per-App Overrides, then resize with high-quality Lanczos filter
            ${pkgs.imagemagick}/bin/magick "$AVD_HOME/raw.png" \
              -crop 1080x1100+0+128 +repage \
              -filter Lanczos -resize 540x \
              "$SRC_DIR/fastlane/metadata/android/en-US/images/phoneScreenshots/settings.png"

            echo "==> Screenshot saved to fastlane/metadata/android/en-US/images/phoneScreenshots/settings.png"
          '');
        };
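
For readers who would rather do the "find the button and tap its center" step outside of grep and sed, here is a rough Python sketch of the same idea. It assumes the dump has the usual `uiautomator dump` shape (nested `node` elements with `text` and `bounds` attributes). The sample XML and the function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# uiautomator reports bounds as "[x1,y1][x2,y2]".
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def tap_center(bounds: str) -> tuple[int, int]:
    """Return the integer center of a "[x1,y1][x2,y2]" rectangle."""
    match = BOUNDS_RE.fullmatch(bounds.strip())
    if match is None:
        raise ValueError(f"unexpected bounds format: {bounds!r}")
    x1, y1, x2, y2 = map(int, match.groups())
    return (x1 + x2) // 2, (y1 + y2) // 2


def find_tap_target(ui_xml: str, label: str) -> tuple[int, int]:
    """Find the first node whose text equals label and return its center."""
    root = ET.fromstring(ui_xml)
    for node in root.iter("node"):
        if node.get("text") == label:
            return tap_center(node.get("bounds", ""))
    raise LookupError(f"no node with text {label!r}")


# Hypothetical, heavily trimmed dump for demonstration only.
SAMPLE_DUMP = """
<hierarchy rotation="0">
  <node text="" class="android.widget.FrameLayout" bounds="[0,0][1080,2400]">
    <node text="Notification Filters" class="android.widget.Button"
          bounds="[84,1020][996,1188]" />
  </node>
</hierarchy>
"""

print(find_tap_target(SAMPLE_DUMP, "Notification Filters"))  # (540, 1104)
```

The resulting pair is what would be passed to `adb shell input tap`. Parsing the XML instead of grepping it also avoids depending on the order of attributes within a node.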

9dev 22 days ago

The <picture> in README trick works like magic. Thank you! I'm going to steal it.

Nashooo 23 days ago

Hey, you need to make your code examples horizontally scrollable on mobile! I could guess their content from context clues, but still.

amiga386 23 days ago

If the author is reading this, please note your code blocks don't scroll (and in fact overflow the white text onto the white background) on mobile layouts. You need an "overflow-x: scroll" or such.

spuz 22 days ago

The only problem with this idea I can foresee is that the application, and therefore the screenshots, can change while the documentation does not. For example, if the documentation says press "Options > Customize" but the application is updated so this becomes "Preferences > Advanced", then the screenshot will show the new text but the documentation will still show the old labels. This would be very confusing, as it would be hard to correlate what is shown in the screenshot with the text. If the user saw the old screenshot, they could more easily tell that they were looking at out-of-date documentation.

Having said that, having a process to automatically grab screenshots is going to make it significantly easier for a developer to update the docs, so the motivation to keep the text up to date is going to be much higher.

zffr 22 days ago

As a next step, it could be cool to write unit tests against these screenshots that look for labels like the ones you mentioned. That way, if a screenshot is updated and a test breaks, you will know which documentation to update.
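
One way such a check might look, sketched in Python. It assumes the set of labels currently visible in the UI is already available (for example from a UI dump or OCR of the fresh screenshot, neither of which is shown here) and that the docs put UI labels in double quotes. The function names are hypothetical.

```python
import re


def quoted_labels(doc_text: str) -> list[str]:
    """Return UI labels quoted in the docs, splitting "A > B" menu paths."""
    labels = []
    for quoted in re.findall(r'"([^"]+)"', doc_text):
        labels.extend(part.strip() for part in quoted.split(">"))
    return labels


def stale_labels(doc_text: str, ui_labels: set[str]) -> list[str]:
    """Labels mentioned in the docs that no longer appear in the UI."""
    return [label for label in quoted_labels(doc_text) if label not in ui_labels]


doc = 'To change the theme, press "Options > Customize".'
ui_after_update = {"Preferences", "Advanced", "Help"}  # assumed extraction result

print(stale_labels(doc, ui_after_update))  # ['Options', 'Customize']
```

A test that asserts `stale_labels(...) == []` for each page would fail exactly when the screenshot and the surrounding text drift apart.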

furyofantares 23 days ago

Very cool.

For the small casual games I've been vibe coding, I always start from a place where the application has a CLI where it can run headless, rendering to an offscreen texture, with a screenshot command as well as performance instrumentation. It takes no time to include all this, and gives the agent a way to automate the UI and inspect important things. It also lets me trivially have the agent update screenshots.

Not as neat as being part of the build process, but I will now add that.

sho_hn 23 days ago

I do the same :-)
I have an offscreen screenshot path, as well as a CLI arg for world pos/camera view vector, and scripted benchmark runs with a simple text-based input format that has rows of named segments of n game ticks length with control inputs per segment. Use that extensively for A/B testing of visuals and performance while working on the game code.
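
To make the "rows of named segments" idea concrete, here is a small Python sketch of what such a format and its parser could look like. The column layout (name, tick count, then held inputs) is an assumption for illustration, not the commenter's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    ticks: int
    inputs: tuple[str, ...]


def parse_script(text: str) -> list[Segment]:
    """Parse one segment per row: <name> <ticks> [input ...]."""
    segments = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank rows
        if not line:
            continue
        name, ticks, *inputs = line.split()
        segments.append(Segment(name, int(ticks), tuple(inputs)))
    return segments


def inputs_per_tick(segments):
    """Yield (segment name, held inputs) once per game tick."""
    for segment in segments:
        for _ in range(segment.ticks):
            yield segment.name, segment.inputs


# Hypothetical benchmark script.
SCRIPT = """
# name      ticks  inputs
warmup      60
run_right   120    right
jump_gap    30     right jump
"""

segments = parse_script(SCRIPT)
print(sum(s.ticks for s in segments))  # 210
```

Because every run replays the same inputs tick by tick, two builds can be compared on identical scenes, and the segment names give per-section timings something to hang on.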
vidarh 22 days ago

I was toying with a DragonRuby game a while back, and did something like that. But DR also comes with recording of reproducible playbacks, screenshotting, etc. built in. Couple that with hot reloading and being able to easily inject code into the running game, and it was great to put instructions in place so the agent could run the game fully and show things off for me, in addition to letting it test things. I think we'll see more and more frameworks built to enable this: it's nice for human development, but it really pays off when you're working with an agent to have everything nicely runnable from a CLI and fully introspectable.
- caspar 22 days ago
  
  This is also what I've done for my multiplayer falling sand game: it's very much not vibe coded (too performance-sensitive), but coding agents can launch the game on my Steam Deck, run benchmarks, take captures and verify that rendering is bit-for-bit identical on a given machine, etc.
  Agent can't _play_ the game yet, but that's on my list to experiment with.
_fzslm 23 days ago

Would you mind sharing a link to some of these casual games? I ask cuz I'm also interested in how vibe coding can make game development easier.
We had such a vibrant indie game scene when Adobe Flash was around, and since then nothing's really touched that level of ease of development. I think vibe coding is the first tool that actually exceeds it.
- furyofantares 22 days ago
  
  Unfortunately I can't right now, I'm going to release a few things simultaneously but they aren't public yet. They will eventually show up on https://kellydornhaus.com
- ryanjshaw 23 days ago
  
  Search for #vibejam on X, there’s a contest running right now with lots of people sharing their dev experiences.
  - crefiz 23 days ago
    
    Since you don't seem to be able to share the URL: https://x.com/search?q=%23vibejam&src=typed_query
    And for those of you: https://XCancel.com/search?q=%23vibejam&src=typed_query
avaer 23 days ago

> It takes no time to include all this
In some cases it does. Which engine?
- furyofantares 22 days ago
  
  I don't use an engine for vibing, they tend to be built around managing content and using their editors. You can drive engines from code but they are more built for invoking code from content usually. So I just use frameworks like SDL, Raylib C# and ebitengine. Most stuff I've done recently was ebitengine because I felt golang was the best thing to have LLMs writing when I started them.
  Right now I have my own framework which has a host written in Rust but game code is written in AssemblyScript - too early to tell how well this will work out but it is very promising to me right now.
  If I were just getting started I would probably pick some framework in Rust, or maybe Bevy which I believe is considered an engine but is code-centered.
  - furyofantares 22 days ago
    
    Actually, if I were just getting started I might pick an engine at this point. LLMs are probably great at Godot now, and with MCP servers they can maybe work fine with any other engine by having access to the editor.

merelysounds 23 days ago

This is very useful in mobile projects.

App stores require screenshots, but generating N images for NUMBER_OF_SCREEN_SIZES times NUMBER_OF_LOCALIZATIONS can be a chore.
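
The size of that chore is easy to see once the matrix is written out. Below is a small Python sketch that enumerates one capture job per screen, device size and localization. The screen names, device identifiers and output layout are all invented for the example.

```python
from itertools import product

# Hypothetical inputs; a real project would read these from its config.
SCREENS = ["home", "puzzle", "settings"]
DEVICES = ["phone-6.7in", "tablet-12.9in"]
LOCALES = ["en-US", "de-DE", "ja"]


def screenshot_jobs(screens, devices, locales):
    """Return (screen, device, locale, output path) for every combination."""
    return [
        (screen, device, locale, f"screenshots/{locale}/{device}/{screen}.png")
        for locale, device, screen in product(locales, devices, screens)
    ]


jobs = screenshot_jobs(SCREENS, DEVICES, LOCALES)
print(len(jobs))  # 18
print(jobs[0][3])  # screenshots/en-US/phone-6.7in/home.png
```

Three screens already turn into 18 images here, which is why this tends to be scripted or handed to a tool rather than done by hand.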

In the past I wrote my own scripts for that, today tools like Fastlane[1] help.

I use Fastlane for my logic puzzle game Nonoverse[2], you can see sample screenshots in its App Store page.

I also automated App Preview video recording, complete with multiple scenes. If anyone wants to read more let me know, perhaps this is a good topic for an article.

[1]: https://fastlane.tools/

[2]: https://apps.apple.com/us/app/nonoverse-nonogram-puzzles/id6...

jdnebdbd 23 days ago

That sounds enticing! I can't figure out if it's a paid service or a local OS application though
- merelysounds 23 days ago
  
  Fastlane is a local, open source CLI tool.
  > 100% open source under the MIT license
  See: https://docs.fastlane.tools/
  It doesn’t support App Preview automation, this is something that I had to script myself.
wahnfrieden 22 days ago

These days it can be much easier (though costlier) to use an agent skill.

boxed 23 days ago

For web projects, consider not doing screenshots at all and just embedding the html: https://kodare.net/2025/01/14/iframes-not-screenshots.html

You can get responsive design in "screenshots" with this. Super nice, and people can copy paste, look at the code (useful for dev tools), etc.

LeoDaVibeci 23 days ago

I've needed this so many times. BTW this should be a meme: "I think this might be the neatest thing I’ve built in X that nobody will ever notice."

borplk 23 days ago

Site appears to be down intermittently with a Django error

If author sees this: Turn off Django debug mode

irishcoffee 23 days ago

I wrote a gui app once that ran on a safety-critical platform. I ended up stuffing a rendering of the gui (rendered offscreen) into shmem at I think 24hz, and rendered that screenshot into the safety critical application. I passed clicks (no typing for this gui) back from the statically rendered image updating on a cadence, to the offscreen GUI.

Worked well. Not quite the same as this, but that’s what this reminds me of.

yjftsjthsd-h 23 days ago

I don't think I follow. What is that giving you that you wouldn't get by just having the user click in the application and see its real interface directly? Or are you saying you were embedding one application inside another?
- jfim 23 days ago
  
  My guess is that it's to ensure that the UI logic crashing or hanging doesn't bring down the safety critical process.
  - irishcoffee 22 days ago
    
    The rendering of the safety-critical application was written completely in C using OpenGL SC (https://www.khronos.org/openglsc/) to render the GUI, and had to pass a formal validation suite (MISRA was the big one IIRC). Simply put, the safety-critical application essentially was not allowed to "fail in an unsafe manner" in the DO-178 sense. Using JavaScript or some C++ GUI library was very much out of the question.
    Fortunately, this was not an airborne platform, so failing safely was much simpler than what a true aviation stack or medical stack would need to do.

schneems 23 days ago

This is neat. I wrote https://github.com/zombocom/rundoc. It has a similar feature. The main driver is to produce tutorials so it also puts the output of commands run back in the document.

markaius 22 days ago

This reminds me of a small app I wrote and didn't do anything with. You would choose an API endpoint to grab data from, then choose how to display that text on a prebuilt or custom uploaded picture. On each ping it would rebuild the image and put the updated text from the API on it, so you could have images with more current information (every 5/10/30 mins, etc.). Updating READMEs with new screenshots on push is probably a much less CPU-intensive and better way to do dynamic image generation for this use case.

I never ended up using the idea the way I wanted, but this makes me think there's potential in this dynamic image domain yet!

sublinear 23 days ago

Why wouldn't you want to version the screenshots along with the text? That's a feature, not a bug.

At best, this seems to require an unpublished draft state for all automatic screenshot updates until explicitly approved so that mistakes don't leak out to everyone else.

At worst, this is an unrealistic level of discipline to keep things in sync that is far greater than just updating the docs normally with the next major version release.

My alternative suggestion would be to make sure your test suite takes screenshots with every build. They're already perfectly organized and in the context of what you're documenting.

kalb_almas 23 days ago

I'm sometimes getting

NoMethodError at /self-updating-screenshots undefined method `name' for nil:NilClass

Ruby title-for: in handle, line 12 Web GET interblah.net/self-updating-screenshots

followed by a very detailed traceback when I try to access the page

Mackser 23 days ago

Super cool! Love that you can declare the screenshots inline in the markdown document.

For my desktop app I created a solution that generates screenshots in multiple languages, light/dark mode, removes noise and adds Windows/macOS window frames.

Wrote about it here: https://maxschmitt.me/posts/cakedesk-website-redesign#screen...

It's currently a separate script (which is a pain to maintain). I should look into making it a part of the markdown/mdx. Thanks for the inspiration!

taspeotis 23 days ago

I’ve wondered about doing screenshots from the e2e test run, even keeping docs/ all together in the same repo so when you update the documentation and need a new screenshot you add a new test

immanuwell 24 days ago

nice, embedding the capture instructions right in the markdown as comments is a dead-simple solution that'll age way better than any fancy external tooling

oneeyedpigeon 22 days ago

Except that it's brittle—HTML comments can be stripped. I wonder why they chose this approach vs. frontmatter.
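
To illustrate both sides of this exchange, here is a Python sketch that reads capture instructions from HTML comments and shows how they disappear once comments are stripped. The `<!-- screenshot: key=value ... -->` syntax is invented for the example and is not necessarily the syntax the article uses.

```python
import re

# Hypothetical directive syntax: <!-- screenshot: key=value key=value -->
DIRECTIVE_RE = re.compile(r"<!--\s*screenshot:\s*(.*?)\s*-->", re.DOTALL)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def capture_directives(markdown: str) -> list[dict[str, str]]:
    """Collect the key=value pairs of every screenshot comment, in order."""
    return [
        dict(pair.split("=", 1) for pair in body.split())
        for body in DIRECTIVE_RE.findall(markdown)
    ]


MARKDOWN = """\
# Settings

<!-- screenshot: path=/settings out=images/settings.png -->
![Settings screen](images/settings.png)
"""

print(capture_directives(MARKDOWN))
# A pipeline step that strips comments silently removes the instructions.
print(capture_directives(COMMENT_RE.sub("", MARKDOWN)))  # []
```

The upside of comments is that the instruction sits next to the image it produces. Frontmatter would survive comment stripping but would need some other way to tie each entry to a specific image.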

martylamb 22 days ago

Very nice. I did something in a similar spirit eons ago for capturing CLI tool output directly into my docbook-based documentation. In my case it was also part of a build, generating intermediate .xml prior to running the docbook build.

Ancient historical reference: https://martiansoftware.com/lab/rundoc

Biganon 23 days ago

You should set DEBUG=False in your Django settings.

amelius 22 days ago

Speaking of screenshots.

Can we please agree that the OS should not send any event to applications while a screenshot is being made?

It is very annoying if you press a screenshot button and suddenly menus disappear. Or much worse, the application sends a "screenshot taken" message back to the social media platform.

reddalo 22 days ago

I also can't stand Android preventing me from taking a screenshot. It's on my screen, I have the right to take a screenshot.
I understand the technical limit of taking screenshots of DRM-protected content (e.g. Netflix), but why would my bank app be allowed to stop me from taking screenshots?
- hebelehubele 22 days ago
  
  You can mod the APK and remove those detections using ReVanced Manager
  https://github.com/revanced/revanced-manager
  https://gitlab.com/ReVanced/revanced-patches/-/blob/main/ext...
- pwdisswordfishs 22 days ago
  
  Solution: don't use mobile bank apps.
  - reddalo 22 days ago
    
    I'm forced to use a bank app in order to authenticate, even if I want to login on the desktop website. I think it's because of an EU regulation for strong authentication.
    
    dgellow 22 days ago
    
    The implementation causing you issues is from the bank, not from the EU regulation.
    See https://en.wikipedia.org/wiki/Strong_customer_authentication for details
- Ylpertnodi 22 days ago
  
  Ask them?
cbm-vic-20 22 days ago

The macOS built-in screenshot tool has an optional "timed delay" feature, where you can click "screenshot in 5 seconds". With that time, you can open menus, or do anything that requires events to be processed by the application. Very handy for screenshots that require something to be clicked on.
- amelius 22 days ago
  
  I mean, I can probably do the same in X11 using xwd, with a sleep.
  But I just don't want my screenshot button to do anything else than taking a screenshot.

maderalabs 23 days ago

Nice! I actually started to build this exact thing a couple years back, and ended up abstracting it out to something more generic with https://picshift.io/. That said, I still love the screenshot use case - the original name of this project was ScreenSync ;)

Barbing 23 days ago

Neat, good job, and good to have these different approaches out there

efortis 24 days ago

same here, but linking to the screenshots used for pixel diffing, which get committed to the repo.

https://github.com/ericfortis/mockaton/tree/main/pixaton-tes...

cocoto 23 days ago

Wouldn’t a real live render approach work in this case? Have a live preview of your tool inside a rectangle. If the tool is light it should be optimal visually: it will respect browser rendering settings like accessibility parameters or custom addons.

boxed 23 days ago

Or just statically build the HTML. That's what we do for iommi docs: https://kodare.net/2025/01/14/iframes-not-screenshots.html
kristopherleads 22 days ago

Don't we already have this in HTML5?
npodbielski 23 days ago

Also it would be a security issue?

npodbielski 23 days ago

I do not know why, but looking at the title I was sure that this involved something like a web server that updates a static file it serves when triggered by some external webhook.

ekjhgkejhgk 23 days ago

> Your users might not notice, but you know, and it gnaws at you.

The users WILL DEFINITELY notice if the screenshots don't match what they have in front of their eyes.

xp84 23 days ago

Bravo. This is incredibly useful, and really improves the quality of documentation, especially for many applications whose design and UI are always in flux.

est 23 days ago

I maintain an internal wiki; its contents are generated by each CI/CD run and always reflect the latest running code.

bobek 23 days ago

Plus we had a visual diff on top of that as part of the CI pipeline. It prevented a bunch of mishaps ;)
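
A visual diff of this kind can be quite small. The Python sketch below compares two images given as rows of RGB tuples and reports the fraction of pixels that changed. Decoding PNG files into that shape is left out, and the tolerance and threshold values are arbitrary assumptions rather than anything from the pipeline described above.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def diff_ratio(a: Image, b: Image, tolerance: int = 0) -> float:
    """Fraction of pixels whose largest channel difference exceeds tolerance."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in a)
    changed = sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance
    )
    return changed / total if total else 0.0


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
baseline = [[WHITE, WHITE], [WHITE, WHITE]]
candidate = [[WHITE, WHITE], [WHITE, BLACK]]

ratio = diff_ratio(baseline, candidate)
print(ratio)  # 0.25

MAX_CHANGED = 0.01  # assumed threshold: fail CI above 1% changed pixels
print("visual diff failed" if ratio > MAX_CHANGED else "visual diff ok")
```

A nonzero `tolerance` helps when anti-aliasing or font rendering differs slightly between machines and would otherwise produce noisy failures.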

willm 23 days ago

I approve of this approach.

The docs for Textual (TUI library for Python) build screenshots along with the docs. Technically they are not really screenshots, they are SVGs, but the principle is the same. They never get out of date.

https://textual.textualize.io/widgets/markdown/#example

davidtio 23 days ago

Interesting app, definitely will reduce a lot of work updating documentation.

infogulch 22 days ago

Can you do the same thing for websites with playwright/selenium?

3eb7988a1663 23 days ago

shot-scraper is another project in this vein.

https://github.com/simonw/shot-scraper

devmor 23 days ago

Really love this, it should be standard practice!

dhruv3006 23 days ago

This is very cool - I think I will try having this in https://voiden.md/.

Diti 23 days ago

“Crafted with care”, but the website has all the telltale signs of LLM slop.

erikmay 23 days ago

Awesome! Now you could even go a step further and add satori to the pipeline to add content to the fresh screenshot. This way annotations could be easily added to the screenshot.

Xmd5a 23 days ago

> Then you change the UI slightly – tweak a colour, move a button, update some copy – and suddenly every screenshot that includes that element is stale. You know they’re stale. Your users might not notice, but you know, and it gnaws at you.

Related: Sabotaging projects by overthinking, scope creep, and structural diffing – https://news.ycombinator.com/item?id=47890799

cluckindan 23 days ago

Read the article you’re linking to, it is not relevant here.
- Xmd5a 23 days ago
  
  Of course it is, he managed to avoid this pitfall, I "press F to pay respect"
  - spuz 22 days ago
    
    "F" usually means somebody did something wrong and you are paying respect to their memory. You don't say it as a form of congratulations.
    
    Xmd5a 22 days ago
    
    > did something wrong
    Nah it's for those who sacrificed their own life, those who succumbed to the call of duty (or to the imperium of perfection) and put their teammates first.
