Linux Running in a PDF
linux.doompdf.dev
Doesn't work, the document is unresponsive. I used an HP DeskJet 2820e Printer btw.
You will need to pipe it into the scanner in a loop, making sure to circle the correct keys before each scan
PDF actually has the ability to define which elements are displayed vs. printed (Optional Content).
Linux Running in a PDF (doompdf.dev)
114 points by theden 4 hours ago | 37 comments
As people start bolting various kinds of PDF parsers and evaluators to LLMs, there's got to be some interesting hack potential.
Linux running in an LLM, as a means to hijack computing resources to train an LLM... or mine bitcoins.
Is it able to have data come out of it though, or is it fully... "sandboxed"? I am guessing the only output is the visual feedback you get when it's rendered?
Oh... I guess if you can somehow have it trigger a "load an image with this query string" or something that could be a way to communicate with the rest of the world
PDFs have always been a highly attractive attack vector, because most people associate them purely with text and have no clue that you can easily embed executable code. Combine that with the sheer number of vulnerabilities in popular readers like Acrobat, and you have a perfect gateway for getting your company hacked.
Converting all your PDFs (at least the ones without forms) to maximum-quality DjVu files would be the first thing to do in any company. Maybe not for graphic design because $ADOBE, but for documentation it's perfectly safe to do so.
Unfortunately that approach falls apart the moment you need to interact with anyone outside your company.
I received a spam/scam text yesterday with a PDF embedded in it. I deleted it immediately. I also emailed my clients to remind them not to open such messages either.
PDF is brushing up against "more harm than good". Wish there was a proper alternative.
How about OpenXPS or DjVu?
Posted a few days ago: https://news.ycombinator.com/item?id=42891937. The repo also provides some explanation/info on how the machinery works.
Finally! I've been making the joke "put Linux.js in a PDF so I can run Linux, inside a PDF, inside a browser, inside Linux, inside a PDF, inside a browser, inside Linux" for far too long...
Not exactly. This PDF cannot open a browser... yet!
Using JS for this feels like cheating... I wonder if similar things would be possible with PostScript?
It's possible, but not in a PDF. PDFs support only a Turing-incomplete subset of PostScript, because PDF's designers thought that having a Turing-complete language in your document format would have performance implications. (Later, they changed their mind and added JavaScript support.)
At least PDFs are generally usable with JS disabled, and it's not available in popular variants like PDF/A and PDF/X
From the computation point of view, it's possible. PostScript has integer arithmetic operations needed for x86 CPU emulation. It also has mutable byte strings, which are useful as emulated memory.
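To illustrate the point about integer arithmetic plus mutable byte strings being enough for emulation, here is a minimal sketch in Python rather than PostScript. The instruction set (`LOAD_IMM`, `ADD`, `STORE`, `HALT`) and the `run` helper are made up for illustration; they are not x86, not RISC-V, and not taken from the linked project.

```python
# Toy fetch-decode-execute loop over a mutable byte buffer.
# Hypothetical instruction set, invented for this example only:
#   0x01 reg imm   -> LOAD_IMM: regs[reg] = imm
#   0x02 dst src   -> ADD:      regs[dst] = (regs[dst] + regs[src]) mod 256
#   0x03 reg addr  -> STORE:    memory[addr] = regs[reg]
#   0x00           -> HALT

LOAD_IMM, ADD, STORE, HALT = 0x01, 0x02, 0x03, 0x00


def run(memory: bytearray, max_steps: int = 1000) -> list[int]:
    """Execute the program stored at address 0 and return the registers."""
    regs = [0, 0, 0, 0]
    pc = 0
    for _ in range(max_steps):
        opcode = memory[pc]
        if opcode == HALT:
            return regs
        a, b = memory[pc + 1], memory[pc + 2]
        if opcode == LOAD_IMM:
            regs[a] = b
        elif opcode == ADD:
            regs[a] = (regs[a] + regs[b]) & 0xFF
        elif opcode == STORE:
            memory[b] = regs[a]  # the byte buffer doubles as data memory
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at {pc}")
        pc += 3
    raise RuntimeError("step limit reached without HALT")


# Program: r0 = 40; r1 = 2; r0 += r1; memory[15] = r0; halt
program = bytearray([
    LOAD_IMM, 0, 40,
    LOAD_IMM, 1, 2,
    ADD, 0, 1,
    STORE, 0, 15,
    HALT,
    0, 0, 0,  # padding; address 15 is the last byte
])
```

Calling `run(bytearray(program))` executes the four instructions and halts. A real CPU emulator is the same loop with a much larger decode table, which is why a language with only integers and a writable byte string is sufficient in principle.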
PostScript is Turing-complete. Get Ghostscript, zmachine.ps and some game, such as calypso.z3. You can just ddg/google them freely.
With PostScript you have zmachine.ps, which is a Z-machine (Zork and friends) interpreter for playing text adventure games.
And OFC there's a chess engine in PS, a tic tac toe, and with patience you could even play NES games, but you might need to play with the stack a lot.
Finally, I can `rm -rf /` in a PDF.
PostScript supports that via the "shredpage" operator.
Poetic
Copy 10 of these onto a USB drive. Enjoy your mobile Kubernetes cluster.
At least three detections on VirusTotal, but I'm not sure if it's significant.
ClamAV: Js.Trojan.Obfus-48
Cylance: Unsafe
Google: Detected
Set pdfjs.enableScripting in Firefox about:config to false.
But is there a Linux PDF editor that runs Linux in a PDF? Evince isn't loading it for me...
The only place I can get it to run is in Chrome. Won't work in Adobe Reader, Firefox, Evince etc. Seems most people that do this 'coding in a PDF' only target Chrome as a runtime.
Not sure if there's a reason for that, like Chrome allowing more code execution within a document or something?
Does anyone know if running PDFs through the following filter (as in [0]) protects against malicious PDFs?
gs \
-dNOPAUSE \
-sDEVICE=pdfwrite \
-sOUTPUTFILE=clean.pdf \
-dBATCH \
dirty.pdf
[0]: https://tex.stackexchange.com/a/481609/29430
It can make things worse: Ghostscript is not particularly safe to run on untrusted/potentially malicious input. It has a giant attack surface and no proper mitigations, unlike the PDF reader in your browser.
At a minimum, you'd have to sandbox it using something like gVisor.
Ahh, interesting...
How would you structure your workflow to protect from potentially malicious PDFs?
I had originally thought of setting up an inotifywait watcher that would look for downloaded PDFs to swap downloaded files (while leaving a *_with-risky-active-contents.pdf copy).
After thinking for a bit about your comment, I thought about creating a .desktop file that first cleans the PDF via `docker run --runtime=runsc -it ubuntu gs ...` that then proceeds to launch the viewer, and is associated as the main reader of PDF documents...
But now I am wondering if this should be integrated into clamav and other antivirus clients (and unblocking on a case-by-case basis).
Ghostscript has enabled -dSAFER by default for years (since version 9.50). If anything, you can always use pdf2djvu to convert that PDF into a DjVu file. As for the PDF in your browser... if it runs JS, you can get p0wned twice, even if it's sandboxed. Vulns in browsers are like segfaults with dubious codecs.
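For the workflow question in this subthread, one cheap first step that avoids running a full interpreter on untrusted input is to flag files that mention script-related PDF names before anything opens them. A rough sketch in Python follows. `find_active_markers` is a name made up here, and the check is only a heuristic: it reads raw bytes, so it misses names hidden inside compressed object streams or written with `#xx` hex escapes, and an empty result proves nothing.

```python
import re
from collections import Counter

# PDF name objects commonly associated with scripts or automatic actions.
# The negative lookahead keeps /JS from matching longer names such as /JSON.
_MARKER_RE = re.compile(
    rb"/(JavaScript|JS|OpenAction|AA|Launch|EmbeddedFile)(?![A-Za-z0-9])"
)


def find_active_markers(data: bytes) -> Counter:
    """Count occurrences of each suspicious name in the raw PDF bytes."""
    return Counter(
        match.group(1).decode("ascii") for match in _MARKER_RE.finditer(data)
    )
```

Usage would be along the lines of `find_active_markers(open(path, "rb").read())`, treating any non-empty result as a reason to quarantine the file rather than as a verdict. This only triages; it does not replace sandboxing whatever eventually renders the document.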
Who the hell keeps making those? First I saw Tetris, now a whole OS. Awesome!
The Reverend Pastor Manul Laphroaig at Alchemist Owl is responsible for a lot of this madness:
https://www.alchemistowl.org/pocorgtfo/
I confess to having become a fan long, long ago ..
PoC||GTFO is a great magazine :) Yay to Travis Goodspeed! Sorry, I wanted to say Pastor Laphroaig. Just don't get him started on his Tennessee buck belt :D
Linux in the browser has existed for a while. And if a PDF can run JS, then just put "Linux.js" in a PDF. JS opens up a whole can of worms.
About time someone gets a somewhat intelligent LLM working in js too (I know it can be done now, but like the linux js, there's a very large difference between what existed and what is practical)
Missing from headline: It is a RISC-V VM.
VM? Awesome! Can we run a hypervisor in PDF!? proxmox.pdf :)
Right below this in my feed:
> Ingesting PDFs and why Gemini 2.0 changes everything
Be afraid.
Be very afraid.
Does it run doom? ;-)
That's Doom running in a PDF, parent is asking for Doom running via linux running in a PDF.
Doom already runs in a PDF
Better question, does it run DOOM Emacs
vi works!
Just because you can, doesn't mean you should....
Of course one should. One should always explore and satisfy curiosity.
What one shouldn't do is use any of that for "serious" purposes, but that kind of stuff is a part of what makes computing great: the only boundaries are in the imagination.
Running operating systems inside of document formats is what makes computing great?
I thought it was the ease and leisure as well as the aid in pursuit of real knowledge they were supposed to introduce into our lives.
Can we spend our time more wisely instead?
what are you doing commenting here? go back to curing cancer!
That is not a reasonable comparison, is it? Junk hyperbole doesn't make a strong point.
What are you trying to say? This insidious stuff is worthwhile? Bitcoin mining in PDFs is important stuff?
How is it not reasonable? Commenting here is clearly a total waste of life compared to curing cancer, one of the noblest pursuits, and we've taken it upon ourselves to judge how other people spend their time. And of course once we've decided that I am the arbiter of what other people get to spend their time on, and I've decided that curing cancer is the most important thing to be working on, anyone not working on that gets sent right to waste of time jail.
Watching TV and playing sports or having a drink with friends, all banned under this regime because that's all unnecessary. It might also be possible that everyone's concept of what's valuable and productive is entirely subjective. Demonstrating that PDFs have JavaScript interpreters these days, are not a static content rendering system, and are not to be trusted, in a way that goes viral for maximum exposure, on the off chance that someone learns something about how insecure they are and avoids getting hacked, seems quite valuable to me, given how much money gets lost to hackers.
No one spends all their time only doing productive things, and some choose to spend it making things that other people find neat enough to share, like this PDF. Other people choose to spend their time commenting here. What else do you do for fun (when you're not off curing cancer)? Why is that any more worthy?
Why does it bother you how others spend their free time?
These "dynamic pdfs" are the antithesis of what pdf files are meant to be: static objects containing text that always looks the same. My state dept. of natural resources loves them, which means all the regulations are now inaccessible. All I can see in the "pdfs" (not pdfs, pdf shells that are webpages) are the following lines,
"Please wait... If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document."
... because they pull down their actual contents using JS from some website. They are the antithesis of what a pdf file is meant to be. Truly the worst of both worlds and a huge step backwards in accessibility and longevity. All that a screen reader can read is the above text.
Doesn't work in Evince or Firefox FWIW.
It says in the document itself that it only works in Chromium-based browsers.
It does, but the headline doesn't. I don't know if it's a nitpick too far, but it should be "X in a PDF in Chromium", as the hack seems as much about Chromium as it is about PDFs.
What kind of pdf feature works in chrome but not evince?
Insane ones, like hosting an OS.
The kinds of features that benefit Google and not necessarily the general public.
I have not updated my noscript pdf reader in years... The dreaded update is coming up though: mupdf.
> Note: This PDF only works in Chromium-based browsers.
I hear that CSS is Turing complete.
Damn this is actually interesting
Has anyone tried printing it? I'm afraid!
You can open (and presumably print) it in MS Word, it's just a single page.
I'm also afraid of opening MS Word.
Taste is subjective, but this broke that barrier. It is just absolutely hideous. Good engineering work, but still a total abomination in itself.
Doesn't work in Safari (?).
What’s next? GPT in PDF?
ingest this, gemini
I would really appreciate if someone could put a decent PDF reader, like Sumatra, into a PDF so I could have a portable and good PDF reader on locked down computers.
Are you going to open that PDF in Adobe Reader?
If yes, Adobe has this friendly AI assistant forced into your face, plus overlapping floating toolbars on all sides of your document that you cannot get rid of to get a clean view of the document itself.
So your dream of a simple lightweight clutterfree PDF reader will remain a dream, unfortunately.
There is the classic Adobe Reader release [0] (non-DC) that is still receiving updates [1].
[0] https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNo...
[1] https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNo...
It's even served by FTP. Nice!
AI assistants are this generation's Clippy.
Most browsers can open PDFs but can’t necessarily search large docs quickly.
I mean if he’s searching for a good PDF reader I doubt he’ll be opening it in Adobe Reader
True but also, what actually is the best alternative that's also free? Non-rhetorical question, because I am averse to paying for what feels like a universal/commons piece of tech.
I'm not an advanced user of pdfs so I'm not sure if there's anything major missing from these, but either Okular or just the built-in browser ones work well enough for my basic needs of reading and the occasional form filling on the desktop, and on Android I use MuPDF.
Someone must put in the effort to make something. Should they not be compensated for their time?
If they inform and ask after the fact, compensation should be voluntary.
When I was a Mac user, I found the built in Preview adequate to my needs.
I had assumed it went the way of Adobe Flash many years ago.
It's still alive and kicking in anything government. Third party support for forms and other kinds of interactive stuff is ... lacking.
Well, you know what they say about assume...
Accidentally Turing complete?
It's wild how shitty and hostile Adobe's pdf reader is as a product. If I was in the planning room I'd roast the product as unusually offensive to most sensibilities.
Then again, I'm in no way running a billion dollar successful software company so what do I know?
PDF, the true universal app platform. Please don't tell Microsoft.
PDF clusters: each page of the document is running an independent virtual machine, and they share a private network.
throw in a bunch of kubernetes just to make it more interesting.
this sounds like the beginning of a cyberattack....
You can run MuPDF in Wasm: https://news.ycombinator.com/item?id=40096113
Believe it or not, that's my use-case for MS Edge, it has surprisingly decent features (for editing too)
I actually hit this recently with a Google Docs generated PDF (print -> download) that wasn't rendering correctly in Chrome or Firefox, but did load as expected in Edge.
Sumatra also has a portable version. Doesn't that work for you?
If IT finds an exe file they go bonkers. If they find a PDF, who will even care?
Depends on how long they've been in IT. A good while back, exploits in Adobe Reader were so common that pdf files were a common malware vector.
Pray Microsoft Defender will be nice enough to not look at the PDF too closely.
It can use the pdf reader in the pdf to look closely if it wants to
Still, you will need a reader to open that pdf.
The PDF reader is embedded in Linux within a PDF that is within a PDF.
"Yo dawg I heard you like pdfs so I put a pdf reader into your pdfs so that you can pdf while you pdf".