Tell HN: Adobe took down the PDF 1.7 specification from their site
I just discovered that Adobe took down the PDF 1.7 specification from their site. It used to be hosted at [1] and I can't find a replacement. Of course this doesn't mean that the specification can't be acquired freely from elsewhere [2, 3], but it's unfortunate if the authoritative source is down. Hopefully it is a mistake and it will be back up.
[1] http://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
[2] https://christianhaider.de/dokuwiki/lib/exe/fetch.php?media=pdf:pdf32000_2008.pdf
[3] https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

It's still available at this Adobe URL: https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/p... "As distributed by Adobe after adoption as ISO 32000-1:2008, with permission of ISO." [0]

Not to mention ISO unsurprisingly hosts it, which I would also consider authoritative: https://www.iso.org/obp/ui/#iso:std:iso:32000:-1:ed-1:v1:en

[0] https://www.loc.gov/preservation/digital/formats/fdd/fdd0002...

Ah, thanks for the link. For some reason that link doesn't turn up in Google search for me, however hard I try. Of course the ISO one is also authoritative, but not free.

Oh, mea culpa. Honestly I just found the opening page and didn't notice it needed payment to get to the rest of it. I naively assumed that since Adobe had published it with ISO's permission, it was free in both places.

No worries! How did you find the "opensource.adobe.com" link? I can't hit it with Google, even with aggressive searches like "site:opensource.adobe.com filetype:pdf". And I can't seem to navigate there from the main page.

The loc.gov page came up first in my search, and links to the Adobe one :)

Apart from the PDF32000_2008.pdf (the ISO version), Adobe used to have a pdf_reference_1-7.pdf at https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdf_ref... which is the version before it got ISO-ized. The ISO version is "substantially the same" except for typesetting and small differences in wording, but I found the Adobe version much more of a pleasure to read. Comparing them is a good exercise in how much these small differences matter.

Kinda funny you need a PDF reader to read the PDF specification :)

Yeah, how did they bootstrap the first PDF reader? /s It must have been hard guessing at the spec until you could read it properly.

Maybe they used PostScript?
I somehow thought early versions of the PDF spec were published as a .ps version for that very reason, but my duck-fu is failing me in finding any such link. It may require wayback-fu and that's beyond my level-of-effort :-)

It's the same with the HTML spec, I think.

HTML is somewhat human readable in a text editor, but PDF likely is not.

Actually a lot of the PDF format is plain text, but it can contain binary streams. You can open a PDF in a text editor and see the header, and skip to the end and see the xref index and some other parts. The binary sections are enclosed in plain-text start and end markers, but you probably won't be able to read much of the actual content this way since it will be compressed or encrypted. Sometimes even the binary fragments are compressed plain text.

How did you find the link on opensource.adobe.com? They used to host other standards, too (e.g. font formats).

The ISO released the 2.0 version of the specification that replaces the 1.7 standard. "Although it is an open standard, one major difference compared with prior versions of PDF is that ISO now holds the copyright to the PDF specification and thus PDF 2.0 is not freely downloadable." [0] It looks like DMCA requests are being issued to anyone that hosted the old specification, even open source projects [1].

[0] https://www.pdfa.org/resource/iso-32000-pdf/
[1] https://github.com/Hopding/pdf-lib#git-history-rewrite

Wow, I feel like that is a step back. This feels a lot like other protocols with non-free specs, such as J1939, which costs over $1000 USD. PDF 2.0 is not cheap either: https://www.iso.org/standard/75839.html

Definitely a step back.

Why are these documents a paid product? Are there other ways to access them? I figured standardization documentation would be free to encourage adoption.

> This page was updated on 23 March 2022 as many direct links to legacy PDF specifications on adobe.com were broken. Many links now reference the Wayback Machine internet archive and thus may be slow.

It's possible they're reworking their CMS and that causes files to be moved (breaking links everywhere). Microsoft loves doing that with their developer blogs. Not cool [0].

It's funny how CMSes tend to offer "clean URL" configurations (meaning that everything after the origin is 100% controlled by the CMS user) for requests served dynamically (database queries), but requests served statically (public files on disk) often end up containing implementation-specific junk (e.g., "/sites/" in the case of Drupal). The magic that makes clean dynamic URLs (rewrite everything that isn't a file to the boot script) should be expanded to make clean file URLs. Serving files would then need help from a script+db, but so what, that already happens for private files. Obviously embedded assets that need to be fast (images, stylesheets, scripts, etc.) can't have a slow db query in the way. I'm only talking about files that are a first-class destination in the browser's address bar, like PDFs, and anything where the disposition is that it lands in your Downloads folder. Stuff that might be a search result or otherwise linked-to.

Drupal allows you to set private file mode, which has clean URLs.

It's kind of clean in that it uses a URL based on a db value instead of the filename on disk, but it's still got CMS-specific junk in that it always starts with "/system/" (at least in D7, I haven't explored it in D9).

Off topic, but man is that document hard to use as a reference. Ironically, I wish they would publish it as HTML broken down by chapter and section.
(I have used that document a lot to write a custom PDF generator and parser in Java, using a downloaded copy)

> Ironically, I wish they would publish it as HTML broken down by chapter and section.

I wish there was an EPUB version of the document. Do PDFs support reflowable content?

I believe one of the selling points of PDFs was the absolute lack of reflowing content.

Right, as the point is to represent a physical document, paper and ink (or canvas, toner, whatever -- stuff that doesn't reflow). Why anyone would use such a format for these situations, where the audience definitely cares way more about consuming it on an electronic device than printing it out, is... mind-boggling. Of course, AI+ML to the rescue: Liquid Mode [0].

> Files are processed in our secure data servers and immediately deleted from our servers after the experience is generated.

[0] https://www.adobe.com/devnet-docs/acrobat/android/en/lmode.h...

I've found that when people are precise about the flow of intermixed equations and text, the result can be easier to read than reflowing content. Other than that, not so much. Edit: Non-reflowing content also works well if you need to refer people to page numbers and paragraphs. I look forward to playing with Liquid Mode at some point soon.

CSS flow control and specifying an `id` attribute value as a URL fragment would be my solutions to those particular concerns, if it weren't the case that our context here is capturing from software that offers printing but doesn't offer exporting to HTML very well. I think the solution might be "bring it to a good web dev and have a solid punch list."

A PDF can be reflowed without reconstructive processing only if it was generated as a Tagged PDF [1] and if the viewer supports reflowing.

[1]: Essentially a PDF with its own EPUB inside it, but unlike just having an attached EPUB, there is a map between the page layout of the PDF and the tags.
There are implementations of reconstructive reflowing that infer the layout block structure and reading order and can reflow a two-column paper into a single column.

PDFs can support tables of contents with labeled chapters and sections. Not sure if the feature is standardized, but it's there.

The specification does have a hierarchical outline, and you can click on cross references too. Of course navigation can still be cumbersome, and linking to chapters can also be awkward (tip: right click on outline element and copy link works in Firefox). There are some problems with the spec though, and navigation is not the most pressing one. The spec is huge, and support for less-used parts is spotty in various PDF readers. It also has inaccuracies (not corrected in errata) and underspecified parts.

> hard to use as a reference

How so? I frequently reference specific sections, tables or pages of the spec at work.

Maybe one of the side effects of this is that people only continue writing against PDF 1.7.

Maybe they want to sell it :)
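
On the earlier point that a lot of a PDF is plain text: here is a rough Python sketch of what "open it in a text editor" looks like programmatically. It reads the header at the start, the `startxref` pointer near the end, and counts the `stream` markers that wrap the binary parts. The `inspect_pdf` helper and the `SAMPLE` bytes are made up for illustration (the sample is a hand-written fragment, not a complete valid PDF), and the keyword scan is naive: compressed data can contain the same bytes by chance, so a real parser would follow the xref table instead.

```python
import re

# Hand-written fragment for illustration only; not a complete, valid PDF.
SAMPLE = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
    b"xref\n0 1\n0000000000 65535 f \n"
    b"trailer\n<< /Size 1 >>\n"
    b"startxref\n63\n%%EOF\n"
)


def inspect_pdf(data: bytes) -> dict:
    """Pull the plain-text landmarks out of raw PDF bytes (hypothetical helper)."""
    # The file starts with a header such as "%PDF-1.7".
    header = re.match(rb"%PDF-(\d\.\d)", data)

    # Near the end, "startxref" is followed by the byte offset of the
    # cross-reference section. Incrementally updated files can contain
    # several such pointers, so take the last one found in the tail.
    offsets = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data[-1024:])

    # Stream data sits between the "stream" and "endstream" keywords.
    # \b keeps "endstream" from being counted as a "stream" keyword.
    streams = re.findall(rb"\bstream\r?\n", data)

    return {
        "version": header.group(1).decode("ascii") if header else None,
        "startxref": int(offsets[-1]) if offsets else None,
        "stream_count": len(streams),
    }


if __name__ == "__main__":
    print(inspect_pdf(SAMPLE))
```

To try it on a real file, read the file in binary mode and pass the bytes in, e.g. `inspect_pdf(open("some.pdf", "rb").read())`, where `some.pdf` is whatever file you have at hand.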