Tell HN: Adobe took down the PDF 1.7 specification from their site

114 points by steerablesafe 4 years ago · 36 comments · 1 min read

I just discovered that Adobe took down the PDF 1.7 specification from their site. It used to be hosted at [1], and I can't find a replacement. Of course this doesn't mean that the specification can't be acquired freely from elsewhere [2, 3], but it's unfortunate if the authoritative source is down. Hopefully it is a mistake though and it will be back up.

[1] http://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

[2] https://christianhaider.de/dokuwiki/lib/exe/fetch.php?media=pdf:pdf32000_2008.pdf

[3] https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

darrenf 4 years ago

It's still available at this Adobe URL: https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/p... "As distributed by Adobe after adoption as ISO 32000-1:2008, with permission of ISO." [0]

Not to mention ISO unsurprisingly host it, which I would also consider authoritative: https://www.iso.org/obp/ui/#iso:std:iso:32000:-1:ed-1:v1:en

[0] https://www.loc.gov/preservation/digital/formats/fdd/fdd0002...

steerablesafeOP 4 years ago

Ah, thanks for the link. For some reason that link doesn't turn up in google search for me, however hard I try. Of course the ISO one is also authoritative, but not free.
- darrenf 4 years ago
  
  Oh, mea culpa. Honestly I just found the opening page and didn't notice it needed payment to get to the rest of it. I naively assumed that since Adobe had published it with ISO's permission, it was free in both places.
  - steerablesafeOP 4 years ago
    
    No worries! How did you find the "opensource.adobe.com" link? I can't hit it with google, even with aggressive searches like "site:opensource.adobe.com filetype:pdf". And I can't seem to navigate there from the main page.
    
    darrenf 4 years ago
    
    The loc.gov page came up first in my search, and links to the Adobe one :)
svat 4 years ago

Apart from the PDF32000_2008.pdf (the ISO version), Adobe used to have a pdf_reference_1-7.pdf at https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdf_ref... which is the version before it got ISO-ized. The ISO version is "substantially the same" except for typesetting and small differences in wording, but I found the Adobe version much more of a pleasure to read. Comparing them is a good exercise in how much these small differences matter.
Anunayj 4 years ago

kinda funny you need a pdf reader to read the pdf specification :)
- Cerium 4 years ago
  
  Yeah, how did they bootstrap the first pdf reader? /s
  It must have been hard guessing at the spec until you could read it properly.
  - cwt137 4 years ago
    
    Maybe they used Postscript?
- mdaniel 4 years ago
  
  I somehow thought early versions of the PDF spec were published as a .ps version for that very reason, but my duck-fu is failing me finding any such link. It may require wayback-fu and that's beyond my level-of-effort :-)
- geodel 4 years ago
  
  It's the same with the HTML spec, I think.
  - zdw 4 years ago
    
    HTML is somewhat human readable in a text editor, but PDF likely is not.
    
    yardshop 4 years ago
    
    Actually a lot of the PDF format is plain text, but can contain binary streams. You can open a PDF in a text editor and see the header, and skip to the end and see the xref index and some other parts. The binary sections are enclosed in plain text start and end markers, but you probably won't be able to read much of the actual content this way since it will be compressed or encrypted.
    
    pajko 4 years ago
    
    Sometimes even the binary fragments are compressed plain text.
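
To illustrate the two comments above, here is a minimal Python sketch. It assembles a tiny PDF-like byte string (not a complete, valid PDF) whose one stream is Flate-compressed text, then reads the plain-text header, follows the `startxref` pointer near the end, and decompresses the stream. The helper names (`build_sample`, `read_header`, `find_startxref`, `first_stream`) are made up for this example, and it assumes `/Length` is a direct number, which real files do not always guarantee.

```python
import re
import zlib


def build_sample(text: bytes) -> bytes:
    """Assemble a tiny PDF-like byte string (illustrative, not a complete valid PDF)."""
    compressed = zlib.compress(text)
    body = (
        b"%PDF-1.7\n"
        b"1 0 obj\n"
        b"<< /Length " + str(len(compressed)).encode("ascii") + b" /Filter /FlateDecode >>\n"
        b"stream\n" + compressed + b"\nendstream\n"
        b"endobj\n"
    )
    # The cross-reference section starts right after the body.
    xref_offset = len(body)
    tail = (
        b"xref\n0 1\n0000000000 65535 f \n"
        b"trailer\n<< /Size 1 >>\n"
        b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n"
    )
    return body + tail


def read_header(data: bytes) -> str:
    """Return the first line, which is plain text such as '%PDF-1.7'."""
    return data.split(b"\n", 1)[0].decode("ascii")


def find_startxref(data: bytes) -> int:
    """Return the byte offset written after the last 'startxref' keyword."""
    marker = b"startxref"
    idx = data.rfind(marker)
    if idx == -1:
        raise ValueError("no startxref keyword found")
    return int(data[idx + len(marker):].split()[0].decode("ascii"))


def first_stream(data: bytes) -> bytes:
    """Return the raw bytes of the first stream, using its /Length entry."""
    length = int(re.search(rb"/Length (\d+)", data).group(1))
    start = data.index(b"stream\n") + len(b"stream\n")
    return data[start:start + length]


sample = build_sample(b"BT /F1 12 Tf (Hello, PDF) Tj ET")
print(read_header(sample))                      # plain-text header
print(sample[find_startxref(sample):][:4])      # bytes at the xref offset
print(zlib.decompress(first_stream(sample)))    # stream content after inflating
```

The stream itself is unreadable in a text editor, but once inflated it is ordinary text again, which is the situation pajko describes.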
pointlessone 4 years ago

How did you find the link on opensource.adobe.com? They used to host other standards, too (e.g. font formats).

iceblockderby 4 years ago

The ISO released the 2.0 version of the specification that replaces the 1.7 standard.

"Although it is an open standard, one major difference compared with prior versions of PDF is that ISO now holds the copyright to the PDF specification and thus PDF 2.0 is not freely downloadable." [0]

It looks like DMCA requests are being issued to anyone that hosted the old specification, even open source projects [1].

[0] https://www.pdfa.org/resource/iso-32000-pdf/

[1] https://github.com/Hopding/pdf-lib#git-history-rewrite

mr337 4 years ago

Wow, I feel like that is a step back. This feels a lot like other protocols with non-free specs, like J1939, which costs over $1000 USD.
- prima-facie 4 years ago
  
  PDF 2.0 is not cheap either: https://www.iso.org/standard/75839.html Definitely a step back.
  - hoofedear 4 years ago
    
    Why are these documents a paid product? Are there other ways to access it? I figured standardization documentation would be free to encourage adoption.

dorianmariefr 4 years ago

> This page was updated on 23 March 2022 as many direct links to legacy PDF specifications on adobe.com were broken. Many links now reference the Wayback Machine internet archive and thus may be slow.

https://www.pdfa.org/resource/pdf-specification-index/

colejohnson66 4 years ago

It's possible they're reworking their CMS and that causes files to be moved (breaking links everywhere). Microsoft loves doing that with their developer blogs.

hunter2_ 4 years ago

Not cool [0].
It's funny how CMSes tend to offer "clean URL" configurations (meaning that everything after the origin is 100% controlled by the CMS user) for requests served dynamically (database queries) but requests served statically (public files on disk) often end up containing implementation-specific junk (e.g., "/sites/" in the case of Drupal). The magic that makes clean dynamic URLs (rewrite everything that isn't a file to the boot script) should be expanded to make clean file URLs. Serving files would then need help from a script+db, but so what, that already happens for private files.
Obviously embedded assets that need to be fast (images, stylesheets, scripts, etc.) can't have a slow db query in the way. I'm only talking about files that are a first-class destination in the browser's address bar, like PDFs, and anything where the disposition is that it lands in your Downloads folder. Stuff that might be a search result or otherwise linked-to.
[0] https://www.w3.org/Provider/Style/URI
- innocenat 4 years ago
  
  Drupal allows you to set private file mode, which has clean URLs.
  - hunter2_ 4 years ago
    
    It's kind of clean in that it uses a URL based on a db value instead of the filename on disk, but it's still got CMS-specific junk in that it always starts with "/system/" (at least in D7, I haven't explored it in D9).
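
A rough Python sketch of the idea in this subthread: keep a lookup table from a clean public path to wherever the file currently lives on disk, so the file can move without the URL changing. The class `FileAliasTable`, its methods, and the example paths are all hypothetical. This is not how Drupal or any particular CMS implements it, and a real system would back the table with a database rather than a dict.

```python
from typing import Dict, Optional


class FileAliasTable:
    """In-memory stand-in for a CMS database table (hypothetical)."""

    def __init__(self) -> None:
        # Maps clean public path -> current path on disk.
        self._aliases: Dict[str, str] = {}

    def publish(self, clean_path: str, disk_path: str) -> None:
        """Expose a file on disk under a clean public path."""
        self._aliases[clean_path] = disk_path

    def relocate(self, clean_path: str, new_disk_path: str) -> None:
        """Move the file on disk while keeping the public URL stable."""
        if clean_path not in self._aliases:
            raise KeyError(clean_path)
        self._aliases[clean_path] = new_disk_path

    def resolve(self, request_path: str) -> Optional[str]:
        """Return the disk path to serve, or None if nothing is published there."""
        return self._aliases.get(request_path)


table = FileAliasTable()
table.publish("/specs/pdf-1.7.pdf", "/sites/default/files/PDF32000_2008.pdf")
# A later storage reorganisation changes the disk path only.
table.relocate("/specs/pdf-1.7.pdf", "/storage/2022/03/PDF32000_2008.pdf")
print(table.resolve("/specs/pdf-1.7.pdf"))
```

The cost is the one hunter2_ mentions: every such request goes through a lookup, which is why this suits downloadable documents more than embedded assets.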

jeffreportmill1 4 years ago

Off topic, but man is that document hard to use as a reference. Ironically, I wish they would publish it as HTML broken down by chapter and section.

(I have used that document a lot to write a custom PDF generator and parser in Java, using a downloaded copy)

fivea 4 years ago

> Ironically, I wish they would publish it as HTML broken down by chapter and section.
I wish there was an EPUB version of the document. Do PDFs support reflowable content?
- HWR_14 4 years ago
  
  I believe one of the selling points of PDFs was the absolute lack of reflowing content.
  - hunter2_ 4 years ago
    
    Right, as the point is to represent a physical document, paper and ink (or canvas, toner, whatever -- stuff that doesn't reflow).
    Why anyone would use such a format for these situations, where the audience definitely cares way more about consuming it on an electronic device than printing it out, is... mind-boggling.
    Of course, AI+ML to the rescue: Liquid Mode [0].
    > Files are processed in our secure data servers and immediately deleted from our servers after the experience is generated.
    [0] https://www.adobe.com/devnet-docs/acrobat/android/en/lmode.h...
    
    HWR_14 4 years ago
    
    I've found people being precise about the flow of equations and text intermixed can be easier to read than reflowing content. Other than that, not so much.
    Edit: Non-reflowing content also works well if you need to refer people to page numbers and paragraphs.
    I look forward to playing with liquidmode at some point soon.
    
    hunter2_ 4 years ago
    
    CSS flow control and specifying an `id` attribute value as a URL fragment would be my solutions to those particular concerns, if it weren't the case that our context here is capturing from software that offers printing but doesn't offer exporting to HTML very well. I think the solution might be "bring it to a good web dev and have a solid punch list."
- compressedgas 4 years ago
  
  A PDF can be reflowed without reconstructive processing only if a PDF was generated as a Tagged PDF [1] and if the viewer supports reflowing.
  [1]: Essentially a PDF with its own EPUB inside it, but unlike just having an attached EPUB, there is a map between the page layout of the PDF and the tags.
  There are implementations of reconstructive reflowing that infer the layout block structure and reading order and can reflow a two column paper into a single column.
zozbot234 4 years ago

PDFs can support tables of contents with labeled chapters and sections. Not sure if the feature is standardized, but it's there.
- steerablesafeOP 4 years ago
  
  The specification does have a hierarchical outline, and you can click on cross references too. Of course navigation can still be cumbersome, linking to chapters can also be awkward (tip: right click on outline element and copy link works in Firefox).
  There are some problems with the spec though, and navigation is not the most pressing one. The spec is huge, and support for less-used parts is spotty in various PDF readers. It also has inaccuracies (not corrected in errata) and underspecified parts.
layer8 4 years ago

> hard to use as a reference
How so? I frequently reference specific sections, tables or pages of the spec at work.

andrewmcwatters 4 years ago

Maybe one of the side effects of this is that people only continue writing against PDF 1.7.

hulitu 4 years ago

Maybe they want to sell it :)
