PDF is not portable in the digital world

rz.scale-it.pl

27 points by robert-zaremba 8 years ago · 47 comments

massar 8 years ago

Rendering a Microsoft Word/PowerPoint document as a PDF is a good thing, as one then no longer needs a Doc/Docx/PPT/PPTX viewer, while most devices come with a PDF viewer built in (e.g. Chrome :). As a bonus it kills the animations if there are any, while keeping the formatting intact (with some minor color changes depending on how one exports it).

I tend to keep a whole bunch of things I want to 'read later' in my iBooks collection, just save as PDF and transfer to phone, or if already a PDF just download it directly; zooms great too. I got all kinds of device manuals, but also PADI and other diving reference books; always good to quickly check up on it when in doubt and then to reinforce that information with the knowledge of your dive buddy.

Indeed, for content that does not really need a layout outside of some headers (<h1>) and paragraphs (<p>) HTML is perfectly fine.

Quite a few text portions of conference papers (read: TeX :) can be rendered as markdown and then easily converted to HTML, but it won't feel 'as good', thus PDF is an easier format that also reflects the original intent and format.

IETF RFCs typically can be rendered in a myriad of ways thanks to xml2rfc, then again, one mostly will end up reading them from tools.ietf.org or to keep local, render as PDF and load it into iBooks.

  • discreditable 8 years ago

    On the topic of MS Word, I really wish their export to HTML function was simpler. Most of the time I just want to export the document structure (headings, tables, bold, italic, etc.) and not include all of the styles and extra markup.

  • robert-zarembaOP 8 years ago

    Reading PDF on mobile is possible, but far less comfortable than EPUB / MOBI...

    • noir_lord 8 years ago

      I would have agreed until I got an iPad mini 2 (for Safari testing); the screen resolution is high enough that it's close to the print experience.

      • jrimbault 8 years ago

        I think robert-zaremba is talking about screen size, not screen resolution.

        IMO, reflowing text is a big advantage if you're going for "portability across screen sizes".

        • dragonwriter 8 years ago

          > IMO, reflowing text is a big advantage if you're going for "portability across screen sizes".

          It's a big advantage if your content is all pretty simple text; it's less useful for any other content. For complex content, you really need human attention to size-specific layout to get anything better than just zooming content designed for one size.

        • mercer 8 years ago

          Indeed. I love my iPad Mini and I use it for PDFs quite regularly (GoodReader is great for this), but it's just a tad too small to comfortably read A4-size documents, even on a retina screen. For many eBooks this is fine, but scientific papers are usually A4 and can be hard to read and annotate.

          In some cases this can be fixed by cutting the margins, but aside from a certain unpleasantness I feel reading an almost-marginless document, it's often still not enough to comfortably read.

          So for the most part, whenever possible, I go for ePub documents using Marvin (also an excellent app), and I print the larger documents.

          It's one of the main reasons I'm considering getting a normal-sized iPad though.

          • jrimbault 8 years ago

            I'm currently considering getting an iPad Pro 12" for reading piano scores. I'm sick of the stacks of large heavy old books lying on my piano and muffling the sound.

            Does anyone here have experience with that use case? An iPad Pro seems large enough, but I don't really know. I've seen the Sony DPT-S2, but it seems very expensive given its limitations.

        • robert-zarembaOP 8 years ago

          Yes, screen size.

          Most people will agree with me that straining your eyes is not a good solution ;) Small letters are neither comfortable nor healthy.

rubidium 8 years ago

PDF is beautiful. Long live PDF.

I read a lot. I don't always want to read from a screen.

Making PDFs available on the internet just saves me from having to search through a journal stack.

PDFs have the advantage that the formatting looks good, and the author/publisher gets to choose how it looks, usually with input from a professional. This is much better than the "styling" many websites and EPUBs provide.

  • robert-zarembaOP 8 years ago

    That's the thing. 1) If you use simple HTML / EPUB, the majority of publications are good for printing. 2) Think about the relief of storing these documents on an ebook reader.

    • falcolas 8 years ago

      > majority of publications are good for printing

      This is a problem. PDF supports "all". Epub would have to have equivalent support, as well as native support in most/all vanilla OS distributions to be a real competitor for PDF.

      > Think about the relief of storing these documents on an ebook reader

      For many people this is their phone or tablet. Both of which support PDFs as well (as do many standalone ebook readers). You also don't have to get an extra app to view the PDFs.

      • robert-zarembaOP 8 years ago

        The majority of publications are simple text with a few tables or diagrams. We don't need complex typesetting for that. Printing from EPUB should be good.

        You always need an extra app to view PDF. It's either installed by default or not. Phones, tablets, and other screen devices have EPUB, MOBI, and HTML readers available to install.

guidoism 8 years ago

A few weeks ago I would have made the same statement but I started reading the PDF implementation docs and now I really like the format.

The main issue I think we all have with the format is that people make docs that are almost impossible to read on a small screen.

There are ways around this: 1. Tagged PDFs present the underlying content and semantics in order to reflow for accessibility purposes, though right now very few people seem to use this feature, and 2. Maybe it wouldn't be a bad thing to make PDF pages closer to a paperback book rather than an A4 page, with the resulting shorter line length and reduced margins.

PDF is indeed more complex than plain HTML with some cribbed CSS, but in many ways it's a lot better: 1. It truly is portable in the sense that every computer will render it in exactly the same way, 2. It packages up all assets in an efficient manner (only the glyphs that are needed are included, not the entire font with all glyphs and position hints like web fonts), 3. The expensive layout computation is done once, on a computer in a galaxy far far away from my battery-limited phone, and 4. PDFs are (by convention) free from all of the cacophony of crap like share buttons and navigation chrome and ads and articles-you-may-enjoy fluff.

The format itself is actually not that bad; it's a text format in that it's relatively easy to open up a text editor and bang one out. The only inconveniences are the places where you need to state exactly how long strings are (which your text editor can help with) and the creation of the index at the end (which I've been cheating on by just running my hand-created PDFs through a PDF lint-like utility).
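To make the two inconveniences concrete, here is a minimal Python sketch that writes a one-page PDF and computes both the stream `/Length` and the cross-reference table (the "index at the end") itself. The function name, page size, font choice, and default text are assumptions chosen for illustration, not anything from the comment above, and the output has not been checked against every reader.

```python
def build_minimal_pdf(text: str = "Hello, PDF") -> bytes:
    """Assemble a one-page PDF, computing /Length and the xref offsets."""
    # Content stream: draw `text` in 24pt Helvetica near the top of the page.
    # Parentheses and backslashes in `text` would need escaping; omitted here.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        # /Length is the exact byte count of the stream data.
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one fixed-width 20-byte entry per object.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


if __name__ == "__main__":
    with open("hello.pdf", "wb") as handle:
        handle.write(build_minimal_pdf())
```

Opening the resulting file in a text editor shows every object in plain text, because the sketch leaves the content stream uncompressed.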

The reason why most PDFs look crazy when opened up in a text editor is that the streams are almost always compressed. You can uncompress them with "qpdf --stream-data=uncompress in.pdf out.pdf"
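The most common stream compression is the FlateDecode filter, which is zlib data. A small Python sketch of the round trip, assuming a made-up content stream and hypothetical helper names; real files can also chain other filters or predictors, which this does not handle:

```python
import zlib


def deflate_stream(data: bytes) -> bytes:
    """Compress stream data the way a /FlateDecode filter stores it (zlib)."""
    return zlib.compress(data)


def inflate_stream(data: bytes) -> bytes:
    """Recover the readable operators from a /FlateDecode stream body."""
    return zlib.decompress(data)


# Hypothetical page content: the same text-drawing operators repeated.
readable = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 20
packed = deflate_stream(readable)

print(len(readable), "bytes readable,", len(packed), "bytes compressed")
print(inflate_stream(packed) == readable)
```

The compressed bytes are what makes the file look like noise in an editor; the operators underneath are still plain text.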

  • mohaine 8 years ago

    I think the format itself could be ok, but not in its current form.

    The PDF header can be anywhere in the document. This makes parsing for bad content harder. (Also, you can have a valid pdf that is also a valid zip with the contents being the original file. How do you virus scan this?)

    Too many old image formats are allowed, with readers only supporting a subset. TIFF alone has a bunch of options that are often broken. What happens when you put a multipage TIFF in a PDF? (I think you just see the first page in Reader, but some other reader might allow you to browse them.)

    Lots of features in later versions that are not well supported. (forms, document libraries, scripting?)

    It has been a while since I left the document area, but while I liked simple PDFs, once you say you support them, you have to support all of them, which is almost impossible to do correctly. The later specs just have too many features that are almost unused but add a lot of complexity that really isn't needed in a portable document format. A stripped down/cleaned up version of the spec would be nice.

  • klodolph 8 years ago

    The share buttons and navigation chrome certainly can be put in a PDF, they're just incredibly uncommon. I've even played video games that were distributed as PDF files.

    • mercer 8 years ago

      Pdf games? Do you have an example you could send me, perhaps? I'm really curious.

      • klodolph 8 years ago

        The ones I played were adventure games, basically a bunch of areas (implemented as pages) linked by buttons, with some extra logic thrown in. If you can imagine the kind of scripting capabilities you'd need to run a PowerPoint style presentation from a PDF file, and the kind of scripting you'd need to make sophisticated interactive forms, you're on the right track.

        Unfortunately I don't know what the games I played were called.

psion 8 years ago

I find PDF to be way more portable than most other document formats in terms of saving a document or printing a document. Saving an HTML page has its own set of problems, and if I share it I have to make sure to get all the images gathered as well. Word processor documents depend on system fonts, etc., and I cannot be sure that what my document depends on is installed on the other computer. With a PDF, I can be sure to get the necessary elements, be they fonts, images, etc.

  • robert-zarembaOP 8 years ago

    That's what EPUB / MOBI are for.

    • logfromblammo 8 years ago

      EPUB is essentially an entire self-contained HTML web site in a ZIP file container.

      Whatever you can do on a website, you can theoretically do in an EPUB. It might not display as expected when rendered by an e-reader or printed, however, which is why most EPUB files stay rather safe and unambitious in their CSS and JS.

      I'm a bit disappointed that web browsers don't generally function as EPUB readers or include a "Save As... EPUB" option, but I can't seem to muster the motivation to write a Firefox add-on to do that. It wouldn't even be that difficult, as I have created EPUBs from filesystem directories using nothing more than 7Zip and a shell script.

      There is a possibility that law firms would pay for a premium version that included a crawler and some form of cryptographic validation that could prove that the EPUB file was created at a certain time, from a certain IP address, and hasn't been altered since then. The idea being that trademark owner's lawyer takes a snapshot of a website selling knockoffs when sending the C&D letter, then another when filing the lawsuit, and the evidence for the complaint is preserved without having to rely on static snapshot images of the rendered website or third-party archive sites.

omgtehlion 8 years ago

Now, in 2017, when PDF has finally become ubiquitous, requires no additional drivers, has a somewhat usable spec, and has a lot of third-party and open source software to work with it, do we really need to get rid of it?

Bashing PDF is so 2000...

emeraldd 8 years ago

My first thought on this is that EPUB is not fully portable either ... it's just non-portable in a different set of circumstances. If you want to publish on the internet, for general consumption, just use plain html. That's about as portable as you can get without moving into raw text.

  • ldjb 8 years ago

    Plain HTML isn't so good if you want to include images in your document or have multiple pages. You end up making the user download a whole bunch of files if they want a copy of the document.

    The great thing about EPUB is that all the files are bundled together in a single .epub file. You can copy the file and move it around without worrying about keeping the structure intact.

    I think that perhaps the main issue currently present with EPUB is that EPUB readers aren't really part of the standard installation on devices. Pretty much every PC or smartphone you come across in the wild will have software installed to view PDFs, but the same isn't the case for EPUB. I think Apple have done a good thing by including an EPUB reader (iBooks) as part of macOS and iOS, but that's not the case for other operating systems.

    It might be nice if web browsers could natively act as EPUB viewers, in the same way a number of them natively act as PDF viewers. That way, the user already has an EPUB viewer installed, and they don't have to go and find and install one.

    • mercer 8 years ago

      Now that I think about it, I'm kind of surprised that browsers don't natively show EPUB files.

      • robert-zarembaOP 8 years ago

        There are plugins for that (same as you have with PDFs, though you reuse the same rendering engine, since EPUBs are HTML documents under the hood).

        • ldjb 8 years ago

          There are plugins, but having to find and install one is an extra step that I think a lot of people don't take.

          It's far different than being able to link to a PDF file and having it instantly display in the browser. As far as I'm aware, Firefox, Chrome and Safari (and possibly other browsers) can all display PDFs natively, so I think it would be convenient if they could also display EPUBs without the need for additional plugins.

          As you say, EPUBs use HTML under the bonnet, so rendering should be pretty straightforward.

          • robert-zarembaOP 8 years ago

            I hope one day all popular browsers will render EPUB straight away. It should be a lot easier than PDF.

    • Spivak 8 years ago

      > Plain HTML isn't so good if you want to include images in your document or have multiple pages.

      Am I missing something? This is practically the only thing that vanilla HTML is good at?

      It seems like you could get all the benefits of EPUB with an archive of HTML files.

      • logfromblammo 8 years ago

        EPUB is an archive of HTML files.

        It has a handful of metadata files to tie all those HTML files together, but it is essentially a ZIP container full of HTML, CSS, JS, and images. Rename filename.epub to filename.zip and open it up some time.

scholia 8 years ago

The point of the story is as follows:

> PDF is not portable on digital screens. It doesn’t scale. It’s not comfortable to read PDF files on a mobile or ebook readers

Arguing that PDFs are good for printing out, or better than some other format, doesn't actually address the issue ;-)

accordionclown 8 years ago

1a. .pdf is a fine format.

1b. unless -- as is increasingly the case these days -- you're reading it on a screen smaller than the one for which the .pdf was designed, in which case .pdf is an awful format.

2a. .epub is a fine format.

2b. unless you're reading it in a viewer-app which is wonky, which most of them are. (the inconsistencies of rendering with this so-called "standard" are unbelievably bad, and seem to be getting worse rather than better as time goes on.)

3a. when you try to re-use text by copying it out of a .pdf, you often get some really bad stuff that loses a lot of important styling.

3b. when you try to re-use text by copying it out of an .epub, it's not much better.

4a. the standard line is that an .epub is just "a website packaged into a .zip file", implying that anything you can do on a website can be done in an .epub.

4b. the standard line is a lie. an .epub requires .xhtml rather than .html, and a complex mess of associated files, and most .epub viewer-apps have trouble supporting the full gamut of .css, and also do not allow you to use javascript at all.
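for a sense of what that "mess of associated files" looks like, here is a rough python sketch that assembles a tiny epub 3 container in memory. the file layout (OEBPS/, content.opf, nav.xhtml, chapter1.xhtml), the placeholder uuid, the timestamp, and the function name are all assumptions for illustration, and the result has not been run through a validator such as epubcheck.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Placeholder identifier and modification date; a real book needs its own.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Example</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2017-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="chapter1.xhtml">Chapter 1</a></li></ol>
    </nav>
  </body>
</html>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>Hello, EPUB.</p></body>
</html>
"""


def build_epub() -> bytes:
    """Return the bytes of a minimal EPUB container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epub:
        # The mimetype entry goes first and is stored without compression.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        for name, body in [
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", CONTENT_OPF),
            ("OEBPS/nav.xhtml", NAV_XHTML),
            ("OEBPS/chapter1.xhtml", CHAPTER_XHTML),
        ]:
            epub.writestr(name, body, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    with open("example.epub", "wb") as handle:
        handle.write(build_epub())
```

that is five files and three xml vocabularies to wrap one paragraph of text, which is roughly the overhead being complained about in 4b.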

conclusion: the state of sharing documents on the web in a way that allows offline use while enabling the convenient re-use of text is a sad state indeed.

dragonwriter 8 years ago

PDF is perfectly portable in the digital world (the only world in which it has ever existed).

It's not perfectly optimized for every display (or print page, the two being equivalent) size, resolution, etc., but then neither is any other format that can handle the same range of content, nor will any format ever be until we have AI layout that does as good as professional layout from a single source file for all media sizes and properties.

I find professionally laid out PDFs that are designed for letter/A4 size pages to be superior in practical use to any reflowable format I've yet seen at pretty much every size for most content more complex than plain linear text like you'd find in a novel. (Smartphone and smaller devices aren't great for it, but then they aren't great for reading content more complex than linear text regardless of format.)

icebraining 8 years ago

Most devices include a PDF reader, but not an EPUB reader. Yes, you can download one, but as a publisher, you can't expect your readers to jump through that hoop.

  • robert-zarembaOP 8 years ago

    An EPUB / MOBI reader is not a problem these days.

    • sigzero 8 years ago

      That really doesn't speak to his statement. The majority of devices read PDF by default, whereas for EPUB/MOBI the user needs to go get an app.

      • Spivak 8 years ago

        10 You should publish PDFs because devices include readers.

        20 Devices have readers because everyone publishes PDFs.

        30 GOTO 10

thinkMOAR 8 years ago

Instead of calling for a 'ban' on a format for very subjective reasons, how about calling for publication in multiple formats, so that people have a choice? It is certainly not much more work, and it looks, in my humble opinion, professional.

  • robert-zarembaOP 8 years ago

    Good point! Sorry if my call sounds repulsive. Your idea of creating publications in multiple formats works. I will update my post for that. Though, in my post, I want to highlight that simple solutions usually work fine. Most of these publications are not complex in terms of typesetting. If there is a need for complex typesetting, then fair enough.

unsignedint 8 years ago

I'm having a hard time understanding some of the points this article makes, particularly the claim that you need to think about typesetting and design more. Most word processors these days have some type of style system that is akin to HTML.

PDF is also one of few formats that is readily available that has a well defined archival spec (PDF/A) which further makes it more compatible across the readers. (As essentially it requires documents follow certain specs.)

robert-zarembaOP 8 years ago

Do you publish on the Internet? Do you read a lot of publications on your digital devices?

How about stopping using PDF for internet publications and using EPUB instead (or some other screen independent format)? Please, share your comments.

  • throwaway2016a 8 years ago

    Trying to be helpful...

    I think you may be getting downvoted because it is generally considered bad form to submit your own blog unless it is a "Show HN", but if the content is good it can outweigh that and an article can be upvoted anyway. But if you do submit your own content, it is probably best to let the content speak for itself vs trying to solicit HN as a discussion forum.

    • robert-zarembaOP 8 years ago

      Thanks for the comment. What's the reason for not posting your own article to content discovery services / aggregators? It's a common thing.

      • throwaway2016a 8 years ago

        HN is not a typical aggregator.

        HN historically is not a marketing site; it's a forum where technologists share things they find interesting. So any form of self-promotion is usually viewed with a bit more critical lens than if it was submitted by a third party. Rightly or wrongly, the bar is set higher.

        People liked your article (it made it to the front page), but your comment above feels like it is extra "sales" oriented, even if the article itself is not about a product.

        If you must comment on your own story you should do it as if you were commenting on another user's post, in first person. For example, you could rewrite your comment as:

        "I wrote this article because I was frustrated with the poor reading experience when content is delivered in PDF. What do you all think?"
