pdf.js: Rendering PDF with HTML5 and JavaScript


Update: I updated the links again. pdf.js has moved to a new location on GitHub.

Why?

While traveling to the Firefox 4 launch parties in Seoul and Taipei all the way from California, we killed a lot of time by brainstorming cool things to do with the web platform. Like many before us, we were wondering why nobody had implemented a PDF reader in HTML5/JavaScript. The kinds of operations a PDF reader needs to be fast at (rendering text, drawing lines, blitting images) need to be fast in browsers too, so browsers are already highly optimized for them.

Building an HTML5-based PDF renderer would also answer the question of whether the web platform and in particular canvas and SVG APIs are complete enough to efficiently and faithfully render PDFs.
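To make those three operations concrete, here is a minimal sketch of what they look like on the standard canvas 2D API. It is an illustration only, not code from pdf.js: the function name drawSamplePage, the coordinates, and the sample string are all made up for this example.

```javascript
// Illustrative sketch, not pdf.js code. `ctx` is assumed to be a
// CanvasRenderingContext2D and `image` anything drawImage accepts
// (for example an HTMLImageElement), or null to skip the image.
function drawSamplePage(ctx, image) {
  // Render text: pick a font, then paint a string at a baseline position.
  ctx.font = "12px serif";
  ctx.fillStyle = "black";
  ctx.fillText("Hello, PDF", 72, 72);

  // Draw lines: build a path, then stroke it.
  ctx.beginPath();
  ctx.moveTo(72, 80);
  ctx.lineTo(540, 80);
  ctx.lineWidth = 1;
  ctx.stroke();

  // Blit an image: copy its pixels onto the canvas at a given position.
  if (image) {
    ctx.drawImage(image, 72, 100);
  }
}
```

In a browser, one would call this with the context obtained from a canvas element's getContext("2d"). A real PDF page needs far more than this (font decoding, clipping, transforms, shading), but these are the basic building blocks.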

Displaying PDFs directly in the browser would definitely improve the user’s experience. There are literally millions (billions?) of PDFs floating around the web, and on many devices loading PDFs switches to a different application (e.g. Preview on OS X and PDF View on Android). Also, external PDF readers and many plugins don’t support important PDF features well, including content links and fetch-as-you-go (HTTP range requests).
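As a rough idea of what fetch-as-you-go involves, the sketch below shows how a web page could ask a server for just one slice of a file using the standard HTTP Range header. This is not how pdf.js does it; the helper names (rangeHeader, parseContentRange, planChunks, fetchRange) are hypothetical, and the sketch assumes a server that honors range requests and an environment that provides fetch.

```javascript
// Hypothetical helpers for illustration; none of these are pdf.js APIs.

// Build the value of an HTTP Range header for an inclusive byte span.
function rangeHeader(start, end) {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end < start) {
    throw new RangeError("invalid byte range");
  }
  return "bytes=" + start + "-" + end;
}

// Parse a Content-Range response header such as "bytes 0-1023/146515".
// Returns null when the header is missing or not in that form.
function parseContentRange(value) {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value || "");
  if (!m) return null;
  return {
    start: Number(m[1]),
    end: Number(m[2]),
    total: m[3] === "*" ? null : Number(m[3]), // "*" means total size unknown
  };
}

// Split a file of `total` bytes into inclusive [start, end] chunks
// of at most `chunkSize` bytes each.
function planChunks(total, chunkSize) {
  const chunks = [];
  for (let start = 0; start < total; start += chunkSize) {
    chunks.push([start, Math.min(start + chunkSize, total) - 1]);
  }
  return chunks;
}

// Fetch one slice of a remote file. A server that honors the request
// answers 206 Partial Content; anything else is treated as an error here.
async function fetchRange(url, start, end) {
  const response = await fetch(url, { headers: { Range: rangeHeader(start, end) } });
  if (response.status !== 206) {
    throw new Error("range request not honored (status " + response.status + ")");
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

With helpers like these, a viewer could load only the bytes needed for the page being displayed and fetch the rest on demand, rather than downloading the whole file up front.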

External readers and plugins are also forced to invent their own user interaction paradigms, meaning for example that users might scroll HTML pages in one way with one set of heuristics in the browser, but in a totally different way in an external PDF reader.

It’s important to note that we’re not trying to promote PDF to a first-class web citizen like HTML5 is. Instead we hope that a browser-native PDF renderer written on the web platform allows web technologies to subsume PDF.

Benefits

The traditional approach to rendering PDFs in a browser is to use a native-code plugin: Adobe’s own PDF Reader, another commercial renderer, or an open-source alternative (e.g. poppler). From a security perspective, this enlarges the trusted code base, which is why Google’s Chrome browser goes to considerable trouble to sandbox its PDF renderer against code injection attacks. An HTML5-based implementation adds no native code of its own, so it is completely immune to this class of problems.

Project Status

We have been developing pdf.js in the open (on github.com), albeit quietly, for about a month now. We were waiting on the completion of some major features (Type1 fonts, gradients, etc.) before announcing pdf.js more broadly. We’ve been taken by surprise by the early and intense interest in our work, so we decided to blog and talk about our project earlier than we initially planned.

As part of our project plan, we are initially focused on achieving pixel-perfect rendering of a single PDF, a 2009 paper on Trace Compilation we submitted to the ACM SIGPLAN PLDI conference. Just as the Tracemonkey work described in the paper led the way for JavaScript JITs, we hope pdf.js opens the door to implementing legacy formats on top of the web platform.

If you want to see a demo of pdf.js, click on this link. There are still glitches and rendering artifacts, but you will get the picture. We are still missing Type1 PostScript fonts, which Vivien Nicolas is working on.

Along the way, we had to add some new interfaces to the HTML5 canvas element, and figure out how to implement some difficult features of the PDF spec in JavaScript. See Chris’s post for a general technological overview, and Shaon’s post for details on rendering “shading patterns”.

What’s next?

We intend to use pdf.js to render PDFs “natively”, within Firefox itself. Our most immediate goal is to implement the most commonly used PDF features so we can render a large majority of the PDFs found on the web. We believe we can reach that point in less than 3 months (the entire code so far is less than one month old, and it already renders a large set of PDF features).

Initially we will make a Firefox extension available to interested users that enables inline PDF rendering using pdf.js, but our ultimate goal is of course shipping pdf.js with Firefox. This will be a substantial improvement in both usability and security for our users. pdf.js uses only safe web languages and doesn’t contain any native code that attackers could exploit.

Open Source

We want pdf.js to be a community-driven and community-governed open-source project. We’ll use it for Firefox, but we think there are many cool applications for it. We would love to see it embedded in other browsers or web applications; because it’s written only in standards-compliant web technologies, the code will run in any compliant browser. We are licensing pdf.js under a very liberal 3-clause BSD license and we welcome external contributors. We are looking forward to your ideas or code to make pdf.js better! Take a look at our GitHub repository and our wiki, or talk to us on IRC in #pdfjs.

Chris Jones and Andreas Gal (and the pdf.js team)