Show HN: ThePDFApi, a Chrome Based PDF Generation API Hosted on AWS

41 points by marcTPA 8 years ago · 36 comments · 1 min read


Hello HN!

It kept surprising me how much of a hassle it is to generate a PDF with decent HTML5 rendering from my SaaS apps. I tried several free libs and APIs but ended up with botched rendering a lot of times. So I set out to simplify this chore by creating an AWS hosted HTML to PDF conversion API that's based on Chrome. This API will allow a dev to just send the HTML to our API and get a PDF in response without having to worry about running and managing Chrome somewhere in their infra.

I just finished the first version of my landing page and hosted PDF generation API (https://thePdfApi.com) and would love to have some feedback on the following points:

- is it clear upon viewing the landing page what the product is about?

- are there any questions that you have that are not answered on the page? I'm thinking about adding an FAQ once the first questions pop up.

- would this product provide value to you if your startup needed to generate PDFs? If not, what would you use instead?

Thanks for helping a fellow hacker out.

https://thePdfApi.com

joewils 8 years ago

Generating PDFs from HTML is a huge PITA. I really like what you've done.

re: "is it clear upon viewing the landing page what the product is about?" Yes. I think your landing page copy reads well, but could use a bit of polish. Emphasize how well PdfApi solves common margin and background rendering issues compared to the alternatives.

re: "are there any questions that you have that are not answered on the page" I'm not sure you need to answer all of a user's questions, but considering you are targeting startups and developers, I'd focus on building out your API documentation.

re: "would this product provide value to you if your startup needed to generate PDFs" Potentially, but considering your audience (developers/startups) I suspect most of us would tackle the issue locally with a Chrome headless setup.

re: "what would you use instead" I've used the following to generate PDFs from HTML:

- Chrome Headless

- WKHTMLtoPDF

This is good work. I like what you've done. Build out strong API documentation with multiple code snippets and examples to improve your developer marketing.

ertand 8 years ago

I don't know much about the complexities of this task so excuse my question if it's too obvious.

How is it different/better than using puppeteer? If it's better, maybe SxS comparisons of generated pdfs could be a good selling point.

  • marcTPA (OP) 8 years ago

    Thanks for leaving a comment, really appreciate it. The idea of creating a comparison between PDFs rendered with different solutions is genius. Definitely gonna add this.

    Puppeteer would indeed come close in rendering quality. Improvements of using my solution over puppeteer are:

    1) I tweaked Chrome headless to have the fonts available to ensure that typography renders as it should. Even emojis work!

    2) you don't need to worry about installing and maintaining puppeteer and Chrome headless into your own infra

    3) I didn't really make this very clear on my landing page so far, but I'll provide support to clients that have issues getting a certain document to render exactly like they want.

    4) Not really a benefit yet since I wanted to launch with the MVP, but soon I'll offer several options in the API that puppeteer itself doesn't offer, such as multi-document PDFs, an automated clickable table of contents for longer documents, etc.

andreareina 8 years ago

A warning from my own experience: do not use web technologies for any printing (including rendering to pdf) where positioning is critical (e.g. for filling out pre-printed forms). Fiddling with `@media print {…}` and `position: absolute` will work… until there's some minor change in the rendering engine that will throw all that careful work into disarray and leave you asking questions like, "why is this right-aligned bit of text in an 8.5-inch wide container being printed down the middle of my page?" (the preview looked great btw). Oh, and the vertical scale was only slightly short, so I couldn't just scale the page. The right answer in this case was a package that actually spoke pdf and would flow text into a fixed-size box at a fixed location. Oh, and once you've got the file, don't let the browser print it either -- somehow both Firefox and Chrome wouldn't render it to the printer correctly, and of course they would mess up in different ways.

Browsers are good at laying things out on the screen. On paper, not so much.

  • seanwilson 8 years ago

    Could you go into more detail about why printing a web page doesn't do what you want? You mean for example different versions of Chrome would screw up your previously working layout? How about between the same browser on different operating systems?

    I've looked into JS libraries that will directly generate PDFs you can print but each library seems to come with a lot of caveats.

    • andreareina 8 years ago

      Yes, different versions would produce different layouts. Element positions and bounding boxes would change, more so in the horizontal dimension than the vertical. The text itself would be rendered at the proper size though. FWIW Chrome was better-behaved in this regard than Firefox, but even a 10% shift is too much when you need to get text into a specific box that's already been printed.

      I didn't bother to check whether the same browser version on different operating systems would produce the same results.
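
To put the "10% shift" above into physical units, here is a small worked example. It assumes the shift is measured against the width of the 8.5-inch container mentioned earlier (the comment does not say what the 10% is relative to). It uses the PDF default unit of 1/72 inch per point. The function names are made up for illustration.

```python
POINTS_PER_INCH = 72    # PDF default user-space unit is 1/72 inch
MM_PER_INCH = 25.4


def inches_to_points(inches: float) -> float:
    """Convert inches to PDF points."""
    return inches * POINTS_PER_INCH


def inches_to_mm(inches: float) -> float:
    """Convert inches to millimetres."""
    return inches * MM_PER_INCH


# Assumption: the 10% is relative to the 8.5-inch container width.
container_width_in = 8.5
shift_in = 0.10 * container_width_in

print(f"container width: {inches_to_points(container_width_in):.0f} pt")
print(f"10% shift: {shift_in:.2f} in = {inches_to_mm(shift_in):.2f} mm "
      f"= {inches_to_points(shift_in):.1f} pt")
```

Under that assumption the shift is 0.85 in, or roughly 21.6 mm. That is far more than a pre-printed form field can absorb.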

  • chatmasta 8 years ago

    What makes this so hard? Why can’t browsers just treat the page size as a viewport? Is the problem dealing with page breaks?

    • andreareina 8 years ago

      It's simply not part of the problem that browsers exist to solve, so probably not worth spending the engineer-hours to get exactly right. Printing in general is, but they just need to get things onto the page in some approximation of what the page looks like; millimeter-perfect accuracy isn't needed.

citrablue 8 years ago

Have you considered offering this as a browser extension? It would greatly increase your market size, and I have worked with (non-technical) people who would love to be able to activate an extension, enter an email address, and email out a PDF (of an invoice) to a colleague.

This workflow is a common one, and really frustrating: "Print -> Save as PDF -> choose location on disk/google drive/dropbox-> Save -> switch to email -> compose email -> enter email address -> enter subject -> add attachment -> navigate to saved location (if I can remember it) -> Send".

You could even add a premium feature that would hit a URL on a schedule, to automate report sending to managers (e.g. of Yahoo Ads or any other platform with similarly terrible reporting).

My manager and I at my old place of work used to spend 1-3 hours/month, times however many people had access to his credit card for their subscriptions.

  • marcTPA (OP) 8 years ago

    Interesting idea, I've been planning to build a save as PDF extension around it as a case study.

    Combining the PDF functionality together with an email function sure is interesting, gonna think this over a bit more. Thanks!

smhg 8 years ago

Simple layout and fast loading time: big plus. I think the copy could be better. Put more focus on the strengths. Don't use the link-blue color if it isn't a link.

About the PDF results: I get mobile versions of websites a lot, but I guess in normal use cases you won't even request those.

  • marcTPA (OP) 8 years ago

    Thanks a lot for the constructive feedback. Noting these down in my priority list.

    You're right, the most common use case would be that a client sends HTML instead of a URL to the endpoint. This way a PDF can be created from data that's not publicly exposed on the internet (think invoices, contracts, etc.)

    I also don't store any of the data you send to the API, as to not further contribute to your GDPR nightmares.
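
A minimal client-side sketch of the "send HTML, get a PDF back" flow described above. The thread does not document the real endpoint, request fields, or authentication scheme. The URL, the `html` JSON field, and the bearer-token header below are all hypothetical placeholders, not thePdfApi's actual interface.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL is not given in the thread.
API_URL = "https://api.example.com/v1/pdf"


def build_pdf_request(html: str, api_key: str,
                      api_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw HTML."""
    # The "html" field name and Bearer auth are assumptions for illustration.
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def fetch_pdf(request: urllib.request.Request) -> bytes:
    """Send the request and return the response body (the binary PDF)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Posting the HTML itself, rather than a URL, is what lets private documents such as invoices be rendered without exposing them publicly. A caller would write the bytes returned by `fetch_pdf` straight to a `.pdf` file.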

marcTPA (OP) 8 years ago

A little addendum: I'm also contemplating a manual service where you could send me a document in any format (JPG, PSD, Microsoft Word) and I'd create a REST endpoint that you can call with the data you want inserted into the document. The output of this REST endpoint would be the binary PDF data.

This way your dev team would not have to invest any time at all in the creation of PDF documents. Shoot me a mail at the email address in my profile if you want to know more.

  • vageli 8 years ago

    How would you insert data at arbitrary places in the document? Or is this more like a form-filling API or something else?

jexah 8 years ago

Since there is no contact form on the site, I figure I'll ask here. How well does it manage different margins on each page? ex Cover page with no margins, but rest of document with margins. Does it support Table of Contents and page numbers? What about other features like JS running in the page header/footer components?

Edit: Holy hell, was just reading some more comments mentioning the price and then had a second look. $79 a month for 10 PDFs. Yeah, I think I'm gonna go with spending 10 minutes writing a WebAPI to access my PDF generation API. For reference, it took me two days to land on Puppeteer, a day to configure it how I wanted, and it costs me $5 a month to do about 50 PDFs per day (not an upper limit, that's just how many we need in a typical business day; I don't know what the box is capable of).
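
The per-PDF arithmetic behind this comparison, as a rough sketch. The prices and volumes are the commenter's figures. The 22 business days per month is an assumption added here to turn "50 PDFs per day" into a monthly volume.

```python
def cost_per_pdf(monthly_cost: float, pdfs_per_month: float) -> float:
    """Monthly price divided by monthly volume."""
    return monthly_cost / pdfs_per_month


BUSINESS_DAYS_PER_MONTH = 22  # assumption, not from the thread

# Figures as quoted in the comment above.
hosted = cost_per_pdf(79, 10)
self_hosted = cost_per_pdf(5, 50 * BUSINESS_DAYS_PER_MONTH)

print(f"hosted plan: ${hosted:.2f} per PDF")
print(f"self-hosted: ${self_hosted:.4f} per PDF")
```

This only counts the server bill. It leaves out the roughly three days of setup the commenter mentions and any ongoing maintenance, which is the cost the hosted service is meant to remove.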

eschutte2 8 years ago

I like it! I'm about to release a related service soon so I'm interested in this space.

It's very fast! Are you caching at all?

Since people probably want to use this with private data, would they usually be sending HTML strings to you, vs URLs?

Why's the "i" in API lower case?

  • marcTPA (OP) 8 years ago

    Thanks a lot for the feedback.

    I'm not caching at all since I do not want to store any potential confidential data on my servers. The main reason that it's fast is that there are several instances running Chrome headless behind a load balancer.

    The main use case would indeed be to send HTML to the API instead of a URL. I just didn't add this use case to the landing page API tester, but it's definitely supported.

    The i in APi is lowercased because I thought it looked cute :)

schappim 8 years ago

I literally just built this for rendering invoices etc yesterday.

I ended up using wkhtmltopdf on AWS Lambda. wkhtmltopdf isn’t nearly as nice as PrinceXML or headless Chrome, but for the most part it gets the job done.

What did you end up using for your stack?

  • marcTPA (OP) 8 years ago

    That's a coincidence :) I used wkhtmltopdf in the past as well but it struggled when dealing with more complex layouts.

    My stack is a tweaked Chrome headless on Linux in a docker container, exposed by a Node API.

starptech 8 years ago

Generating PDFs from HTML and web technologies is nothing new. There are tons of PHP and Node.js libs, e.g. I could host a service in minutes with the help of https://github.com/GoogleChrome/puppeteer in only a few lines of code. Hosting that service on DigitalOcean would reduce the cost to ~$5/month. Cheers!

  • marcTPA (OP) 8 years ago

    Thanks for your feedback.

    You are right that generating PDFs from HTML and Web Technologies is nothing new. Most existing PHP and node libs sadly don't provide great rendering once you have a document that consists of modern CSS and HTML or modern image formats such as SVG.

    Puppeteer would indeed provide the same rendering quality, but then you're responsible for the maintenance of the running instances. I'm hoping to make my users' lives easier by taking this task out of their hands.

taneltahepold 8 years ago

As this is a developer-oriented product then I would like to see API documentation. Also, in my opinion, the pricing is a bit hidden at the moment.

I have a similar API product for generating PDFs. We found that the generation part is easy; it gets crazy when your customers start asking for customizations to their labels, invoices, packing slips, contacts etc.

koliber 8 years ago

Ha! I recently launched an analogous service for converting CSVs into Excel files so that web sites can offer users rich spreadsheet downloads instead of crummy CSV files -- goodgrids.com. I posted it to Show HN and just found this! This is great.

  • marcTPA (OP) 8 years ago

    Just checked your landing page, great job at making the advantages of your service crystal clear. And of course congrats on launching, always scary to show your baby to the world.

richjdsmith 8 years ago

Looks good! Pricing is a bit steeper than I'd expected, but I've also never tried dealing with the pains of trying to generate PDFs. What stack is it built on?

  • marcTPA (OP) 8 years ago

    Thank you. Pricing is still experimental. I'm hoping that the easy API and the maintenance my clients no longer need to do provide a lot more value than the monthly cost.

    The stack is Linux on Docker, running a tweaked version of headless Chrome with an API created in Node.

kierenj 8 years ago

This is probably a general headless-Chrome question, but - with something like this, how would you go about specifying margins, page breaks, etc?

  • marcTPA (OP) 8 years ago

    For the margins there is a global setting in the API that will set the specified margins on the entire document. You can specify additional margins on top of this by setting a margin in CSS.

    Page breaks are controlled by the CSS properties page-break-after, page-break-before and page-break-inside.
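
A small sketch of what the CSS side of this might look like, generated from Python so the HTML can be built before it is sent for rendering. The `page-break-*` properties are the ones named above. The class names, the `@page` margin rule, and the helper functions are illustrative assumptions. The thread does not say how `@page` margins interact with the API's global margin setting.

```python
def print_css(margin: str = "20mm") -> str:
    """Return print CSS: a page margin plus page-break helper classes."""
    # Class names are hypothetical; the properties are standard CSS.
    return (
        f"@page {{ margin: {margin}; }}\n"
        ".page-break-before { page-break-before: always; }\n"
        ".page-break-after { page-break-after: always; }\n"
        ".keep-together { page-break-inside: avoid; }\n"
    )


def wrap_html(body: str, margin: str = "20mm") -> str:
    """Wrap an HTML body fragment in a document carrying the print CSS."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">\n'
        f"<style>\n{print_css(margin)}</style>\n"
        "</head><body>\n"
        f"{body}\n"
        "</body></html>\n"
    )


# Start a new page before the second chapter; keep a table on one page.
document = wrap_html(
    "<h1>Chapter 1</h1><p>Intro</p>"
    '<h1 class="page-break-before">Chapter 2</h1>'
    '<table class="keep-together"><tr><td>Total</td></tr></table>'
)
```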

ernststavrob 8 years ago

How about not allowing file:// URLs?

stevekemp 8 years ago

Do you have a security contact address?

Edit : Emailed your contact@brainhashed address.

  • stevekemp 8 years ago

    Now that it has been fixed, I'll say the site previously allowed you to enter URLs of the form `file:///etc/passwd`, which were then rendered in the PDF output.

    In short arbitrary local file inclusion.
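
One common guard against this class of bug is to allowlist URL schemes before handing anything to the renderer. The sketch below is an assumption about how such a check could look, not a description of how the site was actually fixed. The function and constant names are made up for illustration.

```python
from urllib.parse import urlsplit

# Only plain web URLs are accepted; file://, javascript:, etc. are rejected.
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that name a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # Malformed input (e.g. a broken IPv6 literal) is rejected outright.
        return False
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.hostname)


print(is_allowed_url("https://example.com/invoice"))
print(is_allowed_url("file:///etc/passwd"))
```

A scheme check alone is not a complete defence. It does not stop requests or redirects to internal addresses, and HTML sent to the renderer could itself reference `file://` resources, so a real service would likely need further restrictions.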

swyx 8 years ago

yes, it looks very nice. i dont have a need to generate pdf's unfortunately so i'm not your target market. good luck. great to productize your side effects.

teddyqwerty 8 years ago

I tried to generate a PDF on the landing page and got a 422 error.

  • marcTPA (OP) 8 years ago

    What URL did you try to generate a PDF from? The PDF APi replies with a 422 when it cannot connect to the requested URL.

    There can be a few reasons for this; the most common is that the site you want to generate a PDF for blocks connections from AWS IP ranges.
