Ask HN: What is the best way for validating files?

2 points by deca6cda37d0 6 years ago · 4 comments · 1 min read

Users can upload files into my backend system. Other users can view those files in the client.

For example, I only allow PDF files. What is the best way to validate that an uploaded file is really a PDF, and not some other format renamed to otherfileformat.pdf? I want the uploaded files to render correctly. This is to guard against human error.

necovek 6 years ago

You'd have to define "best":

* how much do you care about performance?

* how much do you care about safety?

The simplest and fastest approach would be to check for the "%PDF-" signature at the start of the file. Refer to the open PDF spec to make sure you accept everything that is valid (e.g. do you care about FDF files?).
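A minimal sketch of that header check, assuming you can read the first bytes of each upload. The helper name `looks_like_pdf` and its parameters are hypothetical. A PDF header begins with `%PDF-` and an FDF header with `%FDF-`; the optional `window` parameter is an assumption meant for tolerating a little junk before the header, so leave it at 0 for the strict check.

```python
PDF_SIGNATURE = b"%PDF-"
FDF_SIGNATURE = b"%FDF-"


def looks_like_pdf(head: bytes, allow_fdf: bool = False, window: int = 0) -> bool:
    """Hypothetical helper: sniff the leading bytes of an upload for a PDF header.

    head:      the first few kilobytes of the file.
    allow_fdf: also accept Forms Data Format (FDF) files.
    window:    0 means the signature must sit at offset 0; a positive value
               accepts the signature anywhere in the first `window` bytes.
    """
    signatures = [PDF_SIGNATURE]
    if allow_fdf:
        signatures.append(FDF_SIGNATURE)
    if window <= 0:
        # Strict mode: the file must start with one of the signatures.
        return any(head.startswith(sig) for sig in signatures)
    # Lenient mode: look for a signature near the start of the file.
    region = head[:window]
    return any(sig in region for sig in signatures)


# Usage: only the first kilobyte is needed, so even large uploads are cheap to check.
# with open("upload.pdf", "rb") as f:
#     ok = looks_like_pdf(f.read(1024))
```

This only catches honest mistakes such as a renamed Word document; it says nothing about whether the rest of the file actually parses.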

If you need to protect against malicious attempts rather than user errors, it quickly gets much harder (and is theoretically impossible, since you can construct files that are both valid PDFs and something else).
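To illustrate why a header check is no defence against deliberate tricks, here is a sketch that builds bytes which pass the `%PDF-` check yet also open as a ZIP archive. This works because ZIP readers locate their central directory from the end of the file. The result is not a valid PDF, only a header-check bypass; a true PDF polyglot takes more work. The function name is hypothetical.

```python
import io
import zipfile


def make_header_polyglot() -> bytes:
    """Build bytes that start with a PDF header but are also a readable ZIP.

    Not a valid PDF: it only shows that a signature check can be fooled.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hidden.txt", "not a PDF")
    # Prepending bytes still leaves a readable ZIP, since readers find the
    # end-of-central-directory record at the tail and adjust offsets.
    return b"%PDF-1.7\n" + buf.getvalue()


data = make_header_polyglot()
print(data.startswith(b"%PDF-"))             # True
print(zipfile.is_zipfile(io.BytesIO(data)))  # True
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    print(zf.read("hidden.txt"))             # b'not a PDF'
```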

To give another example, if you are trying to avoid being used as a media-sharing service, note that PDF allows embedding media as well, so accepting only PDFs will not stop that: PDFs are container formats as much as anything else.
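If you still want a rough signal, one crude heuristic is to scan the raw bytes for name keys that PDF uses for attached or embedded media. This is a sketch under loud assumptions: the key list is illustrative rather than exhaustive, and the scan misses anything stored in compressed object streams or written with `#xx` name escapes, so it is no substitute for re-processing.

```python
# Name keys used in PDF for attached files and embedded media.
# Illustrative only; "/EmbeddedFile" also matches the "/EmbeddedFiles" name tree.
SUSPICIOUS_NAMES = (b"/EmbeddedFile", b"/RichMedia", b"/Movie", b"/Sound")


def mentions_embedded_media(data: bytes) -> list:
    """Naive scan of raw PDF bytes for keys that signal embedded content.

    Only sees uncompressed objects: keys inside compressed object streams,
    or names written with #xx escapes, slip straight through.
    """
    return [name for name in SUSPICIOUS_NAMES if name in data]
```

For example, `mentions_embedded_media(b"<< /Type /EmbeddedFile >>")` returns `[b"/EmbeddedFile"]`.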

The safest approach would be to reprocess and re-render only the subset you allow, but that is the most expensive in terms of implementation and CPU time, and also somewhat limiting: you can't keep digital signatures, for instance.

  • deca6cda37d0OP 6 years ago

    Thanks for your answer.

    It is to protect against user errors. Your suggestion to check for a signature sounds like what I'm looking for.

    • necovek 6 years ago

      Sorry, I missed this earlier. The PDF spec became an ISO standard with 1.7, and since 2.0 it is no longer available without paying, but the 1.7 spec on Adobe's site is pretty clear about the signatures (there's a nice, simple section on headers :).

      (My phone wouldn't let me paste the URL, but it's a quick search away. Don't be afraid of the spec: it's quite simple, especially the parts you care about.)
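Building on that header section, a slightly stricter check could also read the version number that follows `%PDF-` (for example `%PDF-1.7`). A sketch with a hypothetical function name; rejecting versions you don't expect is your own policy choice, not something the spec requires.

```python
import re

# Header form: "%PDF-" followed by a version such as "1.7" or "2.0".
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_header_version(head: bytes):
    """Return (major, minor) from a header at offset 0, or None if absent."""
    match = HEADER_RE.match(head)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For example, `pdf_header_version(b"%PDF-1.7\n")` returns `(1, 7)`, and a file starting with `<html>` returns `None`.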

    • krapp 6 years ago

      Here is a list of file signatures[0] if you're looking for one.

      [0]https://www.garykessler.net/library/file_sigs.html
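A signature list like that can also turn a rejection into a helpful message, e.g. telling the user they uploaded a PNG instead of a PDF. A small sketch using a handful of well-known signatures; the table and `guess_format` are illustrative and not taken from the linked page.

```python
# A few well-known leading signatures; the linked list has many more.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip (also docx, xlsx, odt, ...)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2 (legacy .doc, .xls, .ppt)"),
]


def guess_format(head: bytes):
    """Return a label for the first matching signature, or None if unknown."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None
```

Reading the first 16 bytes of the upload is enough for every entry in this table.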