Pre-rendering static websites with the 23 year-old wget command

apex.sh

34 points by tjholowaychuk 6 years ago · 15 comments

jake-low 6 years ago

Thank you for sharing this; I find the technique refreshingly simple.

> You may have seen people achieve this with a more complex headless Chrome-based solution, but for many sites, this will be perfectly fine!

Can you elaborate on the difference between using wget and a heavier solution? I assume the main difference is that a headless browser can execute JavaScript and then serialize the resulting DOM back to HTML, allowing you to build sites in client-side frameworks (React, Vue) and then make static versions of them for deployment. Are there other benefits of using a full browser vs. simply using wget?
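
For context, the wget approach under discussion looks something like this (a sketch: the exact flags and the local dev-server port are assumptions, not the article's verbatim command):

```sh
# Crawl a locally running site and write a static mirror into ./build.
#   --recursive             follow links to reach every crawlable page
#   --page-requisites       also fetch the CSS, JS, and images each page needs
#   --convert-links         rewrite links so the mirror works when served statically
#   --adjust-extension      save HTML documents with a .html extension
#   --no-host-directories   don't nest the output under a hostname directory
wget --recursive --page-requisites --convert-links --adjust-extension \
     --no-host-directories --directory-prefix=build http://localhost:3000/
```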

  • zeroimpl 6 years ago

    I think this article unnecessarily conflates the pre-rendering techniques used for JS-heavy websites with what I would call a static website build/compile process.

    Using wget to compile your website is a clever idea. But it won't work if your website uses JS to generate links, so I wouldn't call it pre-rendering (since there had better not be any post-rendering).

    • tjholowaychukOP 6 years ago

      Yeah it doesn't work if you're relying on JS for interactions and layout, but wget's crawling technique works great if you're happy with using server-side rendering for content.

pedrocx486 6 years ago

The title makes it look like wget is obsolete. Why not use the original title?

  • Gys 6 years ago

    The HN post is by the author himself, and in the first paragraph the author also mentions '23 year-old wget'.

znpy 6 years ago

Wouldn't it make more sense to generate the HTML and save it to the appropriate file from the blog generator itself?

What if you have a page that is there but it’s not linked from any other page (a landing page for example)? It would never be pre-rendered.

  • tjholowaychukOP 6 years ago

    You can definitely do that, but I find this appealing for development: just write routes as you normally would and the templates all re-render, with no need for watching file changes and re-compiling. But you're right, if you have a 404 template, for example, you have to `curl ... > build/404.html`, which is a bit lame.
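
    A sketch of that workaround (the port and the extra `landing` route here are hypothetical):

    ```sh
    # Pages the crawler can't discover via links need to be fetched explicitly.
    curl -s http://localhost:3000/404 > build/404.html
    curl -s http://localhost:3000/landing > build/landing.html
    ```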

  • thedanbob 6 years ago

    I did exactly this with a Rails project once: I called render_to_string on each page I was interested in and saved the results as HTML files. The wget method is clever but I agree, working within the same system makes the most sense.

    • zeroimpl 6 years ago

      If you are using a web framework designed for dynamic processing - think something like Java servlets/JSP, or, in the author's case, Go - it's often non-trivial to find/implement a render_to_string-style function, let alone enumerate all possible pages.

app4soft 6 years ago

It would be interesting to combine wget with HTMLDOC[0] to convert static websites into a PDF book.

[0] https://github.com/michaelrsweet/htmldoc
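
A rough sketch of that pipeline (the URL, directory name, and file discovery are assumptions; htmldoc's `--book` mode binds pages into chapters based on their headings):

```sh
# Mirror the site into ./site, then bind every mirrored page into one PDF.
wget --recursive --convert-links --adjust-extension \
     --no-host-directories --directory-prefix=site https://example.com/
find site -name '*.html' | sort | xargs htmldoc --book -f book.pdf
```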

combatentropy 6 years ago

Would it be less maintenance to use your web server's cache feature? Both Apache and Nginx can cache dynamic pages to static files.

  • tjholowaychukOP 6 years ago

    These days I think most people deploy static sites to a CDN. They get you such great performance that I can't imagine not using one; my site loads in 10ms in London, for example.

deedubaya 6 years ago

Ha, I remember this being a thing to pre-warm caches in Java systems a zillion years ago. What was once old is new again.

enriquto 6 years ago

wget is not a command; it is a program

  • ozzmotik 6 years ago

    i would argue that it's both. what it isn't is an internal command built into the shell. but what it is is a series of characters that identifies a binary, and the act of submitting those characters is a metaphorical command to the shell to invoke said binary. but maybe i'm just being pedantic ¯\_(ツ)_/¯
