Hashbang URIs Aren't as Bad as You Think
mtrpcic.net> Now, take away the constant loading of the JavaScript and CSS, and you've significantly decreased bandwidth and server load
On a related note, if this is a problem for you, now might be a good time to check your Expires headers. The 'some clients turn off caching' argument seems like a bit of a non sequitur too: some clients also have javascript disabled. Some clients are IE6. How many clients are like that in the wild? How many of /your/ users?
The caching justification felt very contrived to me. I'd guess that the percentage of users with broken cache handling is lower than the percentage of users with javascript disabled. I'd rather not guess, but the author didn't provide any actual evidence that this is a problem.
If your cache headers are set incorrectly, you should probably fix that instead of reworking your entire site.
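For what it's worth, a far-future cache header is usually a one-line fix. A rough sketch with Node's built-in http module; the "/static/" prefix and the max-age are illustrative, and most stacks would set this in nginx/Apache or a CDN rather than in application code:

    // Sketch only: a far-future Cache-Control header on static assets.
    import { createServer } from "node:http";

    createServer((req, res) => {
      if (req.url?.startsWith("/static/")) {
        res.setHeader("Cache-Control", "public, max-age=31536000"); // cache for a year
      }
      res.end("..."); // actual file serving omitted for brevity
    }).listen(8080);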
It's true that the caching argument seemed contrived (and was, in a sense), so I've added an edit to it to take your comments, as well as others, into consideration, and to make them known to other readers. Thanks for the feedback.
I agree, and that's kind of the point I was making towards the end. You need to know what your target audience is, and whether or not those are issues you might face. In either case, if you have a "DOM Heavy" application, you can still save bandwidth by not needing to send anything except the data across.
Yes, but if you want to be honest about numbers, you should say something like
Reddit.com HTML: 37.6 KB
Text in that HTML: 5.9 KB
(That's just what I got from copy+pasting.) The difference at issue is loading another 37.6 KB page versus loading whatever part of that 5.9 KB is the "interesting" text (i.e. probably not the header or footer) plus its links and styles and the overhead of whatever you want to stick it in (JSON, an HTML fragment, etc). The huge CSS and JS files are not reloaded every time you go to a new page.
That's a good point. The big issue is loading the "Heavy" part of the DOM every time (navigation links, header/footer, user panel, etc.), rather than the real "content". I'll make an edit and make that a more prominent point. Thanks for the feedback. If you don't mind my asking, how did you pull the 5.9 KB number out of the 37.6 KB one? Did you just check the size of the interesting part of the page once saved into a new file?
> If I have JavaScript enabled, and send my friend (who has
> JavaScript disabled) a "Hashbang" URI, he won't load the
> content!
> You're right. This is the primary drawback of an
> approach like this, and you'll need to figure out if
> that's acceptable for you.
Simple answer: "no".
It's not acceptable for me in any way. Not as a user, who disables JS by default (for speed, security and annoyance avoidance), and not as a site owner either. So these URLs are still as bad as I think.
Do you know the old joke about Java where people can't buy hammers anymore, but need to buy a HammerFactory to produce hammers for them (and then a HammerFactoryFactory, etc.)? Requesting a page that then requests the right content from the server just feels like an unnecessary Factory step.
That's why this approach isn't for everyone. To each his own. One thing to keep in mind, however, is that the approach is meant more for projects that can be thought of as "web applications", rather than websites; things that would require JavaScript to be enabled to even work.
Also, another way of thinking of it is to stop thinking of the browser as a browser. If you do the hashbang thing properly, you're now treating your browser as an API consumption service, and treating it as you would any other application. This isn't necessarily a bad thing, but does have drawbacks.
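To make that concrete, here's a rough sketch of the pattern I mean; the route, the /api endpoint, and the response shape are all invented for illustration:

    // A minimal hashbang "router": the fragment names a resource, and the
    // page is really just a JSON API client. All names here are made up.
    async function renderRoute(): Promise<void> {
      const hash = window.location.hash;      // e.g. "#!/posts/42"
      if (!hash.startsWith("#!")) return;     // not a hashbang URL; leave the page alone
      const route = hash.slice(2);            // "/posts/42"

      // Ask the server for data only -- no header, footer, or navigation markup.
      const res = await fetch("/api" + route, { headers: { Accept: "application/json" } });
      const data = await res.json();

      // The client owns all presentation; the server never renders this HTML.
      document.querySelector("#content")!.textContent = data.body;
    }

    // Fragment changes never hit the server; we react to them entirely client-side.
    window.addEventListener("hashchange", renderRoute);
    window.addEventListener("DOMContentLoaded", renderRoute);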
The problem is that - as Gawker have shown - people start thinking that techniques that might be acceptable for web apps are acceptable for content sites.
They don't think this through as they don't use anything other than a modern web browser with javascript switched on.
I'm not a fan of the hash-bang URLs, but I think this article largely misses the objections I (and many others) have to them. Nothing that he said is unique to hash-bang URLs; the same thing could be done with typical fragment identifiers, and indeed has been for ages. My objection to hash-bang URLs in particular has nothing to do with using fragments to dynamically load content, which I think is a great idea.

If you write your code like the article suggests, you'd be fine by me: he's taking a page which can be accessed by a non-js-enabled browser and using js to modify the links so they load content dynamically. I can still look at that page with, say, wget or lynx and nothing would break.

The Gawker redesign—and the whole hashbang scheme—requires any agent that doesn't execute js to mangle the URL in order to get at the static content. And that's the issue here—forcing non-compliant agents to change in order to do what they've always done runs in the face of the Robustness Principle. Hash-bang URLs are an inelegant solution to a minor problem; use normal fragment identifiers for your Ajax instead.
"And that's the issue here—forcing non-compliant agents to change in order to do what they've always done runs in the face of the Robustness Principle."
You nailed it. Hashbang URLs are not backwards compatible with the existing World Wide Web. Existing tools that use the Web today cannot, in their current state, use these hashbang-driven sites in the same way.
As another commenter noted, hashbangs silo a site within a non-standard requirement. That approach explains why posts such as these overlook or ignore fundamental behaviour of the Web and how it brings together fragments of distributed conversation.
Hashbang URLs break the World Wide Web stack at the HTTP and URL levels, and attempt to fix the damage at the behaviour level. The fix is inferior, and results in breakage of the World Wide Web model (like framesets).
It's a technical implementation of a walled garden, walled off from the existing World Wide Web.
Large/lots of javascript files can slow down the loading of the page, but that wasn't the real issue people were raising with JS-controlled sites (as I interpreted things).

The real delay cost comes from the inevitable redirect to the root of the website. When you visit domain.com/user-name, a JS-controlled site will usually redirect you to domain.com/#!/user-name and then load up the page. There is no avoiding this unless you want really ugly URLs: domain.com/user-name#!/user-name...

So large JS files might increase load times slightly, but adding an entire redirect is the kicker.
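The redirect itself is usually just a client-side rewrite along these lines (a sketch, not any particular site's code):

    // A request for /user-name gets bounced to /#!/user-name before any
    // content loads -- that bounce is the extra cost being described here.
    const path = window.location.pathname;
    if (path !== "/") {
      // location.replace keeps the intermediate URL out of history,
      // but the user still pays for a second page load.
      window.location.replace("/#!" + path);
    }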
Like most things, there's a time and a place. Using hashbang urls can increase response-time when navigating through a site and provide a really cool experience. On the other hand, it definitely doesn't work for all browsers and users.
Such a redirect isn't slow? It's near instant if done right. Here's twitter doing it: http://twitter.com/bjorntipling
It's absolutely super fast for me. How is that a 'kicker'?
Large JS files only need to be downloaded once and are cached.
There's no problem here.
> Such a redirect isn't slow? It's near instant if done right. Here's twitter doing it: http://twitter.com/bjorntipling
On slower devices (such as mobile phones) redirecting can be more noticeable. You're right though, in general it's not really a problem, but might be more so than a larger-than-normal JS file.
> Large JS files only need to be downloaded once and are cached.
Yeah, I was trying to say that. Larger JS files aren't a problem at all; load them once and they're cached.
Shebang. It's called a shebang.
"a shebang (also called a hashbang)" -- http://en.wikipedia.org/wiki/Shebang_%28Unix%29
Hear hear.
I can think of many cases where ajax loaded content and the attendant hashbang URLs are far preferable, like with twitter's web interface or with gmail. It just makes sense for web applications to work differently than static content, and persistent display of information often beats out clean looking URIs.
Plus the whole anti-hashbang thing has a reactionary air to it.
"Web app" is a misnomer. If the content isn't browsable hypertext, it has abandoned the Web and stepped backwards into the ghetto of siloed client/server apps that were deservedly hated in the 90s. And the industry has yet to deliver a trustworthy js sandbox that can safely run any code it happens to find anywhere—the majority uses the defaults because they don't know how reckless those defaults are.
Thanks for this response, it's really thought provoking. I have a few questions:
'The ghetto of siloed client/server apps'? Would those be like ActiveX controls and Java applets? Isn't JavaScript fundamentally different?
Does the definition of 'browsable hypertext' preclude hypertext that's scripted to operate differently, e.g. 'ajax'? Are you not still 'browsing hypertext'?
The industry has yet to deliver a trustworthy js sandbox—should browsers not support JavaScript?
By siloed I was referring to all the VB-style apps that predated widespread use of the Web. You had to use a single mediocre client app because it was the only piece of code in existence that could support the proprietary protocol for the matching server. Lock-in was rampant and building a better client or repurposing the data in any way was almost impossible.
Now we have servers that may technically still be talking XML or JSON or something over HTTP, but it might as well be an opaque proprietary protocol, because there's only one piece of code in existence (the javascript embedded in some page) that knows how to send meaningful requests to the server or decode its responses. The protocol isn't even stable enough to reverse-engineer because the author can make arbitrary changes to it and migrate everyone to an updated version of their client code at any moment. I find this vastly inferior to query strings, multipart/form-data, and scrapable semantic HTML, which a growing number of web devs completely neglect (none of whom I'd ever hire).
> should browsers not support JavaScript?
They shouldn't run it by default without asking whether the user trusts the author. Privacy violations are rampant and even malicious scripts have become a recurring problem. I don't see why a sandbox that works shouldn't be possible, but it hasn't happened yet.
That's the thing. When you're making something on the web, you need to define to yourself whether it's a Web Application or a Web Site. The hashbang approach lends itself very well to the Application side of the browser.
The difference between a web site and a web application isn't binary. Modern web sites increasingly blur the lines between the two.
Hashbang URLs are bad because they damage the addressability of the Web. If I can't take a URL, telnet to port 80, run GET /path and get back a representation of the thing that the URL points to, that URL isn't really part of the Web - it's part of a JavaScript application delivery engine that happens to run over HTTP.
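The reason is baked into how URLs work: the fragment never appears in the HTTP request, so the server has nothing to respond with. Any HTTP client behaves this way (the username here is made up):

    // The fragment is purely client-side state; it is never part of the request.
    const url = new URL("http://example.com/users#!/some-user");

    console.log(url.pathname + url.search); // "/users"       <- what goes on the wire: GET /users
    console.log(url.hash);                  // "#!/some-user" <- never sent to the server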
Here's the thing: I don't hate hashbang URLs, I hate [site]'s use of hashbang URLs. One example of [site] that stands out in particular is Lifehacker. Why in the world does a blog need that? There isn't a good reason.
I don't think anybody opposes it in cases where it actually makes sense, but I assert that the dividing line is pretty clear.
Do you mean the dividing line between web sites and web applications?
How would you categorise Flickr, or YouTube, or Wikipedia, or Yelp, or Lanyrd, or Craigslist - or pretty much any other UGC site? I'd argue that all of them could be described as both.
The distinction between web sites and web applications is not granular enough to make this decision. What you should be considering as a developer/designer of a web site/app is not about what bucket Flickr or YouTube or Yelp falls into, but what specific screens should be accessible with shebangs and which screens should not.
Flickr's photo indexes and Yelp's listings and reviews should probably be accessible via regular old URLs. But Flickr's uploader screen? Adding a new listing to Yelp? If you're ok with leaving behind users who have javascript turned off, it simply _does_ _not_ matter. In those cases, why not take advantage of the snappy UX that shebanged interfaces offer?
That's true, to a point. My principal objection to hashbang URLs is greatly reduced for pages which people are never intended to link to - private "edit" interfaces protected by a login are a prime example.
Personally I'm not OK leaving behind non-JS users in my own development, but provided a site's public pages are accessible I'm not too bothered what they're doing when people log in.
All of those are examples of websites that would or do suffer from excessive use of javascript.
I agree 100%.
If you want to curl a hashbang URL (or access its content in some other non-Javascript-requiring way), there's a pretty easy way to do it. Just do the same thing Google does and remove the hashbang. It's like getting mad because you can't parse the XML you find at a JSON endpoint.
"If you want to curl a hashbang URL (or access its content in some other non-Javascript-requiring way), there's a pretty easy way to do it. Just do the same thing Google does and remove the hashbang."
Which command line option on curl does that? (Maybe they haven't updated the man page? http://curl.haxx.se/docs/manpage.html ).
Oh, if you reply, could you do it in the form of interpretative dance.
sed 's/#!\///g' urls.txt | xargs curl
Yeah, not that hard dude.
> sed 's/#!\///g' urls.txt | xargs curl
So easy, you've actually got it wrong.
Certain special characters after the hash-bang need to be url-encoded, and then that value needs to be added to what's before the #! by including a query string parameter of _escaped_fragment_, checking first whether there is already a query string so as to append the information rather than incorrectly whacking on a '?'.
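In code, the mapping is roughly this; it's a sketch, with encodeURIComponent standing in for the spec's exact escaping rules (it over-escapes characters like '/', which a decoding server should still handle):

    // Rough sketch of the #! -> _escaped_fragment_ rewrite described above.
    function escapedFragmentUrl(hashbangUrl: string): string {
      const url = new URL(hashbangUrl);
      if (!url.hash.startsWith("#!")) return hashbangUrl; // nothing to rewrite

      const fragment = url.hash.slice(2);                 // drop the "#!"
      url.hash = "";                                      // the fragment moves into the query string

      const separator = url.search ? "&" : "?";           // append to an existing query string
      return url.toString() + separator +
             "_escaped_fragment_=" + encodeURIComponent(fragment);
    }

    // escapedFragmentUrl("http://twitter.com/#!/ptarjan")
    //   -> "http://twitter.com/?_escaped_fragment_=%2Fptarjan"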
Plus, this isn't in the form of interpretive dance. So no content for you.
Ok fine, so it will take 10 minutes to whip up and test a quick Ruby script instead of 30 seconds to think of the regular expression. I stand corrected.
Now multiply that 10 minutes by the number of scripts, utilities, libraries and applications in the world that handle URLs, and you'll be somewhere close to the magnitude of effort required to work around these broken URLs.
I wonder if anyone has tried this: suppose that to make page transitions faster, you strip out everything from each page other than the data itself? The page should just contain the main content (which would have to be fetched via AJAX anyway), and JavaScript can fill in things like sidebars and navigation.
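Something like this is what I have in mind; the fragment endpoints and element IDs are made up:

    // The page ships with only its main content; the surrounding chrome is
    // fetched and injected after load. Because these pieces are identical on
    // every page, they cache well and never bloat the main document.
    async function inject(selector: string, fragmentUrl: string): Promise<void> {
      const res = await fetch(fragmentUrl);
      document.querySelector(selector)!.innerHTML = await res.text();
    }

    window.addEventListener("DOMContentLoaded", () => {
      inject("#sidebar", "/fragments/sidebar.html");
      inject("#navigation", "/fragments/navigation.html");
    });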
IIRC, Posterous does this as part of their caching strategy. Static content is HTTP cached and dynamic content is pulled in via AJAX on page load. The result is that the primary content is available fast and dynamic secondary information (such as view count, etc) pops in soon after.
Jolly good. A site that understands the concept of 'progressive enhancement'.
A better approach in all ways is to use the HTML5 history API, a la GitHub.
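A rough sketch of that approach, with made-up handler and endpoint names; real URLs stay intact and the fragment never enters into it:

    // Intercept same-origin link clicks, update the address bar with pushState,
    // and swap in the new content without a full reload. Names are illustrative.
    document.addEventListener("click", (e) => {
      const link = (e.target as HTMLElement).closest("a");
      if (!link || link.origin !== location.origin) return; // only same-origin links
      e.preventDefault();
      history.pushState({}, "", link.href);  // a real, shareable URL -- no #!
      loadContent(link.pathname);
    });

    // Back/forward buttons fire popstate; re-render whatever the URL now points to.
    window.addEventListener("popstate", () => loadContent(location.pathname));

    async function loadContent(path: string): Promise<void> {
      // Hypothetical convention: the server returns just the page body for XHR-style requests.
      const res = await fetch(path, { headers: { "X-Requested-With": "XMLHttpRequest" } });
      document.querySelector("#content")!.innerHTML = await res.text();
    }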
The article mentions that the example doesn't use the HTML5 History API, and that the onus is on the developer to check for it and use it as appropriate.
In all ways except being supported by the browsers used by the majority of the users.
I only wonder why websites that use it themselves break other shebang URLs when you post them: try sharing a Twitter URL on Facebook (you always have to strip the #!/ manually).
I'm an engineer at Facebook.
We started using Google's ajax crawling spec. Whenever we see:
http://twitter.com/#!/ptarjan

we actually crawl:

http://twitter.com/?_escaped_fragment_=/ptarjan

Let me know if you notice any issues.

Oh, now it works and links fine! Tx for the update.
//edit: btw, do you work on the new Messages? I've got an annoying bug there: as you type, the cursor often jumps to the end, which is annoying as hell when you're editing something in the middle of a sentence.
Much of the speed increase (assuming a fast and properly cached server) can come from the decreased need to repaint the entire DOM. For my site, this is a big deal.