System loads web pages 34 percent faster by fetching files more effectively

news.mit.edu

177 points by qwename 9 years ago · 104 comments

tyingq 9 years ago

Interesting. The paper was released before HTTP/2 was in widespread use. They do show significant improvements over SPDY alone... I wonder how a comparison against HTTP/2 alone would fare.

  • vitus 9 years ago

    Personally, I'm more curious to see the comparison with QUIC, which eliminates the RTTs to set up TCP + TLS, and multiplexes connections in a way that doesn't lead to the head-of-line blocking problem that you can see with HTTP/2 over TCP -- QUIC uses UDP, so no requirement for in-order delivery across the entire connection, just within each resource flow (in this context, each request).

    On that note, I do think it's a bit inaccurate to say that Google's efforts are/were primarily focused on data compression -- yes, they did introduce brotli, which is just a better LZ77 implementation (primary difference with gzip is that window size isn't a fixed 32KB), but they also pioneered SPDY (which turned into HTTP/2 after going through the standards committee) and now QUIC.

    (obligatory disclaimer that google gives me money, so I am biased)

    • tyingq 9 years ago

      Wouldn't it be difficult to do a straight comparison without other variables, given that there isn't a popular webserver that supports the protocols they tested with, as well as QUIC?

      Edit: Should note that I'm excited about QUIC as well, just thought I may have been missing some new development where it's (already) more usable outside Google's walls.

      • vitus 9 years ago

        It is, but that doesn't change the fact that I'm still more curious on that front. The wiki page has a few sample servers [0], but again, those don't meet your criterion of popular.

        I realize that the initial RTT reduction isn't so relevant in the context of large dependency graphs, but especially in the context of packet loss, the proper multiplexing across flows seems useful.

        There are also a number of other QUIC-based features that might be relevant, since it does essentially reimplement the relevant parts of the TCP stack, so parameter choices might be better suited to today's internet (as opposed to that of the 1980s).

        [0] https://en.wikipedia.org/wiki/QUIC#Server_support

  • breischl 9 years ago

    Aren't they orthogonal? No matter how fast HTTP/2 is or how much it decreases connection setup times, requesting resources in the "right" order will always be faster than doing it in one of the "wrong" orders.

    More efficient protocols might reduce the disparity, but there should always be one. Right?

    • tyingq 9 years ago

      Yes, it would always be faster, assuming relatively heavy pages. But the reduced number of requests, hpack compression, and server push features in HTTP/2 might reduce the performance improvement to something less impressive.

      Given that Polaris has some downsides, HTTP/2 might shrink its advantage to the point where you wouldn't consider it. In its current state, Polaris requires a lot of pre-work...including dependency analysis via a real browser. It also serves up pages that wouldn't work with javascript disabled, and might not be terribly search-engine friendly.

    • bastawhiz 9 years ago

      Server push in HTTP/2 can force files to load in the right order.

  • k_lander 9 years ago

    is there a significant difference in performance between HTTP/2 and SPDY?

qwenameOP 9 years ago

Relevant paper: "Polaris: Faster Page Loads Using Fine-grained Dependency Tracking", http://web.mit.edu/ravinet/www/polaris_nsdi16.pdf

leeoniya 9 years ago

but we can already make web pages load 500% faster by not shoveling a ton of shit, not loading scripts from 60 third-party domains (yes stop using CDNs for jQuery/js libs, those https connections aren't free - they're much more expensive than just serving the same script from your existing connection), reducing total requests to < 10, not serving 700kb hero images, 1.22MB embedded youtube players [1], 500kb of other js bloat, 200kb of webfonts, 150kb of bootstrap css :/

the internet is faster than ever, browsers/javascript is faster than ever, cross-browser compat is better than ever, computers & servers are faster than ever, yet websites are slower than ever. i literally cannot consume the internet without uMatrix & uBlock Origin. and even with these i have to often give up my privacy by selectively allowing a bunch of required shit from third-party CDNs.

no website/SPA should take > 2s on a fast connection (or > 4s on 3g) to be fully loaded. it's downright embarrassing. we can and must do better. we have everything we need today.

[1] https://s.ytimg.com/yts/jsbin/player-en_US-vfljAVcXG/base.js

  • chrisfosterelli 9 years ago

    > stop using CDNs for jQuery/js libs, those https connections aren't free - they're much more expensive than just serving the same script from your existing connection

    Do you have a source for this? My understanding is that, in real usage, it's cheaper to load common libraries from a public CDN because, for something like jQuery, the library is likely to already be cached from another website, and the browser may even already have an open SSL connection to the CDN.

    Obviously 60 separate CDNs is excessive, but I don't know if the practice altogether is a bad idea.

    • bartread 9 years ago

      Using a CDN for common libraries may help, but doesn't always, and is something you should measure rather than just assume. The situation where a CDN actually hurts performance is one I've seen periodically at different clients.

      When people talk about serving jQuery, or J. Random JavaScript library, from a CDN, they mean the specific version of jQuery (or whatever) that they're using. There's literally no guarantee that the specific version you need will be in any given user's browser cache, and this is exacerbated if you're loading multiple libraries from a CDN, or from different CDNs. If your CDNs serve files with low latency then it may not be a big problem, but not all CDNs do. Slow-responding CDNs will slow your page loads down, not the reverse.

      Moreover, if you're serving over HTTP2/SPDY there's even less likely to be a benefit to using a CDN. Again, it's something you need to measure.

      One area where a CDN (e.g., Cloudflare) can benefit you is by serving all your static content to offer users a low-latency experience regardless of where they are in the world, but that's rather a different matter from serving half a dozen libraries from half a dozen different CDNs.

      • richmarr 9 years ago

        > ... something you should measure rather than just assume

        How do people do this?

        My obstacle with this point is that each CDN would have a different impact in each locale due to the locations of their points of presence, and CDN-ing each resource would have a different impact based on the sites that particular individual had visited.

        Measuring it in any useful way in advance of a change would be really hard unless I'm missing a trick, and measuring it in hindsight would require the kind of performance logging that PaaS-level apps don't have access to.

        • tombrossman 9 years ago

          This would be extremely useful data to have and when I last looked I found little to go on. For example, how great would it be to have an extra column on a page like this with cache hit rates? https://cdnjs.com/libraries

          Nice to have these free hosted assets, but I suspect that a lot of these are an additional download and not actually cached by the majority of visitors. Just having a rough estimate would be helpful when choosing whether to directly serve an asset or use a CDN.

        • leeoniya 9 years ago

          my recommendation is to first serve everything yourself over TLS (minified, bundled, gzipped). if you can, run HTTP2. then measure.

          if this does not deliver the perf you expect (chances are that it will), then look into CDNs.

          if you need to serve media that needs a lot of bandwidth (expensive) then look into CDNs.

          if you need some form of real-time comm like WebSockets or WebRTC that actually requires low latency, look into distributed systems (amazon, google, azure, cloudflare?, etc..)

        • Eridrus 9 years ago

          You can use JavaScript timers to measure how long things take to load and infer how many of the resources come from cache, etc.
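
          A minimal sketch of that with the Resource Timing API (the transferSize heuristic is an assumption, and cross-origin entries only expose it if the server sends Timing-Allow-Origin):

            // Log every resource the page loaded, how long it took, and
            // whether it probably came from the local cache.
            window.addEventListener('load', function () {
              performance.getEntriesByType('resource').forEach(function (entry) {
                var fromCache = entry.transferSize === 0 && entry.decodedBodySize > 0;
                console.log(entry.name, Math.round(entry.duration) + 'ms',
                            fromCache ? 'cache' : 'network');
              });
            });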

    • pluma 9 years ago

      In theory public CDNs are the solution. In practice it only works if a critical mass of websites use them.

      As others have pointed out, the version matters. There are a gazillion different versions of jQuery out there and many websites are very slow to update them if ever. So you only benefit by visiting multiple sites that use exactly the same version.

      Additionally there is a ton of different public CDNs. Now, jQuery itself has solved this by providing an official jQuery CDN and pushing it to developers. In fact, the first result for "jQuery CDN" is their own website.

      I'd say jQuery is probably the least bad example of this. If you visit enough websites there's a non-zero chance you'll have a CDN cache hit for jQuery on a few of them.

      The problem is that jQuery rarely comes alone. Even jQuery UI is less widely deployed than jQuery and thus less likely to be cached from a CDN. But once you get into plugin territory or third-party libraries all bets are off.

      I'm fairly certain that jQuery is the only library that has some realistic chance of benefiting from a public CDN. Everything else is probably asking for trouble (if only because you're adding unnecessary single points of failure).

    • inian 9 years ago

      Also, the cache size on mobile devices, especially on iOS, is ridiculously small. I don't see my cache storage exceed 10 MB even though I use my browser a lot. High chance that you are _not_ going to find your common CDN libraries there.

    • lukaszkups 9 years ago

      I'm a webdev. I had to speed up a web e-mail client written in angular.js. Besides splitting a huge request into a couple of smaller, async ones (huuge speed improvement), I also switched from downloading files from a CDN to serving them from the app server itself - that gave an additional 4 sec (sic!) shorter load time.

      • pedalpete 9 years ago

        Can you explain how you achieved the 4 second difference using the webserver over the CDN? That seems orders of magnitude faster than I would have expected. I thought you would have been looking at 100ms one way or the other.

        • radicalbyte 9 years ago

          At my current client I'm pretty sure that they've got a Raspberry Pi 1 as a proxy server. So all requests to outside their network are horribly slow.

    • leeoniya 9 years ago

      this has been true in my own testing (with plain SSL/TLS, not even SPDY/HTTP2/QUIC), but you can read another analysis that reaches the same conclusion here: https://thethemefoundry.com/blog/why-we-dont-use-a-cdn-spdy-...

      another important reason i personally don't use CDNs is the privacy of my users.

      • supergreg 9 years ago

        >another important reason i personally don't use CDNs is the privacy of my users.

        How high up the privacy concern is using a CDN vs having cloudflare in front of you or everything being in AWS or using a browser by an ad company?

        • leeoniya 9 years ago

          > having cloudflare in front of you or everything being in AWS

          sure, sometimes it's unavoidable when there is real value added in using these platforms. at the end of the day most of us have to entrust our infrastructure to some hosting provider. I trust a provider I pay (with a known privacy policy) more than a public CDN (with a privacy policy that allows them to make money in unspecified ways)

          but yes, centralized ssl termination for distributed systems is an issue that's difficult to mitigate WRT privacy. maybe via https://en.wikipedia.org/wiki/Multipath_TCP ? i don't know enough here.

          > using a browser by an ad company

          this is up to the user

    • andrewguenther 9 years ago

      jQuery was a poor example in the original comment. You almost 100% have jQuery cached if you're getting it from a CDN. But this only works for extremely widely used libraries (jQuery, Bootstrap, etc.). It drives me nuts when I see tiny little libraries on GitHub bragging about being on a CDN. They just don't have the usage for it to be efficient.

  • Ntrails 9 years ago

    > 1.22MB embedded youtube players

    A guy who I used to post with wrote a new forum for us all to post on (woo splinter groups). It's pretty cool. One of the things it does is serve a static image of the underlying youtube and then load it on click. When a 'tube might be quoted 7 times on a page - that's a pretty useful trick.

    I'd just assumed this was a standard forum feature and then I opened a "Music Megathread" on an ipboard and holy shit loading 30 youtube players was painful.

    • leeoniya 9 years ago

      yes, you can lazy-load the youtube player; there's also a jquery plugin for it [1]. but this setup requires 2 clicks to play on mobile (because google restricts the player from autoplaying onload on mobile devices due to bandwidth/data-usage concerns). the first click will appear to do nothing as it loads the 1.22MB player, so it delivers a crap experience.

      you're much better off just serving an html5 <video> tag. you can likely fit your entire video in mobile quality in that 1.22MB :D

      [1] https://github.com/tylerpearson/lazyYT
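
      The basic trick, roughly (VIDEO_ID is a placeholder; img.youtube.com serves the static thumbnails):

        <!-- Show only a thumbnail; swap in the real player on click -->
        <div class="yt-lazy" data-id="VIDEO_ID">
          <img src="https://img.youtube.com/vi/VIDEO_ID/hqdefault.jpg" alt="Play video">
        </div>
        <script>
          document.addEventListener('click', function (e) {
            var box = e.target.closest('.yt-lazy');
            if (!box) return;
            var iframe = document.createElement('iframe');
            iframe.src = 'https://www.youtube.com/embed/' + box.dataset.id + '?autoplay=1';
            iframe.allowFullscreen = true;
            box.replaceChild(iframe, box.firstElementChild);
          });
        </script>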

      • keypress 9 years ago

        I'd still prefer a video link, and leave it to my client to negotiate. I have never enjoyed viewing embedded videos in web pages (content aside).

      • eps 9 years ago

        That's an issue of poor UI design, though, not of lazy-loading itself.

  • amelius 9 years ago

    Actually I don't even read the articles anymore when on mobile. I just use HN, and hope somebody posts a TL;DR, or some relevant comment that gives some more information about the article. Only if this is not the case will I consider clicking on the article link. It's pretty sad actually.

    I secretly wish there was some way that allows us (as a community) to collaboratively "pirate" articles, perhaps as a torrent (IPFS perhaps), so we only have to download the ascii text.

    • patrec 9 years ago

      Try turning off javascript in your main mobile browser (e.g. safari) and turning it on in your secondary one (e.g. chrome).

    • Pete_D 9 years ago

      I usually only see lynx and friends mentioned as jokes, but 'elinks --dump <url> | less' really can be a huge readability improvement for some articles.

    • acdha 9 years ago

      I use Brave as my primary mobile browser because it allows me to have JavaScript disabled but trivially enable it when I need it. The experience is massively better – on LTE the average page load is under a second except maybe for a video or lots of images, and you avoid all of the user-hostile pop-ups, app nags, etc.

      (I subscribe to most of the sites which I read regularly so I don't feel guilty about blocking ads but I really hope someone successfully makes a Google Contributor-style network where a trusted browser serves only static image/text ads, as much resistance as that will trigger)

  • keypress 9 years ago

    Back in the day we'd try and get page sizes down to less than 100k.

    The Internet isn't fast for everyone. I (in the UK) have no 3G signal, let alone 4G, and my broadband speed is pitiful - but it will do. There is nothing I can do to ramp the pipe speed up. I do end up turning off JS and images a lot of the time, because otherwise it kills me.

    As a web dev, I don't care for bloat. So I find it particularly irksome, and currently it's enough to deter me from going mobile. Once I'd have dreamed about having a modern smartphone in my pocket with any Internet connection, but the friction today puts me off. The UK was recently slammed for its retrograde networks.

  • syphilis2 9 years ago

    This is only slightly related, but I've noticed HN comment pages take a second or two to load when there are many comments (500+). Page sizes are not unreasonable (100-200 kb). What is the cause for these pages loading so slowly?

    Example 900+ comment page: https://news.ycombinator.com/item?id=11116274

    Example 2200+ comment page: https://news.ycombinator.com/item?id=12907201

    • yellowbeard 9 years ago

      The server is taking a long time to render the response. In the first link, 650ms or so. Open the network tab in your browser developer tools to see the details.

      • icebraining 9 years ago

        Slashdot's solution to this was to render the HTML on write, instead of on read. Whenever someone posted a new comment, a static HTML file would get generated, so that it could be rapidly served to readers. Movable Type works (can work?) much in the same way, though for blogs.

        I know that caching achieves a similar result and is an accepted component of a modern architecture, but I wonder if we wouldn't be better served by implementing these sort of "upfront caches" in more software.

        • manarth 9 years ago

            Whenever someone posted a new comment, a static HTML file would get generated
          
          It's essentially still a cache, with the same concerns/issues - mainly cache invalidation - as an HTTP cache.

          An on-disk cache of that sort has other issues though. For example, what if a request is received whilst the file is being written? Does the system deliver a half-written HTML page? You can introduce locking or atomic file operations (write to a temporary file on the same filesystem, then mv it into the correct location: that rename is an atomic operation), but this adds more complexity to the cache logic.
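
          For illustration, a minimal Node.js sketch of that write-then-rename pattern (the file names are made up; the rename is only atomic when the temp file lives on the same filesystem as the destination):

            const fs = require('fs');
            const path = require('path');

            function publishPage(dir, name, html) {
              // Write the new page to a temporary file, then rename it into
              // place. Readers see either the old file or the complete new
              // one, never a half-written page.
              const tmp = path.join(dir, '.' + name + '.tmp');
              fs.writeFileSync(tmp, html);
              fs.renameSync(tmp, path.join(dir, name));
            }

            publishPage('/var/www/comments', 'thread-12345.html', '<html>...</html>');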

          One of the benefits of a HTTP cache is that they generally have a default graceful failure mode: if the cached data doesn't exist, request it from the backend application server.

          • icebraining 9 years ago

            The issue doesn't seem any different - assuming you have concurrent requests, you also have to make sure the cache doesn't return a result until the whole document is written to it.

            • manarth 9 years ago

              That's a fair point - I didn't consider that issue, because using a standard HTTP cache, that's taken care of.

              It's just become somebody else's problem (the developers of the HTTP cache), but it's fair to say it's still an issue to solve.

  • roryisok 9 years ago

    > literally cannot consume the internet without uMatrix & uBlock Origin

    hear hear. and on mobile, it's painful because I can't have those (windows phone at least). planning on buying a DD-WRT compatible router soon so I can do some kind of router-level ad-blocking and let me browse on the phone again

    PS: opera mobile for android has a built in adblocker

    • supergreg 9 years ago

      I use uBlock in Firefox for Android. There's also AdAway to block domains in the /etc/hosts file.

    • nattmat 9 years ago

      A Raspberry Pi running Pi-Hole is what you want. https://pi-hole.net/

      • roryisok 9 years ago

        Ok, that's even better. I can get a pi for less than a new router, build my own case. If only my router had a usb port, I could even power it off that

    • neurostimulant 9 years ago

      Or just configure your router to use an ad-blocking DNS (host your own or use one of the public DNS servers). Works with any router as long as it allows you to edit the default DNS config. This helps a lot when browsing on a home wifi connection. Too bad you can't edit the DNS config on a mobile device without rooting/jailbreaking.

    • leeoniya 9 years ago

      yep, i use opera mobile and also am rooted so use a hostfile blocker (AdAway).

      uBlock Origin exists for Firefox Mobile btw, but i prefer opera's speed/ui.

  • jiehong 9 years ago

    It's more general than just webpages; it applies to everything under the name “Jevons paradox” (https://en.wikipedia.org/wiki/Jevons%27s_paradox).

    Basically: more power -> more resources get analysed in the same time, rather than faster answers.

  • hawski 9 years ago

    I am with you, but I don't believe that this solution will ever be used by the majority of developers.

    Browser caches should be bigger. They also should be more intelligent. It does not make sense to evict a library from the cache if it is the most popular library used. Maybe have two buckets, one for popular libraries and another for the rest.

    I think that it would help if the script tag had a hash attribute. Then the cache could become more efficient. But without the first part it would be useless. Example:

      <script src="https://cdn.example.com/jquery.js"
      sha256=18741e66ea59c430e9a8474dbaf52aa7372ef7ea2cf77580b37b2cfe0dcb3fd7>
      </script>
    
    Or a different syntax (whatever, I'm not the W3C):

      <script src="https://cdn.example.com/jquery.js">
      <hash>
        <sha256>
          18741e66ea59c430e9a8474dbaf52aa7372ef7ea2cf77580b37b2cfe0dcb3fd7
        </sha256>
      </hash>
      </script>
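
    The standardized form of this idea, Subresource Integrity, looks roughly like this (the hash value below is a placeholder; it's a base64-encoded sha256/sha384/sha512 digest of the file). Browsers use it to verify the file rather than as a cache key.

      <script src="https://cdn.example.com/jquery.js"
              integrity="sha256-REPLACE_WITH_BASE64_HASH_OF_FILE"
              crossorigin="anonymous">
      </script>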
    
    I would like to run an experiment, but as I am not that experienced with webdev, it could take too much time for me. Test all major browsers with a fresh install and default settings. Go to reddit or another link aggregator and load several links in sequence, in the same order, in every browser. Check how efficiently the cache was used. I would expect that after the 10th site is loaded nothing would remain from the 1st one, even though the same version of some library, and maybe even the same CDN link, was used.

    I am amazed how quickly fully static pages load even when I am on a capped-speed mobile connection (after I use up my 1.5 GB data package).

    EDIT: The most helpful thing would be to have good dead-code removing compilers for JavaScript.

  • kup0 9 years ago

    Yeah the problem really isn't the technology, it's our poor use of it.

    • visarga 9 years ago

      Some files could be hashed and cached locally, such as jQuery files and fonts. I don't know why we have to load them every time from the web; it's the same library for a large number of sites. Just add a hash attribute to the <script> tag to make sure it's the same file.

      • manarth 9 years ago

        When jQuery is served by the official CDN, it's given a cache header with a lifespan of 10 years.

        • leeoniya 9 years ago

          the bandwidth isn't the problem.

          your browser still has to make the https connection to the cdn and send a conditional request (If-Modified-Since or If-None-Match) so the server can return a 304 Not Modified response. This is the real cost, not the download size itself.

          I'm not sure how often browsers choose to do this, but if you refresh a page, they all will.

  • jimlawruk 9 years ago

    The use of CDNs is not primarily for speeding up page load times, but rather to offload bandwidth from the web server. Low budget Web sites don't always have money for server farms and are severely limited by how much bandwidth they can serve. One post to HN can take them down. Free CDNs are the poor man's approach to this problem.

  • Animats 9 years ago

    Right. CNN now has over 50 trackers and other junk blocked by Ghostery.

  • EGreg 9 years ago

    One gripe: your browser may already have the library cached from the canonical CDN thanks to visits to other sites, which is faster.

    I wish that web browsers would use content addressing to load stuff and do SRIs. If I already loaded a javascript file from another url, why load it again?

  • zeveb 9 years ago

    This, this, a thousand times this.

    There's a site I read, really like and financially support, but which has some pretty terrible slowness & UI issues. It's so bad that they've started a campaign recently to fix those issues. But when I check Privacy Badger, NoScript and uBlock, there's a reason that it's so terribly slow: they're loading huge amounts of JavaScript and what can only be called cruft.

    Honestly, I think that they'd come out ahead of the game if they'd just serve static pages and have a fundraising drive semi-annually.

  • pjmlp 9 years ago

    Browsers should have stayed HTML/CSS, but then someone had this idea to try to compete with native applications...

    • leeoniya 9 years ago

      but they can compete when written properly, at least for business/productivity apps.

      though not for anything requiring direct hardware access, AAA games included.

    • jasonkostempski 9 years ago

      I sort of agree, I just think JS and content from other domains should have required user prompts.

      • leeoniya 9 years ago

        and https by default should have never allowed off-domain or insecure/mixed content.

        • jasonkostempski 9 years ago

          We can dream but, even if it had been that way, I suspect it would have taken just one bad browser (probably IE) to ruin it for everyone. They'd advertise it as "no annoying prompts", everyone would flock to it and other browsers would have to follow.

    • endless1234 9 years ago

      >but then someone had this idea to try to compete with native applications

      You say as if that's a bad thing. Just wondering, for instance, do you use some standalone maps software instead of Google/Bing/Openstreet/... maps?

      • pjmlp 9 years ago

        Ever heard of GPS devices, mobile phones or tablets?

        I always use native software on my own computers, unless I don't have another option and have to go the web app instead.

        Same applies to projects I work on, if I am allowed to choose.

        • endless1234 9 years ago

          No, I have not heard of GPS devices, mobile phones or tablets. Thank you for introducing me!

  • lostboys67 9 years ago

    Testify Testify Brother Testify!

tedunangst 9 years ago

So, uh, what does it do? I mean, I can't even tell if it's a server or client side change.

  • tyingq 9 years ago

    The research paper[1] describes Polaris. Basically, you have to make large, sweeping changes to your html, server side. Instead of your original page + js references, you serve a bunch of javascript that then dynamically recreates your page on the client side in the most performant way that it can:

    • The scheduler itself is just inline JavaScript code.

    • The Scout dependency graph for the page is represented as a JavaScript variable inside the scheduler.

    • DNS prefetch hints indicate to the browser that the scheduler will be contacting certain hostnames in the near future.

    • Finally, the stub contains the page’s original HTML, which is broken into chunks as determined by Scout’s fine-grained dependency resolution.

    [1] http://web.mit.edu/ravinet/www/polaris_nsdi16.pdf
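
    Not the paper's actual output, just a hand-drawn sketch of what a stub page with those four pieces might look like (the file names and graph format are invented):

      <!DOCTYPE html>
      <html>
      <head>
        <!-- DNS prefetch hints for hosts the scheduler will contact soon -->
        <link rel="dns-prefetch" href="//cdn.example.com">

        <script>
          // The Scout dependency graph, embedded as a plain JS variable.
          var graph = {
            "style.css": { needs: [] },
            "app.js":    { needs: ["style.css"] },
            "hero.jpg":  { needs: [] }
          };

          // The scheduler: fetch each object as soon as everything it
          // depends on has arrived.
          var loaded = {};
          function pump() {
            Object.keys(graph).forEach(function (name) {
              var node = graph[name];
              if (loaded[name] || node.fetching) return;
              if (!node.needs.every(function (d) { return loaded[d]; })) return;
              node.fetching = true;
              fetch(name).then(function () { loaded[name] = true; pump(); });
            });
          }
          pump();
        </script>
      </head>
      <body>
        <!-- The page's original HTML, broken into chunks that are revealed
             as their dependencies resolve -->
      </body>
      </html>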

  • naor2013 9 years ago

    It's in between. It's a way for website developers to explain the dependencies for the files (in html or JavaScript or whatever) and a change needed for browsers to know how to use this new data to make better requests to the server.

GrumpyNl 9 years ago

Talk to some people in the porn industry. They will tell you how important fast pages are. You will also be surprised what they have done to achieve this.

mikeytown2 9 years ago

From my experience, preconnect is a big improvement for connecting to 3rd-party domains. I'll also mention that once you have JS deferred, CSS on a 3rd-party domain (google fonts) can cause some major slowdowns in start-rendering metrics if you're using HTTP/2 on a slow connection: all the bandwidth goes to the primary domain connection rather than to the blocking resources, so images end up downloading before the external blocking CSS.
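
For reference, these resource hints are just link tags in the head; the font host below is the usual Google Fonts asset domain, used as an example:

  <!-- Open the TCP+TLS connection early, before any request needs it -->
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <!-- Cheaper fallback: resolve DNS only -->
  <link rel="dns-prefetch" href="https://fonts.gstatic.com">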

jakeogh 9 years ago

The web is way better without JS. Rendering engines could in principle do the same (improved) dependency tracking.

kvz 9 years ago

I feel Webpack deserves a mention, as it resolves the dependencies at build time and compiles one (or a few chunked/entry-based) assets, hence also solving the problem of too many roundtrips.
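
For reference, a minimal webpack config of that shape (the paths and the separate vendor chunk are just illustrative):

  // webpack.config.js - resolve the whole dependency graph at build time
  const path = require('path');

  module.exports = {
    entry: {
      app: './src/index.js',   // one entry per page/app
      vendor: ['jquery']       // optional extra chunk for shared libraries
    },
    output: {
      path: path.resolve(__dirname, 'dist'),
      filename: '[name].bundle.js'
    }
  };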

WhiteSource1 9 years ago

What are you looking to learn about a CDN?

There are many ways to accelerate page speed and, like everything else, it's a question of costs and benefits. For most things, some level of technical debt is OK and CDNs even for jQuery are good. Of course, good design and setting things up right is always the best - and the other question is where your site traffic comes from.

uaaa 9 years ago

Is there a comparison with Google AMP?

  • andrewguenther 9 years ago

    Paper was released way before AMP. Also not really related to AMP, so I don't think it's necessary to draw a comparison.

Thiez 9 years ago

> What Polaris does is automatically track all of the interactions between objects, which can number in the thousands for a single page. For example, it notes when one object reads the data in another object, or updates a value in another object. It then uses its detailed log of these interactions to create a “dependency graph” for the page.

> Mickens offers the analogy of a travelling businessperson. When you visit one city, you sometimes discover more cities you have to visit before going home. If someone gave you the entire list of cities ahead of time, you could plan the fastest possible route. Without the list, though, you have to discover new cities as you go, which results in unnecessary zig-zagging between far-away cities.

What a terrible analogy. Finding a topological sorting is O(|V|+|E|), while the traveling salesman problem is NP-complete.

  • mfonda 9 years ago

    He's not making a comparison to the traveling salesman problem; he's saying the businessperson only intended to visit one city, but the trip ended up requiring visits to several additional cities.

    It's not a terrible analogy. You request an HTML page and you don't know until after you load it (visit the initial city) exactly what other resources--images, css, js, etc.--you'll need to download (additional cities to visit).

  • to3m 9 years ago

    That's amusing, and I wonder if this particular analogy was chosen deliberately. But I don't think there's anything wrong with it - it's designed to make intuitive sense to non-programming readers, not to be some rigorous description that can be automatically translated into optimal code.

    • Thiez 9 years ago

      Perhaps a different analogy would be better, e.g. ordering materials when building a house. If you can only order one material at a time, you would probably want to order the concrete for laying the foundation before ordering the roof tiles. Loading website resources is a lot like that (at least compared to the travelling salesman).

      • saurik 9 years ago

        This still smells NP-hard. I mean, in practice for simple dependencies it is probably quite tractable, but this is a combinatorial optimization problem that seems pretty similar to an online variant of Job Shop Scheduling, where the material requirements map loosely to machine-task pairings that would be unblocked by orders, acting to make the problem more complex, not easier.

        • Thiez 9 years ago

          Your sense of smell is off. Assuming a directed acyclic graph (the usual shape of a dependency tree), assume we write the result order to a list L. Walk over all vertices once, create a mapping M of each vertex to its number of incoming edges, and add all vertices without incoming edges to a list R. This is O(|V| + |E|). Now, pick the first item of R and append it to L. For each outgoing edge in the item we chose, decrement the associated count of its receiving vertex in M by 1. For each item where the count becomes 0, append it to R. Keep running until R is empty, and you're done. This second part of the process is again O(|V| + |E|).

          We're done and found a topological order in O(|V| + |E|).
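
            That's essentially Kahn's algorithm; a small JavaScript version of those steps, assuming the graph maps each vertex to the list of vertices it points to:

              function topoSort(graph) {
                var inDegree = {}, order = [], ready = [];
                // Count incoming edges for every vertex.
                Object.keys(graph).forEach(function (v) {
                  if (!(v in inDegree)) inDegree[v] = 0;
                  graph[v].forEach(function (w) {
                    inDegree[w] = (inDegree[w] || 0) + 1;
                  });
                });
                // Start with the vertices nothing depends on.
                Object.keys(inDegree).forEach(function (v) {
                  if (inDegree[v] === 0) ready.push(v);
                });
                while (ready.length) {
                  var v = ready.shift();
                  order.push(v);
                  (graph[v] || []).forEach(function (w) {
                    if (--inDegree[w] === 0) ready.push(w);
                  });
                }
                // If order is shorter than the vertex count, the graph had a cycle.
                return order;
              }

              // topoSort({ foundation: ['walls'], walls: ['roof'], roof: [] })
              //   -> ['foundation', 'walls', 'roof']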

          • saurik 9 years ago

            Yes, I know that topological sort is O(|V| + |E|). What I am claiming is that the problem of buying building materials in an optimal order isn't topological sort: there are numerous topological orders of a graph, but some will be much much slower than others to build at your construction site. To determine which of the many possible orders is the fastest one, you have to take into account how long various tasks will take and how many workers you have available for those tasks. When you get some materials, that unlocks certain parts of the gantt chart of what sounds like Job Shop Scheduling. To me, this is the same form of complaint as pointing out that Traveling Salesman isn't Topological Sort.

        • ooqr 9 years ago

          "If someone gave you the entire list of cities ahead of time, you could plan the fastest possible route."

          Dynamic programming. So it saves time in the end.
