Settings

Theme

Kademlia: A Peer-To-peer Information System Based on the XOR Metric (2002) [pdf]

pdos.csail.mit.edu

191 points by liamzebedee 7 years ago · 42 comments

Reader

bcaa7f3a8bbc 7 years ago

I'm familiar to this paper because I was using eMule. eMule has a direct citation to this paper in its "help" window, otherwise, the decentralization of eD2k would not be possible. The protocol is old, but still impressive.

The year 2000 is special, within the same year, Gnutella, eDonkey and Freenet were all released its initial version, as if the P2P network "emerged" from the Internet. By 2004, eMule, I2P and BitTorrent (and BTW, Tor) were all here. At the very beginning, the first generation of P2P, Gnutella was suffering from its scalability problem - its simple unstructured, flooding protocol needs a number of O(n) queries, and becomes unstable once the users exceeded millions.

Then, Kademlia almost saved P2P. BitTorrent, I2P, eD2k and many other P2P protocols are all Kad-based. For example, in eD2k, Kad allowed it to have P2P resource discovery, and also an efficient decentralized keyboard search engine (although there is spam, but mostly works, and in BitTorrent we still don't have it), all with only a O(log(n)) query, The P2P mechanism of I2P is also exclusively based on Kad.

The authors of this paper are the few people who have made great contributions to the Internet.

nostrademons 7 years ago

I love how people are rediscovering the previous generation of P2P software from 15-20 years ago. Reading through these papers was a huge part of my college procrastination; maybe now that cryptocurrencies give a way to provide economic incentives to those who contribute computing power, we might see a renaissance of some of these older ideas.

  • jamesblonde 7 years ago

    Interestingly, the original Kademlia paper did not make it into any of the big p2p conferences at the time. It was published at some 2nd tier p2p workshop. It was only when the system gained actual adoption, that people referenced the paper.

    In contrast, Chord, which was published 1 year earlier in 2001 is one of the most cited papers in computer science. However, Chord was not practical. It assumed the availability of bi-directional connections between any two hosts. However, NATs had just started to appear, which broke Chord. Kademlia, which does not require peers to maintain a well-defined routing structure (the ring in Chord) can muddle through with NATs - because the routing table is a huge messy tree and if any branches lead you to the destination, it will kind of work.

    • hvidgaard 7 years ago

      Being the engine behind trackerless BitTorrent gave it quite some exposure too. I do not know if it was before or after it became known in academia.

      • jamesblonde 7 years ago

        My point was that it became the engine behind trackerless BT because it worked in the presence of NATs - even if other architectures, like Chord, had superior properties (if you assume an unrealistic model of the open internet).

        The other point of note is that Kademlia was not designed to work in the presence of NATs. That was a feature whose value was only appreciated when ISPs ran out of open IP addresses and NAT'd up whole networks. It shows the value of exploratory, basic research.

  • pjkundert 7 years ago

    It is a key part of the implementation of Holochain (https://holo.host).

    It turns out that global consensus is not required to maintain accounting balance in a global-scale ledger. DHTs and their validation of the data they host is a big part of maintaining the invariant of the system.

    • awgneo 7 years ago

      I have been excited about this project for quite some time. They recently released their Rust client, although without networking. I cannot wait to play with it further as it holds the most promise amongst all "Web3" decentalized platforms.

  • gst 7 years ago

    Kademlia is still in widespread use today. For example Bittorrent uses it to resolve magnet links.

    • geoah 7 years ago

      IPFS is using kademlia, and iirc zeronet are still working on it. I don't think there currently are many alternatives actually being used. Haven't seen many projects using chord or pastry these days.

    • ghayes 7 years ago

      Ethereum also uses Kademlia for its DevP2P mechanism. We recently implemented this in Exthereum/Mana.

    • sliken 7 years ago

      Quite handy for any program to find a copy of itself running anywhere on the internet. It's the best solution I've found for peer discovery that doesn't depend on a central host.

      There's a few implementations for most platforms. At least Java, Javascript, C++, C, and Go anyways.

      • voltagex_ 7 years ago

        Doesn't Kad require an initial bootstrap node? How do you find it?

        • sliken 7 years ago

          Generally if you downloaded it from somewhere/someone you can get a working node from them. Often binaries have a few boostrap hosts hardwired, but then save state once launched. Typically a host tracks several hosts per bucket, and a bucket for each log(n) of the population. So with a million hosts your client will track log(million)*4 or similar. Those are usually written to disk when you close, so you can try those when you launch next.

          Clients often broadcast on the local lan to find peers that way.

          Also other methods are common, for instance bittorrent clients often use the PEX standard to exchange peers, those peers are often on the same DHT, although I think there might be 2 distinct DHT networks among the popular bittorrent clients.

          Worst case you could start randomly walking the IPv4 space, given that there's millions of active clients it wouldn't be long before you found one. Likely only take you a few 1000 random IPs before you found someone on the DHT.

        • adrusi 7 years ago

          Yes but that doesn't require a central host. If your friend is connected to the network you can join it via them.

          • voltagex_ 7 years ago

            If I've just downloaded SomeCoolNewTool that's using the network, are there bootstrap hosts pre-loaded? Who's my "friend" in that case? Are there trust issues here?

            • cf 7 years ago

              In practice, SomeCoolNewTool will be bundled with some seed hosts.

  • squarefoot 7 years ago

    I still miss the first Napster and later the OpenNap network with its OSS clients such as Lopster. Way flawed and not scalable at all compared to BitTorrent, but it allowed 1 to 1 connections between people, so that one day I could browse a random guy shared hard drive because he had some interesting music, see a band named "Spock's Beard" and being a trekkie myself, albeit a moderate one, I simply had to hear what they sounded like, and after an eternity due to then slow connections, thanks to other bands discovered the same way, I realized that progressive rock was (and still is) even more alive today than in the 70s, although obviously not mainstream. So a huge thankyou to the old flawed and slow Napster as well.

  • eeZah7Ux 7 years ago

    Kademlia is also an interesting option for a small and resilient key-value store between a handful of hosts.

jMyles 7 years ago

Kademlia is awesome and is, of course, the basis of libp2p-kad-dht and the DevP2P project that many decentralized projects are using at the moment.

At NuCypher, we found that, all the way to the core of Kademlia, there were some "inter-planetary" assumptions that, while both mindblowing and fit for many use-cases, didn't fit our "intra-planetary" distributed notion of network topology.

If Kademlia interests you, you might also be interested in this overview of what we did and why:

https://blog.nucypher.com/why-were-rolling-our-own-intra-pla...

  • mohitmun 7 years ago

    Huge fan of Nucypher. Recently At one hackathon, I used your proxy re-encryption to build decentralised KYC system. Amazing work you guys are doing!

jondubois 7 years ago

A lot of cryptocurrency projects are using Kademlia but it's prone to vulnerabilities such as eclipse attacks. Because peer connections are made according to deterministic rules (e.g. often based on unique Node IDs), the topology of the network can be analyzed and then the information can be exploited to disrupt the network.

There are solutions which involve making it hard to generate IDs (I've seen it done with node IDs generated using hash-based proof of work) but nowadays with all the cheap ASIC miners, you could compute these hashes much faster than any regular computer could; this means that node IDs need to be generated using top-of-the-range ASICs in order to make the network truly secure; this adds a lot of complexity when it comes to deploying a node. I don't believe that any of the existing Kademlia-based cryptocurrencies currently require sufficiently complex node IDs.

tombert 7 years ago

I wrote a video-sharing thing during a hackathon based on Kademlia. It's one of my favorite algorithms and I found the paper to be easier to read than most papers based on algorithms. Here's a video of a talk I gave on it:

https://www.youtube.com/watch?v=byIFT8OzfX0

mohitmun 7 years ago

Recently I figured Kademlia is how bittorrent achieves trackerless torrents. I used bittorrent-dht[0][1] to build simple decentralised ride hailing app(was able implement POC with end to end booking). I wonder what other modern problems can be solved by this technology

[0]: https://github.com/webtorrent/bittorrent-dht

[1]: http://www.bittorrent.org/beps/bep_0005.html

woadwarrior01 7 years ago

Over a decade ago, I was a part of a small team at a startup that built a p2p messenger called cSpace[1], I fondly remember reading the paper and implementing a Kademlia DHT for it. IIRC, the bittorrent DHT was the only open source Python implementation of it that we could find, it was using Twisted for networking and quite complicated, we decided to keep things simple and rewrite it ourselves. The resulting implementation[2] is quite simple and elegant, IMO.

[1]: https://github.com/jmcvetta/cspace (The only copy of it, I could find online) [2]: https://github.com/jmcvetta/cspace/tree/master/cspace/dht

loeg 7 years ago

(2007), if not earlier, FWIW. I used this protocol in a college final project ("make a botnet") years ago :-).

cf 7 years ago

Interestingly, Kademlia ended up being a major component in lots of botnets for a time like https://en.m.wikipedia.org/wiki/Storm_botnet

miguelrochefort 7 years ago

Has any progress in the P2P a been made since Kademlia?

I've been searching for a while, for a DHT (actually a distributed graph database) that self optimizes based on usage. Basically, I want data that's frequently accessed together to get close to eachother and to nodes that frequently access it.

new12345 7 years ago

I don't understand inspite of having working p2p protocol why we don't yet have an good p2p alternative to dropbox that just works without any setup hassles. Closest working thing I know of is BitSync but that too requires an mediator server.

  • yholio 7 years ago

    Because Dropbox is a file storage service, not a file distribution system. The typical use case is that an uploaded large file will be downloaded a single time by someone else, at different times when computers are online.

    This requires an intermediary system with very high availability and performance, which in P2P can only be implemented by duplicating data on a dozen or so systems.

    Since files must be private and there is no commonality like in BitTorrent, it follows that running a P2P Dropbox would consume 5-10 times the storage and bandwidth of it's centralized counterpart for the same useful service.

  • amitport 7 years ago

    IMO. The p2p business model is problematic in this case.

    1 People need to incentives to join a p2p network and certainly to contribute storage+bandwidth

    1* Reliably and privacy issues makes this a bigger problem (though p2p may also be the solution, it's not easily achievable)

    2 This can't easily happen without investment. Marketing + setup without "hassles" is not cheap

    2* Given investment, what are the assets of the investors?

  • e12e 7 years ago

    Ipfs doesn't take much setup, but it's global sharing - so you'll have to bring your own access control (eg: encrypt the files, and use à key distribution/sharing scheme of some sort).

  • jonathanstrange 7 years ago

    I'm out of the loop for a few years now, but isn't NAT hole punching still one of the biggest problems? AFAIK, there is still no reliable solution that works without an accessible 3rd party server.

dmourati 7 years ago

Attended a "Papers we Love" presentation on this paper last year.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection