Clojure at BackType: Cascalog, ElephantDB and Storm
slideshare.net
Have you considered using Neo4j with the batch inserter as a write-once-read-many DB in place of ElephantDB?
One of the most important parts of ElephantDB is that it decouples the creation of an index from the serving of that index. This means we can create, read, and update an ElephantDB domain solely on Hadoop, without depending on any other system being alive and functioning. Additionally, Neo4j is not a distributed database, while ElephantDB is horizontally scalable. You can read more about ElephantDB here:
http://tech.backtype.com/introducing-elephantdb-a-distribute...
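To make the "create the index in one place, serve it in another" point concrete, here is a rough Python sketch. It is not ElephantDB's API: the names `build_domain`, `serve_get`, `shard_for` and `NUM_SHARDS` are made up for illustration, JSON files stand in for the BDB files, and hash-partitioning keys across shards is an assumption about how such a store might be laid out.

```python
import json
import os
import tempfile
import zlib

NUM_SHARDS = 4  # hypothetical shard count, fixed when the domain is built


def shard_for(key):
    """Pick a shard deterministically so builder and server agree."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def build_domain(records, out_dir):
    """Batch step: partition records by key and write one file per shard.

    Nothing here talks to a running server; it only produces files.
    """
    shards = [{} for _ in range(NUM_SHARDS)]
    for key, value in records:
        shards[shard_for(key)][key] = value
    for i, shard in enumerate(shards):
        with open(os.path.join(out_dir, f"shard-{i}.json"), "w") as f:
            json.dump(shard, f)


def serve_get(out_dir, key):
    """Serving step: read-only lookup in the one shard that owns the key."""
    path = os.path.join(out_dir, f"shard-{shard_for(key)}.json")
    with open(path) as f:
        return json.load(f).get(key)


domain_dir = tempfile.mkdtemp()
build_domain([("bob", 3), ("alice", 5)], domain_dir)
```

After the build step, `serve_get(domain_dir, "bob")` only needs the finished files, which is the property that lets the batch job run without any server being up.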
Essentially, I was thinking that Neo4j could replace the BDB files, but with all the graph links in place. Graphs can be added commutatively (you merge A and B by copying from both into C), which means they can be used in map/reduce. Queries could then consist of Cascalog jobs constructing and walking subgraphs. So for example, "all the friends of Bob" consists of (map) copying the subgraph of Bob's friend links and (reduce) merging the subgraphs by addition.
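Here is a minimal Python sketch of that idea, using plain dicts rather than Neo4j or Cascalog. The adjacency-map representation, the `merge` and `friend_subgraph` helpers, and the sample shards are all hypothetical, chosen only to show that merging by union does not depend on order.

```python
from functools import reduce


def merge(a, b):
    """Add two graphs (node -> set of neighbours) by copying both into a new one."""
    c = {node: set(nbrs) for node, nbrs in a.items()}
    for node, nbrs in b.items():
        c.setdefault(node, set()).update(nbrs)
    return c


def friend_subgraph(shard, person):
    """Map step: copy out only this person's friend links from one shard."""
    return {person: set(shard[person])} if person in shard else {}


# Hypothetical shards, each holding part of the friend graph.
shards = [
    {"bob": {"alice"}, "carol": {"dave"}},
    {"bob": {"erin"}},
    {"alice": {"bob"}},
]

# Reduce step: merge the per-shard subgraphs by addition.
friends_of_bob = reduce(merge, (friend_subgraph(s, "bob") for s in shards), {})
```

Because `merge` is a set union per node, `merge(a, b)` and `merge(b, a)` give the same graph, so the reducer can combine partial results in whatever order they arrive.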