Titanium – a Clojure graph library built on top of Titan
blog.clojurewerkz.org
Does anyone have experience working with Titan, either on the side or in production? Is Titan production ready?
The flexible storage backends, clustering, and the open source license are all very enticing. I've been looking for a graph database for an upcoming project and have yet to find something that really matches what we're looking for.
Titan is used in production, and it's one piece of the Aurelius Graph Cluster (http://thinkaurelius.com/subscription/), which is some of the most impressive tech to hit the open-source scene in the last few years.
Matthias Broecheler (https://twitter.com/MBroecheler) is the original creator of Titan, and he is incredibly bright. When he finished his PhD, he linked up with Marko Rodriguez (https://twitter.com/twarko), the creator of Gremlin (https://github.com/tinkerpop/gremlin/wiki), and they formed Aurelius to focus on building the big-data graph ecosystem (like Cloudera for graphs -- in fact, the Aurelius Cluster integrates with Hadoop and Cloudera).
There are other distributed graph databases, but most of these are batch processing engines like Pregel. However, Titan is a real-time, transactional graph database backed by either Cassandra or HBase, and it provides fast, horizontally scalable write performance (10,000+ tps) that hasn't been available in an open-source graph database.
See http://thinkaurelius.com/2012/08/06/titan-provides-real-time...
Combining this with Faunus for batch processing and the Aurelius Graph Cluster's integration with the Hadoop ecosystem makes for an incredibly powerful platform for building applications such as social startups.
See Matthias' C* 2012 presentation: Titan - Big Graph Data With Cassandra: http://www.youtube.com/watch?v=ZkAYA4Kd8JE
The Titan user group is here: https://groups.google.com/forum/?fromgroups#!forum/aureliusg...
The Gremlin user group is here: https://groups.google.com/forum/?fromgroups#!forum/gremlin-u...
We just started using Titan in production last week for shift.com, on a 3 node cassandra cluster. We open sourced our Object Graph Mapper library for Python here:
https://github.com/StartTheShift/thunderdome
There are a few caveats that come with working with distributed databases, so it's important to know what you're getting into. Neo4j might be easier out of the box (since more people are using it), but if you want a robust solution that'll work for 50 or 50,000 users, Titan feels like the way to go.
We have various clients using Titan in production. Of course, like any project, there are always more desired features. The Titan/Faunus roadmap is greatly influenced by our clients.
I'm currently playing around with it for a large internal development project. My only concern (and why I'm leaning towards using Neo4j at least initially) is that I don't really yet have a good indication if I'm going to be dealing with enough data to warrant a big distributed solution.
I'm actually right now mostly messing with TinkerGraph (an in-memory graph database that's part of the TinkerPop utilities that the Titan guys make).
With Titan/BerkeleyDB you will get blazing performance for a single-machine deployment. One of the wonderful innovations of Titan is vertex-centric indices, which are necessary even at single-machine scale.
Next, if you decide to scale horizontally, you can simply set storage.backend=cassandra and that's that (of course, you need to do a bulk data transfer from BerkeleyDB to Cassandra).
http://thinkaurelius.com/2012/10/25/a-solution-to-the-supernode-problem/
That's good to know actually; I think that'll probably be a good approach for my application.
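For anyone following along, here is a rough sketch of what the backend switch mentioned above might look like as two configurations. Only storage.backend=cassandra comes from this thread; the `berkeleyje` backend name, the `storage.directory` and `storage.hostname` keys, and all values are assumptions from memory that should be checked against the Titan docs, and `render_properties` is just a hypothetical helper for printing them.

```python
# Sketch: the two Titan configurations differ only in their storage settings.
# Only `storage.backend=cassandra` comes from the thread; every other key and
# value below is an assumption to verify against the Titan documentation.

def render_properties(settings):
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))

# Single-machine setup backed by BerkeleyDB.
single_machine = {
    "storage.backend": "berkeleyje",         # assumed name of the BerkeleyDB backend
    "storage.directory": "/tmp/titan-data",  # hypothetical local path
}

# Scaling out: swap the backend and point at a Cassandra node instead of a directory.
clustered = {
    "storage.backend": "cassandra",
    "storage.hostname": "127.0.0.1",  # hypothetical host
}

print(render_properties(single_machine))
print()
print(render_properties(clustered))
```

The config change does not move any data; the bulk transfer from BerkeleyDB to Cassandra is a separate step.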
For what it's worth, it was seeing a video of you giving a talk about Titan that made me start looking into it. It seems super neat.
I haven't used it yet, but the reason I plan to use Titan on my next project is because it is the only distributed graph database out there AFAIK.
Another Titan library for Clojure is Hermes: https://github.com/gameclosure/hermes
my team has made a ton of contributions to hermes - it's a pretty solid library. that said, i'm happy to see more traction for clojure + titan, especially from the clojurewerkz crew.
Does anyone know if this supports (efficient) graph-rewriting?
I'm thinking of this kind of pattern:
If there are nodes n and n' such that
- there's an edge from n to n' and
- n has a label XY
then add label Y to n'
So what I'd want to do is match basic patterns and then add nodes, edges, and labels.
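To make the rule above concrete, here is a minimal in-memory sketch in plain Python. It does not use the Titan, Faunus, or Titanium APIs; the edge-list and label-dict representation and the name `apply_rule` are assumptions made purely for illustration.

```python
# Sketch of the rewrite rule on a toy in-memory graph (not a Titan/Faunus API).
# edges:  list of (source, target) pairs
# labels: dict mapping node -> set of labels

def apply_rule(edges, labels, trigger="XY", added="Y"):
    """For every edge (n, n2) where n carries `trigger`, add `added` to n2.

    Matches are collected against the labels as they were before the pass,
    then applied, so a single pass never feeds on its own output.
    """
    targets = {dst for src, dst in edges if trigger in labels.get(src, set())}
    for node in targets:
        labels.setdefault(node, set()).add(added)
    return targets

edges = [("a", "b"), ("b", "c"), ("a", "d")]
labels = {"a": {"XY"}, "b": set()}

changed = apply_rule(edges, labels)
print(sorted(changed))  # nodes that received the new label
print(labels)
```

Note that the match step scans every edge, which makes it a whole-graph operation rather than a local traversal from one vertex.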
There is support for graph rewriting for a Titan-backed data-set using Faunus. Titan does not support global graph operations. Therefore, Faunus was created to allow you to perform offline graph operations much like the one you've described.
thanks. that looks interesting.
Bolth. Bolth? Bolth.
https://github.com/clojurewerkz/titanium/blob/master/src/clo...