gauged: a time series database (github.com)
Interesting storage method; I like it. Some notes:
- A MySQL backend should be very fast, but scaling it into multiple shards will have to be an exercise for the user. Perhaps CitusDB (PostgreSQL compatible) could be useful here.
- Metrics cannot be tagged. This will make it useless for any sort of rollups or breakdowns ("give me the sum of requests over my servers in the XYZ data center"; "give me the requests for each server in the XYZ data center by hostname").
The second issue in particular needs attention before it can compete with enterprise-grade metrics solutions such as Datadog.
Tagging would be the next addition. I added the ability to search for keys by prefix quite efficiently, so provided one stored keys like "requests:server1", "requests:server2", one could easily run the following:

    requests = 0
    for key in gauged.keys('requests:'):
        requests += gauged.aggregate(key, Gauged.SUM, start=-Gauged.WEEK)

Tagging needs to be multi-dimensional to be effective (e.g., host=X, device=Y, interface=Z, etc.).
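To make the rollup/breakdown point concrete, here is a minimal sketch of multi-dimensional tags in plain Python. It does not use Gauged's API; the `SAMPLES` data and the `matches`, `rollup`, and `breakdown` helpers are hypothetical names invented for illustration. The point is that any combination of tags can be filtered or grouped on, which a single key prefix cannot express.

```python
from collections import defaultdict

# Hypothetical in-memory samples: (metric name, tags, value).
# This is NOT Gauged's storage model; it only illustrates the tagging idea.
SAMPLES = [
    ("requests", {"host": "server1", "dc": "XYZ"}, 120),
    ("requests", {"host": "server2", "dc": "XYZ"}, 80),
    ("requests", {"host": "server3", "dc": "ABC"}, 50),
    ("requests", {"host": "server1", "dc": "XYZ"}, 30),
]


def matches(tags, filters):
    """True if every filter key is present in tags with the same value."""
    return all(tags.get(key) == value for key, value in filters.items())


def rollup(samples, metric, **filters):
    """Sum a metric over every sample whose tags satisfy the filters."""
    return sum(
        value
        for name, tags, value in samples
        if name == metric and matches(tags, filters)
    )


def breakdown(samples, metric, by, **filters):
    """Sum a metric per distinct value of the `by` tag, after filtering."""
    totals = defaultdict(int)
    for name, tags, value in samples:
        if name == metric and matches(tags, filters):
            totals[tags.get(by)] += value
    return dict(totals)


# "Sum of requests over my servers in the XYZ data center"
xyz_total = rollup(SAMPLES, "requests", dc="XYZ")

# "Requests for each server in the XYZ data center, by hostname"
xyz_by_host = breakdown(SAMPLES, "requests", by="host", dc="XYZ")
```

With prefix keys such as "requests:server1", the data center would have to be baked into the key in a fixed position, so only one ordering of dimensions can be queried efficiently.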
I'd love to see an honest comparison against RRDtool. You mention it once in "Support for sparse data (unlike the fixed-size RRDtool)".
What are the other advantages, disadvantages and trade-offs?