Analytics at GitHub

johnnunemaker.com

125 points by janerik 12 years ago · 16 comments

phunge 12 years ago

Jay Kreps speaks the truth. His talk "Building LinkedIn's Real-time Data Pipeline" is along the same lines as the Log blogpost mentioned here and is also extremely informative.

fuziontech 12 years ago

Fantastic read. Concise and solid decision explanations. Thanks for writing!

Was there any other reason you chose kestrel over alternatives like kafka? Did you test any others, or were you just that satisfied with kestrel?

  • jnunemaker 12 years ago

    We chose kestrel mostly just from usage/familiarity. We've been satisfied with it, but are currently researching/testing kafka.

khaledh 12 years ago

Very good article. It aligns with our envisioned architecture for our next-gen analytics platform.

So far our decision is to keep the raw events in Cassandra, and pre-aggregate most data for fast reads. Just wondering about your decision to not store raw events in Cassandra, and use raw files for that, and using Cassandra only for storing Hadoop analysis results. Do you think this decision may affect you later if you ever decide to support real-time analytics?

nicklovescode 12 years ago

As an aside, do you have any info on the visual software used to run the charts? I'm guessing d3 is there somewhere, but maybe not. I've struggled to find a beautiful charting library and yours are beautiful!

  • calavera 12 years ago

    we use d3 for all our charts.

    • nicklovescode 12 years ago

      any chance of you guys open-sourcing them?

      • Caged 12 years ago

        Most of our graphs are pretty stock d3 code tailored for specific datasets, so I don't see much value in open sourcing them. Is there anything in particular you're interested in?

        • jrpt 12 years ago

          There's a need for a good charting library built on top of d3. Kind of like Highcharts, in terms of usability, but free. d3 is powerful but not as easy to use and customize as Highcharts.

          • middleman90 12 years ago

            Can I suggest Sibdo (http://www.sibdo.com)? It's free for individuals and built on top of d3, with some extra functionality that Highcharts doesn't have. You can even drag files directly onto the visualizations and the data will render. It also has a really nice UI for mobile.

            • sheff 12 years ago

              Looking at the Sibdo pricing page, the pricing looks much higher than that of the more established competitors: $95 a month for use on a SINGLE website, with a confusing limitation to "50 users", whatever that means.

              Not only that, the example graphs and charts look very basic.

              • middleman90 12 years ago

                Good feedback thanks

                It would be interesting to know what you mean by basic as we're a start-up and would appreciate any feedback.

nickstinemates 12 years ago

> For any business, the process of collecting data, measuring performance, making changes, and reviewing if those changes were successful is really important.

This applies for any sort of goal/process/?, whether programmatic or personal.

Very cool story, I'm looking forward to additional features. We pull a lot of data about Docker from GitHub that could be more readily available. We'd be more than happy to discuss or beta any new features, if you're interested.

alexatkeplar 12 years ago

Nice to see lots of parallels to how we have architected things at Snowplow (trackers -> collectors -> enrich -> storage -> analytics)
