  • Any luck on this project? Commented Sep 20, 2012 at 13:46
  • @hiwaylon We've ended up using a hybrid system: 1) MySQL where possible (low volume; it makes aggregation easy using SELECT...GROUP BY, and it's easy to store the results of SELECTs), 2) Graphite for simple large-scale aggregation and visualization, and 3) logging full events for reference and for watching details of the data flow in real time. Each has been valuable in different ways (there's a sketch of the Graphite piece after this list). Commented Jan 8, 2013 at 3:47
  • That sounds like a great solution, quite similar to what we are doing as well. Commented Jan 10, 2013 at 16:57
  • UPDATE: over a year later we built a system that logged everything, periodically iterated over the logs counting things, and then stored those counts in a database (it could/should have been a time-series database, but MySQL sufficed). This was a few weeks of work but ended up being a surprisingly powerful and fast approach: when it's just your code iterating over logged JSON, it's easy to add a lot of metadata, and easy for your code to have flexible rules for exactly what it wants to count (see the counting sketch after this list). Commented Mar 7, 2014 at 18:06
  • Update 2016: Kafka can handle this kind of raw event storage these days. From there you can feed the events into a big MapReduce or Spark job, or into a warehouse like Vertica, if you want to query or aggregate over them (a minimal producer sketch follows this list). Commented May 18, 2016 at 0:41
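
For the Graphite piece of the hybrid setup described above, here is a minimal sketch of sending a counter over Graphite's plaintext protocol (Carbon listens on TCP 2003 by default; the host and metric name here are assumptions, not from the comments):

```python
import socket
import time

def send_metric(name, value, host="graphite.example.com", port=2003):
    """Send one data point using Graphite's plaintext protocol:
    '<metric.path> <value> <timestamp>\n' over TCP."""
    line = f"{name} {value} {int(time.time())}\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

# e.g. count one signup event; Graphite handles aggregation and graphing
send_metric("app.signups.count", 1)
```
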
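The log-counting approach from the "UPDATE" comment might look roughly like this: iterate over JSON-lines logs, apply whatever counting rules you like in plain code, and store the totals in an ordinary database. Everything specific here is hypothetical (the file name, the event's "type" field, and the table schema are invented, and it uses sqlite3 for self-containedness where the commenter used MySQL):

```python
import json
import sqlite3
from collections import Counter

counts = Counter()

# One JSON object per line; the rules for what to count are plain Python,
# so they can be as flexible as needed (hypothetical 'type' field).
with open("events.log") as f:
    for line in f:
        event = json.loads(line)
        counts[event["type"]] += 1

# Store the periodic totals in a simple relational table.
db = sqlite3.connect("metrics.db")
db.execute("CREATE TABLE IF NOT EXISTS counts (name TEXT, n INTEGER)")
db.executemany("INSERT INTO counts VALUES (?, ?)", counts.items())
db.commit()
```
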
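And for the 2016 Kafka suggestion, a minimal producer sketch using the kafka-python client (the broker address, topic name, and event fields are assumptions); a Spark job or warehouse loader would then consume the topic and do the aggregation:

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Each raw event goes to a topic; consumers (Spark, a warehouse
# loader, etc.) aggregate over it later.
producer.send("raw-events", {"type": "signup", "user_id": 42})
producer.flush()
```
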