Data architecture for event log metrics?

Asked Jul 19, 2012 at 18:21 by elliot42; last edited Sep 17, 2012 at 0:17.

My service generates a large, ongoing stream of user events, and we would like to answer questions like "count occurrences of event type T since date D."

We are trying to make two basic decisions:

What to store: every event, or only aggregates?
- (Event-log style) Log every event and count them later, or
- (Time-series style) Store a single aggregated "count of event type T on date D" for each day.
Where to store the data:
- In a relational database (particularly MySQL)
- In a non-relational (NoSQL) database
- In flat log files (collected centrally over the network via syslog-ng)
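As a rough illustration of how the two storage styles differ when answering "count occurrences of event type T since date D", here is a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for MySQL; the table names (`events`, `daily_counts`), column names, and helper functions are hypothetical. Each event is written in both styles so the two queries can be compared side by side.

```python
import sqlite3
from datetime import date

# sqlite3 stands in for MySQL here (an assumption); names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Event-log style: one row per event.
    CREATE TABLE events (
        id          INTEGER PRIMARY KEY,
        event_type  TEXT NOT NULL,
        occurred_on TEXT NOT NULL            -- ISO date, e.g. '2012-07-19'
    );
    CREATE INDEX idx_events_type_date ON events (event_type, occurred_on);

    -- Time-series style: one row per (event type, day).
    CREATE TABLE daily_counts (
        event_type TEXT NOT NULL,
        day        TEXT NOT NULL,
        n          INTEGER NOT NULL,
        PRIMARY KEY (event_type, day)
    );
""")


def record_event(event_type: str, day: date) -> None:
    """Write one event in both styles so the two queries can be compared."""
    iso = day.isoformat()
    conn.execute("INSERT INTO events (event_type, occurred_on) VALUES (?, ?)",
                 (event_type, iso))
    # Upsert the daily counter: create the row at 0 if missing, then increment it.
    conn.execute("INSERT OR IGNORE INTO daily_counts (event_type, day, n) VALUES (?, ?, 0)",
                 (event_type, iso))
    conn.execute("UPDATE daily_counts SET n = n + 1 WHERE event_type = ? AND day = ?",
                 (event_type, iso))


def count_from_log(event_type: str, since: date) -> int:
    """Event-log style: count the raw rows at query time."""
    row = conn.execute(
        "SELECT COUNT(*) FROM events WHERE event_type = ? AND occurred_on >= ?",
        (event_type, since.isoformat())).fetchone()
    return row[0]


def count_from_aggregates(event_type: str, since: date) -> int:
    """Time-series style: sum at most one pre-aggregated row per day."""
    row = conn.execute(
        "SELECT COALESCE(SUM(n), 0) FROM daily_counts WHERE event_type = ? AND day >= ?",
        (event_type, since.isoformat())).fetchone()
    return row[0]


# ISO date strings sort lexicographically in date order, so >= works on TEXT columns.
for t, d in [("signup", date(2012, 7, 18)), ("signup", date(2012, 7, 19)),
             ("signup", date(2012, 7, 19)), ("login", date(2012, 7, 19))]:
    record_event(t, d)
conn.commit()

print(count_from_log("signup", date(2012, 7, 19)))         # 2
print(count_from_aggregates("signup", date(2012, 7, 19)))  # 2
```

The trade-off shows in the queries: `count_from_log` reads every matching event row, while `count_from_aggregates` reads at most one row per day, but the aggregate table can only answer the questions you decided on before the events arrived. In MySQL the counter increment would more naturally be written as a single `INSERT ... ON DUPLICATE KEY UPDATE n = n + 1`.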

What is standard practice, and where can I read more about how these different kinds of systems compare?

Additional details:

- The total event stream is large: potentially hundreds of thousands of entries per day.
- However, our current need is only to count certain types of events within it.
- We don't necessarily need real-time access to either the raw data or the aggregation results.
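A back-of-the-envelope comparison of how many rows each style accumulates per year. The figures of 300,000 events per day and 50 distinct event types are illustrative assumptions, not numbers from the question.

```python
# Illustrative assumptions: "hundreds of thousands of entries per day" taken as 300,000,
# and a hypothetical 50 distinct event types.
EVENTS_PER_DAY = 300_000
EVENT_TYPES = 50
DAYS_PER_YEAR = 365

# Event-log style: one row per event.
raw_rows_per_year = EVENTS_PER_DAY * DAYS_PER_YEAR
# Time-series style: at most one row per event type per day.
aggregate_rows_per_year = EVENT_TYPES * DAYS_PER_YEAR

print(f"raw rows/year:       {raw_rows_per_year:,}")
print(f"aggregate rows/year: {aggregate_rows_per_year:,}")
print(f"ratio:               {raw_rows_per_year // aggregate_rows_per_year:,}x")
```

Under these assumptions the raw log grows by roughly 110 million rows a year, while the daily aggregates stay under 20,000 rows, a factor of 6,000 smaller.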

IMHO, "log all events to files, crawl them at a later time to filter and aggregate the stream" is pretty much the standard UNIX way, but my Rails-y compatriots seem to think that nothing is real unless it's in MySQL.
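The "log to files, crawl later" approach can be sketched as a small batch job. This assumes a hypothetical tab-separated line format (`<ISO date>\t<event type>\t<payload>`) written by the application itself; real syslog-ng output adds its own header, which would need to be stripped first. The function name `count_events` is illustrative. Since real-time results aren't required, a job like this could run on a schedule over the centrally collected files.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Set


def count_events(lines: Iterable[str], wanted: Set[str], since: date) -> Counter:
    """Filter a raw event log and count the wanted event types on or after `since`.

    Assumes a hypothetical line format: "<ISO date>\t<event type>\t<payload>".
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) < 2:
            continue  # skip malformed lines instead of failing the whole run
        day_str, event_type = parts[0], parts[1]
        try:
            day = date.fromisoformat(day_str)
        except ValueError:
            continue  # skip lines whose first field is not an ISO date
        if event_type in wanted and day >= since:
            counts[event_type] += 1
    return counts


sample_log = [
    "2012-07-18\tsignup\tuser=1\n",
    "2012-07-19\tsignup\tuser=2\n",
    "2012-07-19\tlogin\tuser=2\n",
    "2012-07-19\tpage_view\tpath=/\n",
    "garbage line\n",
]

print(count_events(sample_log, {"signup", "login"}, date(2012, 7, 19)))
```

To run it over a collected log, pass an open file object as `lines` (for example one from `open("events.log")`, a hypothetical path); iterating a file yields one line at a time, so memory use stays flat regardless of file size. The resulting counts could then be written into a small aggregate table if they need to be queried from the Rails app.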