We need to keep response times low, we get tons of requests, and we process almost the same data (which I'll refer to as X) on each request. The inputs differ each time, so we can't cache responses. Instead, we grab a fresh copy of X from the database every 90 seconds and store it locally in memory as a Python list of dictionaries on each of our application servers (we are using uWSGI).
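
In code, the technique is roughly this (a simplified sketch, not our real code; `load_x_from_db` is a placeholder for our actual query):

```python
import threading
import time

X = []  # the list of dicts every request handler reads

def load_x_from_db():
    # Placeholder for our actual database query.
    return [{'id': 1, 'value': 42}]

def refresh_x_forever():
    global X
    while True:
        X = load_x_from_db()  # swap in a fresh copy of X
        time.sleep(90)        # then wait out the 90-second window

threading.Thread(target=refresh_x_forever, daemon=True).start()
```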

The kink in the machine: there are temporary analytics we need to track during each 90-second window to adjust our data for the next iteration, and each iteration depends on what we calculated in the previous one.
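
So conceptually each refresh looks something like this (`adjust` is a stand-in for our real logic, and `load_x_from_db` is the placeholder from the sketch above):

```python
def adjust(data, analytics):
    return data  # stand-in for our real adjustment logic

def refresh_iteration(window_analytics):
    # The new X depends on analytics gathered during the *previous*
    # 90-second window, which is why every server has to flip together.
    return adjust(load_x_from_db(), window_analytics)
```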

The trouble is that we have multiple application servers, each storing its own copy of X in memory, and all of them need to refresh X at the same time to keep the calculations consistent from one interval to the next. I've tried some techniques, like broadcasting a message after each calculation to make every server reload its X, but it hasn't been as effective as I'd hoped, and it just makes things more complicated.
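
The broadcast attempt was along these lines (Redis pub/sub here is purely illustrative, not necessarily what we used; the transport matters less than the synchronization problem):

```python
import redis

r = redis.Redis(host='localhost', port=6379)

def listen_for_reload():
    # Each app server runs this in a background thread and reloads X
    # whenever the "reload-x" message arrives.
    pubsub = r.pubsub()
    pubsub.subscribe('reload-x')
    for message in pubsub.listen():
        if message['type'] == 'message':
            global X
            X = load_x_from_db()

# Whichever server finishes the calculation publishes the signal:
# r.publish('reload-x', 'go')
```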

I should mention that the reason we haven't used memcached or something similar is that we don't want to sacrifice any speed if we can avoid it. Maybe I'm just ignorant of how fast we can retrieve the list from memcached and load it into Python objects.
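
If we did try it, I suppose the measurement would look something like this (pymemcache and pickle are assumptions about the client and serialization, not anything we've settled on; `X` is the list from the sketch above):

```python
import pickle
import time
from pymemcache.client.base import Client

client = Client(('localhost', 11211))
client.set('X', pickle.dumps(X))  # store the list once

# Time a single fetch + deserialize round trip.
start = time.perf_counter()
data = pickle.loads(client.get('X'))
elapsed_ms = (time.perf_counter() - start) * 1000
print('fetch + deserialize took %.2f ms' % elapsed_ms)
```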

I know my explanation isn't the greatest, and I'll answer any questions to give a better picture of the situation.

Edit: we are at about 5000 requests/second, and the size of the data we process is about 2MB at the moment (it will continue to grow), so we'd like to avoid sending it over the wire on every request; at that rate it would be on the order of 10GB/s of transfer.
