
I have a web application (a Spring-based WAR) deployed on Tomcat. The application is served by several server instances, each running its own Tomcat. I intend to cache some data in a Redis datastore, and all application instances contact this datastore to read data. As a pre-step, I want some of the data to be warmed up in Redis when the application starts.

If I do it via the web app, all of the nodes will try to initialize the cache. Making one of the instances the leader is one option; are there any better solutions?

  • Restarting: means stopping Tomcat and then starting it again. This could happen for several reasons: deploying a new version of the web app, a server (machine) restart, or a new server being added to the pool. It's unlikely that all of the Tomcat instances would be started at exactly the same time, but some of them may start at nearly the same time.
  • The cache server is independent of the web app, but it could also crash and lose its data. I will maintain the "last read" timestamp in the cache as well.
  • There are not many texts (any?) about cache warmup "best practices", so +1 for daring to ask the question. However, can you put a little more detail in it? What does restarting mean for an application on many nodes? Why do you restart? Why does the cache content external to the application node not survive the restart? Commented Aug 31, 2016 at 7:58
  • @cruftex Thank you for the comments. I have added some more context. Commented Aug 31, 2016 at 9:11
  • How do you know what keys need to be warmed up? Commented Aug 31, 2016 at 9:32
  • A rough idea: Leader election means additional complexity, also you want all your nodes to do the warmup. Given there is a key space to be warmed up, define an order. Within the ordered key space make each application node start at a random offset and stop after it finds a key that is already populated. Commented Aug 31, 2016 at 9:35
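The rough idea from the last comment could be sketched like this. The `ConcurrentHashMap` stands in for Redis (its `putIfAbsent` mimics `SETNX`), and `loadFromSource` is a hypothetical placeholder for reading from the real data source:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the comment's idea: given an ordered key space, each node starts
// at a (random) offset and warms keys until it meets one that is already
// populated. The map stands in for Redis; loadFromSource is hypothetical.
public class OffsetWarmup {
    static String loadFromSource(String key) {
        return "value-of-" + key; // placeholder for a real data-source read
    }

    // Warms keys starting at 'offset', wrapping around, and stops at the
    // first key another node has already populated. Returns count warmed.
    static int warmFrom(List<String> orderedKeys, Map<String, String> cache, int offset) {
        int warmed = 0;
        for (int i = 0; i < orderedKeys.size(); i++) {
            String key = orderedKeys.get((offset + i) % orderedKeys.size());
            // putIfAbsent mimics Redis SETNX: only one node fills each key
            if (cache.putIfAbsent(key, loadFromSource(key)) != null) {
                break; // already populated by another node, so stop
            }
            warmed++;
        }
        return warmed;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("k0", "k1", "k2", "k3");
        Map<String, String> cache = new ConcurrentHashMap<>();
        int a = warmFrom(keys, cache, 0); // first node warms everything
        int b = warmFrom(keys, cache, 2); // second node stops immediately
        System.out.println(a + " " + b);  // prints "4 0"
    }
}
```

With real nodes, the offset would be random per node, so the work spreads across the key space without any coordination beyond the atomic set.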

1 Answer


You have at least two options here:

  1. Either you warm your cache from inside the application
  2. Or you do it outside

Either approach has its own properties and from here on you can derive a solution.

Warming inside the application

You have all the infrastructure needed for cache warming inside your application. If cache warming is idempotent, then it could be fine to have it done by all your instances. If you don't want all of your application instances to deal with cache warming, then you need to make a decision:

What will other instances do while the cache is warming?

I have two answers to that question:

  1. Ignore and continue startup
  2. Wait until the cache is warmed

In both cases, you can use Redis's atomic SET with NX (set-if-not-exists) and an expiry (a compare-and-set style operation) to create an expiring, distributed lock, so the fastest-starting instance does the warming and releases the lock once it is done.
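A minimal sketch of the expiring-lock idea, with an in-memory class standing in for Redis (against a real Redis you would issue `SET cache:warmup:lock <instanceId> NX EX 300` via a client such as Jedis or Lettuce; the key name and timeout here are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for Redis "SET key value NX" semantics (illustrative only).
class InMemoryLockStore {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Mimics SET ... NX: returns true only for the first caller.
    boolean setIfAbsent(String key, String value) {
        return store.putIfAbsent(key, value) == null;
    }

    // Mimics DEL, used to release the lock after warming finishes.
    void delete(String key) {
        store.remove(key);
    }
}

public class CacheWarmupLock {
    static final String LOCK_KEY = "cache:warmup:lock";

    // Returns true if this instance acquired the lock and should warm
    // the cache; all other instances either skip warming or wait.
    static boolean tryAcquire(InMemoryLockStore redis, String instanceId) {
        return redis.setIfAbsent(LOCK_KEY, instanceId);
    }

    public static void main(String[] args) {
        InMemoryLockStore redis = new InMemoryLockStore();
        boolean first = tryAcquire(redis, "node-1");   // fastest starter wins
        boolean second = tryAcquire(redis, "node-2");  // loser ignores or waits
        System.out.println(first + " " + second);      // prints "true false"
    }
}
```

The expiry on the real Redis key matters: if the warming instance crashes mid-way, the lock expires and another instance can take over instead of the cluster deadlocking.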

You could also tackle the challenge through your deployment process: deploy the first instance to your servers and let it perform the cache warming. Wait until that application has finished its job, and then you're good to go with the other instances.

Warming outside the application

With cache warming outside the application, you don't run into concurrency issues, but it requires some effort on the operations and development side. You need someone (or some process) to run the cache warming before your application starts or is deployed. You also need to build the piece of code that accesses your data and puts it into the cache.
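Such a standalone warmup job could look like the sketch below. The two maps stand in for the real data source and Redis (in production you would read from your database and write through a Redis client); the `cache:lastRead` key name is an assumption, tying in the "last read" timestamp mentioned in the question:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone warmup job, run before the application is started/deployed.
// The maps stand in for the real data source and for Redis.
public class CacheWarmer {
    // Copies every entry from the source into the cache and records a
    // "last read" timestamp so the application can detect a cold cache.
    static void warm(Map<String, String> source, Map<String, String> cache, long nowMillis) {
        cache.putAll(source);
        cache.put("cache:lastRead", Long.toString(nowMillis));
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("config:featureX", "enabled");

        Map<String, String> redis = new HashMap<>();
        warm(db, redis, System.currentTimeMillis());
        System.out.println(redis.containsKey("config:featureX")); // prints "true"
    }
}
```

Because the job runs before deployment, the application instances themselves never race: by the time any Tomcat starts, the cache is already populated.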

Building a leader-election pattern would also work, but it requires additional code/components. If you can, keep it simple.

HTH, Mark
