I am making a multiplayer strategy simulation game. The game runs in turns of fixed duration, e.g. 1 minute.
- For every user, there is a set of state variables that can change every turn, e.g. the number of worker units assigned to gathering food, the amount of food already gathered, etc.
- The values of these variables in every turn are calculated based on their values on the previous turn: e.g. the amount of food in turn 3 = food_in_turn2 + food_workers_turn2*food_gathered_per_worker
- The user can tweak some of these variables, e.g. how many workers are assigned to gathering food.
The state variables for each user can be saved in the DB every turn, or can be lazily calculated on user request, and only stored to DB when this lazy calculation happens. E.g. when the user's last known turn is turn 5, and the request is asking for turn 10, we do calculations for 5 turns and return the last values, the ones for turn 10. Right now I have the lazy solution implemented.
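To make the lazy calculation concrete, here is a minimal sketch in Python. It is not my actual code: the names `UserState`, `next_turn` and `catch_up` are made up for illustration, the starting values are arbitrary, and the real rules are far more involved than the food formula.

```python
from dataclasses import dataclass, replace

# Hypothetical constant standing in for the current version of the rules.
FOOD_GATHERED_PER_WORKER = 1


@dataclass(frozen=True)
class UserState:
    turn: int          # last turn this state was calculated for
    food: int          # amount of food already gathered
    food_workers: int  # workers assigned to gathering food


def next_turn(state: UserState) -> UserState:
    """Advance one turn using only the previous turn's values."""
    return replace(
        state,
        turn=state.turn + 1,
        food=state.food + state.food_workers * FOOD_GATHERED_PER_WORKER,
    )


def catch_up(state: UserState, requested_turn: int) -> UserState:
    """Lazily simulate every missing turn up to requested_turn."""
    while state.turn < requested_turn:
        state = next_turn(state)
    return state  # in the real service this is where the state is stored to the DB


# Last known turn is 5, the request asks for turn 10: 5 turns are calculated.
last_known = UserState(turn=5, food=100, food_workers=3)
current = catch_up(last_known, requested_turn=10)
print(current.turn, current.food)  # 10 115
```

The important property is that `catch_up` applies whatever rules the running version contains to all missing turns, no matter when those turns actually happened.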
Let's say I want to deploy a new version of the service, where the calculation of the state in a new turn has changed. E.g. food_gathered_per_worker was changed from 1 to 2 in the new version. I would like this update to come into effect for all users on the same turn, otherwise it can be unfair for some users.
How do I handle deploying this without downtime? The deployment is just docker in kubernetes pods.
Some simple solutions I considered that would not work:
- If I just deploy the new service and shut down the old one, the lazy calculation would start from different points for different users whenever there's a new request. E.g. user1's last state was at 10:30, user2's last state was at 10:40, new service deployed at 10:50 --> user1 would have 20 minutes (10:30-10:50) of turns lazily calculated with the new service, while user2 would have 10 minutes (10:40-10:50). Problem: The difference in calculation for the period 10:30-10:40 would be unfair to one of the users.
- Update the state for every user for the last game turn that should run with the old version. Start deploying the new version and pause the simulation until deployment is done. Problem: this would be downtime.
- Whenever I want to deploy, calculate the state for every user for several turns in the future, until a predefined time of switching to the new service, e.g. 11:00. Deploy the new version, with the calculation configured to take effect at 11:00. Until then, return the pre-calculated state. Problem: If the user wants to tweak something during this pre-calculated period, the state for all the following turns until 11:00 would need to be recalculated. However, the old version of the service, which should be used here, might have already been shut down and replaced by the new one. Recalculating with the new simulation service would bring us back to the first case where results would be unfair for some users.
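The unfairness in the first bullet can be shown with numbers. This is only an illustrative sketch: the `simulate` helper, the 3 workers and the 100 starting food are made-up values, and I assume both users had identical state at 10:30 and both send a request right at 10:50.

```python
OLD_RATE = 1  # food_gathered_per_worker in the old version
NEW_RATE = 2  # food_gathered_per_worker in the new version
WORKERS = 3   # assumed: both users have the same number of food workers


def simulate(food: int, workers: int, turns: int, per_worker: int) -> int:
    """Food after `turns` one-minute turns at a fixed rate (simplified rule)."""
    return food + turns * workers * per_worker


food_at_1030 = 100  # assumed: both users hold the same food at 10:30

# user2 made a request at 10:40, so 10:30-10:40 was calculated by the old version.
user2_at_1040 = simulate(food_at_1030, WORKERS, turns=10, per_worker=OLD_RATE)

# The new version is deployed at 10:50 and both users request their state.
# user1 gets all 20 turns since 10:30 at the new rate,
# user2 gets only the 10 turns since 10:40 at the new rate.
user1_at_1050 = simulate(food_at_1030, WORKERS, turns=20, per_worker=NEW_RATE)
user2_at_1050 = simulate(user2_at_1040, WORKERS, turns=10, per_worker=NEW_RATE)

print(user1_at_1050, user2_at_1050)   # 220 190
print(user1_at_1050 - user2_at_1050)  # 30
```

The gap of 30 food comes entirely from the 10:30-10:40 period, which was calculated at rate 1 for user2 and at rate 2 for user1, even though both played identically.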
The example with food gathering is a simplification - the actual calculations are non-trivial, and would be too complicated to store externally and load dynamically while the service is running.
Possible solution I see:
- Start from idea 3 above, and keep both the old and new version up. Have a smart gateway/load-balancer service that notices two versions are running, and redirects incoming requests from users correctly during the transition time. If the user is tweaking state variables, redirect to the old service and recalculate future states until the switching time of 11:00. When we pass the switch time of 11:00, the old version can be shut down and all user requests will go to the new version of the simulation.
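The routing rule I have in mind for the gateway would be roughly the following. This is a sketch under assumptions, not an existing component: `Backend`, `choose_backend` and the service names are hypothetical, and I assume plain read requests before the switch time can be answered by the new version from the pre-calculated state.

```python
from datetime import datetime
from enum import Enum


class Backend(Enum):
    OLD = "simulation-old"  # hypothetical service name
    NEW = "simulation-new"  # hypothetical service name


def choose_backend(is_tweak: bool, now: datetime, switch_time: datetime) -> Backend:
    """Pick the version that should handle a request during the transition."""
    if now >= switch_time:
        # After the switch everything runs on the new rules;
        # the old version can be shut down.
        return Backend.NEW
    if is_tweak:
        # A tweak invalidates the pre-calculated turns up to switch_time,
        # and those must be recalculated with the old rules.
        return Backend.OLD
    # Assumption: reads before the switch only return pre-calculated state.
    return Backend.NEW


switch = datetime(2024, 1, 1, 11, 0)  # example switch time of 11:00
print(choose_backend(True, datetime(2024, 1, 1, 10, 55), switch))   # Backend.OLD
print(choose_backend(False, datetime(2024, 1, 1, 10, 55), switch))  # Backend.NEW
print(choose_backend(True, datetime(2024, 1, 1, 11, 0), switch))    # Backend.NEW
```

Even in this simplified form the gateway has to know about request types and the switch time, which is part of why I suspect the approach is heavier than it needs to be.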
Am I overengineering this? Is there a simpler solution I'm missing?