I'm creating an API for a web service. A couple of endpoints in this API take a long time to compute (let's say 5+ minutes), so I'm building a prefetch system that calls these endpoints in the background at regular intervals and caches the results. When users call these endpoints, they get the cached response without a long wait. The functions behind these endpoints all do three things:
- Make some external API calls to fetch raw data.
- Process the raw data (this is what takes a lot of time)
- Store the processed data in DB and return it
I'm having lots of issues with these endpoints: every time the prefetch system goes down or has any problem, the slow path is exposed to all users. If a user makes a request and there is no cached response, the endpoint starts the long computation itself.
I'm not sure how I should handle this. I could add a parameter prefetch so that the data is only fetched and processed when the function is called from the prefetch system; otherwise the function would just query the DB for the stored processed data. If there is no data in the DB it would return an empty response, but I think that is still much better than what happens now.
Here is an example of the function for a given endpoint in Python:
@cache
def calculate_weather():
    # Call some APIs and do some operations
    # Store the processed data in DB
    # return the processed data
    ...
This is what I have in mind:
@cache
def calculate_weather(prefetch=False):
    if prefetch:
        # Call some APIs and do some operations
        # Store the processed data in DB
        ...
    else:
        # Get the processed data from DB
        ...
    # return the processed data
In this second example, the function that the API endpoint calls has a boolean parameter prefetch, which is True only when the function is called from the prefetch system.
I think this would solve the issue in the short term, but maybe there is something better I could do. Any suggestions?
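To make the idea concrete, here is a minimal runnable sketch of the second version. It is not my real code: the in-memory dict `_DB` stands in for the database, and `fetch_raw_weather` and `process_weather` are hypothetical placeholders for the external API calls and the slow processing step. I left the `@cache` decorator out to keep the sketch self-contained.

```python
import time
from typing import Optional

# Hypothetical in-memory stand-in for the real DB of processed results.
_DB: dict = {}


def fetch_raw_weather() -> list:
    """Placeholder for the external API calls (made-up sample data)."""
    return [12.0, 14.0, 16.0]


def process_weather(raw: list) -> dict:
    """Placeholder for the slow processing step."""
    return {"avg_temp": sum(raw) / len(raw), "computed_at": time.time()}


def calculate_weather(prefetch: bool = False) -> Optional[dict]:
    if prefetch:
        # Only the prefetch system takes this branch: compute, store, return.
        result = process_weather(fetch_raw_weather())
        _DB["weather"] = result
        return result
    # User requests only read stored data; None means nothing is stored yet.
    return _DB.get("weather")
```

With this split, a user request made before the first prefetch run returns `None` (the empty response) instead of starting the long computation, and after a prefetch run it returns whatever was last stored.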