I have a script that does some poor man's caching of data returned from an API to a flat file as JSON objects, one result/JSON object per line.
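Each line in the cache file looks something like this (the field names other than timestamp_start are made up here, just to show the shape):

{"timestamp_start": "2016-04-01T09:30:00", "endpoint": "/api/v1/widgets", "data": {"id": 42}}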
The caching workflow is as follows:
Read in the entire cache file -> check each line, line by line, to see if it is too old -> save the ones that aren't too old to a new list -> write the new fresh cache list back to a file, and also use the new list as a filter so the API calls skip data that is already cached.
By far the longest part of this process is the line-by-line check of whether each entry is too old. Here is the code:
print "Reading cache file into memory ---"
with open('cache', 'r') as f:
cache_lines = f.readlines()
print "Turning cache lines into json and checking if they are stale or not ---"
for line in cache_lines
# Load the line back up as a json object
try:
json_line = json.loads(line)
except Exception as e:
print e
# Get the delta to determine if data is stale.
delta = meta_dict["timestamp_start"] - parser.parse(json_line['timestamp_start'])
# If the data is still fresh then hold onto it
if cache_timeout >= delta:
fresh_cache.append(json_line)
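After the loop, the fresh entries get written back out and fresh_cache doubles as the filter for the API calls. That part is nothing fancy, roughly like this (simplified):

# Write the still-fresh entries back out, one JSON object per line (simplified sketch)
with open('cache', 'w') as f:
    for entry in fresh_cache:
        f.write(json.dumps(entry) + '\n')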
That check loop can take minutes depending on the size of the cache file. Is there a faster way to do this? I understand reading the whole file into memory isn't ideal in the first place, but it was the easiest to implement.
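For reference, the streaming version I had in mind instead of readlines() would look something like this (same check, just iterating the file object directly), though I don't know whether that is actually where the time goes:

# Iterate the file object directly instead of slurping everything with readlines()
with open('cache', 'r') as f:
    for line in f:
        try:
            json_line = json.loads(line)
        except Exception as e:
            print e
            continue
        delta = meta_dict["timestamp_start"] - parser.parse(json_line['timestamp_start'])
        if cache_timeout >= delta:
            fresh_cache.append(json_line)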