I am working on a project where a user can input different criteria which will be used to fetch tweets; let's call this action a `TweetAnalysis`. These tweets will then be sent to another internal system (a REST API) to do some calculation and return results. Each tweet will have a unique result from the REST API. For each `TweetAnalysis` created by users, there could be millions of tweets, and each tweet has its own result returned from the API. (Only 2 values from each result are aggregatable; every other value of the result is unique between tweets.)
How would I design such a system?
What I was thinking is:

- User creates a `TweetAnalysis` (let's call it `TA`) and it is stored in the DB.
- A separate process picks up a `TA` and retrieves all the respective tweets for it. These tweets can be dumped into `S3` objects? While doing so, the `S3` objects will be unique for each `TA` and can be broken down into chunks of 1000 tweets?
- A separate process can pick up those `S3` objects, gather their respective info from the `REST API` system, and persist the values in the DB?
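To make the question concrete, here is a minimal sketch of the chunking and worker steps I have in mind. The object store and the internal API are replaced with hypothetical in-memory stand-ins (`object_store`, `score_tweet`) purely for illustration; the key-naming scheme (`ta/<id>/chunk-NNNN.json`) is also just an assumption:

```python
import json

CHUNK_SIZE = 1000  # tweets per S3 object, as proposed above

def chunk_keys(ta_id, tweets, chunk_size=CHUNK_SIZE):
    """Split a TA's tweets into chunks, yielding (object_key, chunk) pairs.

    Keys are namespaced by TA id so each TA's objects are unique,
    e.g. 'ta/42/chunk-0003.json'.
    """
    for i in range(0, len(tweets), chunk_size):
        key = f"ta/{ta_id}/chunk-{i // chunk_size:04d}.json"
        yield key, tweets[i:i + chunk_size]

# Hypothetical in-memory stand-in for S3.
object_store = {}

def dump_tweets(ta_id, tweets):
    """Producer step: persist chunked tweets to the object store."""
    keys = []
    for key, chunk in chunk_keys(ta_id, tweets):
        object_store[key] = json.dumps(chunk)
        keys.append(key)
    return keys

def score_tweet(tweet):
    """Hypothetical stand-in for the internal REST API call."""
    return {"tweet_id": tweet["id"], "score": len(tweet["text"])}

def process_chunk(key):
    """Consumer step: load one chunk, call the API per tweet,
    and return the per-tweet result rows to persist in the DB."""
    tweets = json.loads(object_store[key])
    return [score_tweet(t) for t in tweets]
```

In this shape, `dump_tweets` and `process_chunk` could each run as independent workers (e.g. triggered from a queue), so a `TA` with millions of tweets becomes thousands of small, retryable chunk jobs.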