I'm processing a list of 200k to 300k items, and each item takes between 2 and 8 seconds to process. To save time, I process the list in parallel. Since I'm in an async context, I use something like this:
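    // Requires: using System; using System.Collections.Generic;
    //           using System.Linq; using System.Threading.Tasks;
    public async Task<List<Keyword>> DoWord(List<string> keywords)
    {
        var keywordResults = new List<Keyword>();
        if (keywords.Count == 0)
            return keywordResults;

        // Start one Work() task per keyword; they all run concurrently.
        Task<Keyword[]> allTasks = Task.WhenAll(keywords.Select(kw => Work(kw)));
        try
        {
            keywordResults.AddRange(await allTasks.ConfigureAwait(false));
        }
        catch (Exception ex)
        {
            // `await` rethrows only the first failure, never an AggregateException;
            // allTasks.Exception holds every failure (it is null if tasks were only cancelled).
            IEnumerable<Exception> errors =
                (IEnumerable<Exception>)allTasks.Exception?.InnerExceptions ?? new[] { ex };
            foreach (Exception innerEx in errors)
            {
                log.ErrorFormat("Core threads exception: {0}", innerEx);
            }
        }
        return keywordResults;
    }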
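One detail about awaiting `Task.WhenAll`: `await` unwraps the resulting `AggregateException` and rethrows only its first inner exception, so a `catch (AggregateException)` placed around the `await` never runs. The minimal sketch below demonstrates this; `FailingWork` is a hypothetical stand-in for `Work` that always throws, and `WhenAllErrors.Run` is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllErrors
{
    // Hypothetical stand-in for Work(): always fails after a short delay.
    static async Task<string> FailingWork(string kw)
    {
        await Task.Delay(10);
        throw new InvalidOperationException("failed: " + kw);
    }

    // Returns the type seen by the catch clause and the number of
    // exceptions collected on the Task returned by Task.WhenAll.
    public static async Task<(Type Caught, int Total)> Run()
    {
        Task<string[]> all = Task.WhenAll(new[] { "a", "b", "c" }.Select(FailingWork));
        try
        {
            await all;
            return (null, 0);
        }
        catch (Exception ex)
        {
            // ex is the first InvalidOperationException, not an AggregateException;
            // all.Exception still contains all three failures.
            return (ex.GetType(), all.Exception.InnerExceptions.Count);
        }
    }
}
```

Logging from `all.Exception.InnerExceptions` (rather than from the caught exception) is what reports every failed keyword.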
The keyword list always contains 8 elements (passed in by the calling code), so I process the overall list 8 items at a time. In that case, I assume that if 7 keywords take 3 seconds each and the 8th takes 10 seconds, the whole batch of 8 takes 10 seconds (correct me if I'm wrong).
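That reading of the timing is correct: the task returned by `Task.WhenAll` completes only when its slowest task completes, so each batch of 8 costs as much as its slowest keyword, and the 7 fast slots sit idle in the meantime. A minimal sketch, with `Task.Delay` standing in for `Work` and the durations scaled from seconds down to milliseconds; `BatchTiming.TimeBatchMs` is a hypothetical name.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class BatchTiming
{
    // Simulates one batch where each "keyword" takes the given number of milliseconds.
    // The batch finishes only when the slowest delay finishes.
    public static async Task<long> TimeBatchMs(int[] durationsMs)
    {
        var sw = Stopwatch.StartNew();
        await Task.WhenAll(durationsMs.Select(ms => Task.Delay(ms)));
        return sw.ElapsedMilliseconds;
    }
}
```

With seven 30 ms items and one 100 ms item, the measured time is roughly 100 ms (the slowest item), not the 310 ms sum.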
How can I get `Parallel.ForEach`-like behavior instead? That is: start 8 keywords, and as soon as one of them finishes, start the next one, so that 8 items are always being processed. Any ideas?
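One way to get this "always 8 in flight" behavior is a fixed pool of workers pulling from a shared queue: each worker starts the next item as soon as its current one finishes, so a slow keyword occupies only one slot instead of holding up a whole batch. This is a sketch assuming `Work` has the shape `Func<string, Task<Keyword>>`; `Throttled.ProcessAsync` is a hypothetical helper name, and results come back in completion order rather than input order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs `work` over every item with at most `maxConcurrency` calls in flight.
    public static async Task<List<TResult>> ProcessAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        Func<TItem, Task<TResult>> work,
        int maxConcurrency)
    {
        var queue = new ConcurrentQueue<TItem>(items);
        var results = new ConcurrentBag<TResult>();

        // Each worker keeps dequeuing until the shared queue is empty.
        async Task Worker()
        {
            while (queue.TryDequeue(out TItem item))
            {
                results.Add(await work(item).ConfigureAwait(false));
            }
        }

        // Start exactly maxConcurrency workers and wait for all of them to drain the queue.
        Task[] workers = Enumerable.Range(0, maxConcurrency)
            .Select(_ => Worker())
            .ToArray();
        await Task.WhenAll(workers).ConfigureAwait(false);
        return results.ToList();
    }
}
```

With this helper, `DoWord` could call `await Throttled.ProcessAsync(keywords, Work, 8)` on the full list instead of batching by 8. If one `Work` call throws, only that worker stops; the others keep draining the queue and the exception surfaces from `Task.WhenAll` afterwards. On .NET 6 and later, `Parallel.ForEachAsync` with `ParallelOptions.MaxDegreeOfParallelism = 8` is a built-in alternative.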
Could you use TPL Dataflow to set up a pipeline to process the items?
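A TPL Dataflow version of the same idea could use an `ActionBlock` whose `ExecutionDataflowBlockOptions.MaxDegreeOfParallelism` is 8: the block starts a new item whenever one of its 8 slots frees up. This sketch assumes the `System.Threading.Tasks.Dataflow` library is available (depending on the target framework it may need to be added as a NuGet package); `DataflowPipeline.ProcessAsync` is a hypothetical name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowPipeline
{
    // Processes every item with at most `maxConcurrency` work calls running at once.
    public static async Task<List<TResult>> ProcessAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        Func<TItem, Task<TResult>> work,
        int maxConcurrency)
    {
        var results = new ConcurrentBag<TResult>();
        var block = new ActionBlock<TItem>(
            async item => results.Add(await work(item).ConfigureAwait(false)),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxConcurrency });

        foreach (TItem item in items)
            block.Post(item);          // input buffer is unbounded, so Post accepts every item
        block.Complete();              // signal that no more input is coming
        await block.Completion.ConfigureAwait(false); // faults if any work call threw
        return results.ToList();
    }
}
```

Unlike the worker-pool approach, a failing `Work` call faults the whole block, so `await block.Completion` throws and no partial results are returned. For a longer pipeline, a `TransformBlock` can be connected before or after it with `LinkTo`.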