2

When using Parallel.ForEach, would converting any DB or Api calls to async methods improve performance?

A bit of background, I currently have a console application that sequentially loops through a bunch of files and for each one calls an API and makes some DB calls. The main logic looks like this:

foreach (var file in files)
{
    ReadTheFileAndComputeAFewThings(file);
    CallAWebService(file);
    MakeAFewDbCalls(file);
}

Currently all of the DB and web service calls are synchronous.

Changing the loop to use Parallel.ForEach has given me a massive performance increase, just as you would expect.

I am wondering: if I keep the Parallel.ForEach call and, inside the loop, change all of the web service calls to be async (e.g. HttpClient.SendAsync) and the DB calls to be async (using Dapper, db.ExecuteAsync()), would that increase the performance of the application by allowing it to re-use threads? Or would it effectively do nothing, as Parallel.ForEach is taking care of the thread allocation anyway?

7
  • What would be better is to make as few calls as possible, so write the SQL so that the DB calls can be made once. Commented Jan 20, 2020 at 5:18
  • Yes, it would improve performance if the time spent waiting exceeds the amount of work the thread could have been doing, so it depends on what you are looping over. It could worsen performance as well; it depends on how many loops there are and how quickly they run. Again, you're better off reducing calls where you can, e.g. get all the data for all the files first, then loop over it, referencing memory rather than making a call each time. Commented Jan 20, 2020 at 5:21
  • Could you please clarify if the question is "Would making DB calls async inside a Parallel.ForEach loop improve performance" or "Would converting all calls to async running in parallel improve performance compared to Parallel.ForEach"? If former - please edit post to clarify what you plan to do (as async + Parallel.ForEach requires solid understanding of both and more... hence chances a random user to get it right are low... so that approach totally depends on how badly you implement it :) ) Commented Jan 20, 2020 at 5:40
  • @Seabizkit pure Parallel.ForEach is easy to implement (already shown in the post) and pure async with .WhenAll is easy to implement... Getting async inside Parallel.ForEach is a major pain. So while the pure approaches will lead to comparable perf and correctness (making this an opinion-based question), mixing the two will cause a major headache and could be factually answered based on how broken the proposed solution would be. Commented Jan 20, 2020 at 5:57
  • Thanks very much @AlexeiLevenkov - I've updated the question. I mean keeping the parallel.foreach and making stuff inside the loop async. Wondering if it would help or not. Commented Jan 20, 2020 at 6:01

3 Answers 3

5

The answer is No. Asynchrony offers scalability, not performance. It allows you to do the same job with fewer threads, and so with less memory (each blocked thread = 1 MB of wasted memory).

It’s important to keep in mind, though, that asynchronicity is not a performance optimization for an individual operation. Taking a synchronous operation and making it asynchronous will invariably degrade the performance of that one operation, as it still needs to accomplish everything that the synchronous operation did, but now with additional constraints and considerations.

It should be noted that the Parallel.ForEach API cannot be used with an asynchronous body delegate. Using async with this API is a bug. The correct API to use when you want to parallelize asynchronous operations is Parallel.ForEachAsync, available from .NET 6 and later.
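As a rough illustration of that last point, here is a minimal sketch of Parallel.ForEachAsync, assuming .NET 6 or later. The two `...Async` helpers are hypothetical stand-ins for the question's web service and DB calls (they only simulate latency with Task.Delay), and the degree of parallelism of 4 is an arbitrary choice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the question's I/O calls; Task.Delay simulates latency.
static Task CallAWebServiceAsync(string file, CancellationToken ct) => Task.Delay(50, ct);
static Task MakeAFewDbCallsAsync(string file, CancellationToken ct) => Task.Delay(30, ct);

var files = Enumerable.Range(1, 20).Select(i => $"file{i}.txt").ToArray();
var processed = new ConcurrentBag<string>();

// Arbitrary limit on how many files are in flight at once.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

// The body returns a ValueTask, so the loop really awaits each iteration.
await Parallel.ForEachAsync(files, options, async (file, ct) =>
{
    await CallAWebServiceAsync(file, ct);
    await MakeAFewDbCallsAsync(file, ct);
    processed.Add(file);
});

Console.WriteLine($"Processed {processed.Count} of {files.Length} files");
```

Unlike Parallel.ForEach with an async lambda, the awaited call above completes only after every iteration has finished.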


4 Comments

Note that this answer does not really apply to the question as asked, as the OP seems to highlight (by function names in particular) that the calls are I/O bound and not CPU bound... Indeed, in that case converting serial synchronous code into parallel async or Parallel.ForEach would give essentially the same benefits, and performance may indeed improve when switching from Parallel to completely async... And this answer also does not talk about the actual question - "async inside a Parallel.ForEach loop ..." Both of these concerns are indicated by +2 votes :)
@AlexeiLevenkov I edited the answer by removing the fluff, and mentioning the incompatibility between Parallel.ForEach and async.
May I ask, would it be better to use the Parallel.ForEach async, or use dapper's QueryAsync instead?
@BIBOOnation I am not familiar with the Dapper, so I can't answer this.
1

Parallel.ForEach operates on tasks, not threads. This means it can spawn more tasks than you have threads in the thread pool. In this scenario, using async methods can give you a performance optimization by doing all the tasks with fewer threads.

https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?view=netcore-3.1

The Parallel.ForEach method may use more tasks than threads over the lifetime of its execution, as existing tasks complete and are replaced by new tasks. This gives the underlying TaskScheduler object the chance to add, change, or remove threads that service the loop.

1 Comment

Note that this post does not answer "Would making DB calls async inside a Parallel.ForEach loop improve performance?" as stated in title of the question... Which may be fine if OP actually wanted to ask the question this post answers... Commented on the question to ask for clarification.
0

original

foreach (var file in files)
{
    ReadTheFileAndComputeAFewThings(file);
    CallAWebService(file);
    MakeAFewDbCalls(file);
}

original + async (better than above, depending!)

foreach (var file in files)
{
   await ReadTheFileAndComputeAFewThings(file);
   await CallAWebService(file);
   await MakeAFewDbCalls(file);
}

This will not be better if the calls do not actually implement async; in that case it will be worse. It will also be worse if the asynchronous work is so short that the cost of the Task outweighs it. Each async call allocates a Task and a state machine and adds scheduling overhead; it does not create a thread by itself. The overhead is extremely low, but if this is done in a tight loop you will see performance issues.

The key here is that the methods must actually be the async versions.

  • SaveChanges vs SaveChangesAsync

  • Read vs ReadAsync


Parallel (better than above, depending!)

Parallel.ForEach(files, item =>
{
    ReadTheFileAndComputeAFewThings(item);
    CallAWebService(item);
    MakeAFewDbCalls(item);
});

If this can all happen at the same time, then this is better, but only if you are willing to assign multiple threads and resources to it. Remember that resources are limited: your machine only has so many cores and so much RAM, so you would want to manage this depending on what else the hardware is responsible for.

Not better if the methods are not thread safe.
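One way to manage how much of the machine the loop takes is ParallelOptions.MaxDegreeOfParallelism. The sketch below is only an illustration: Thread.Sleep stands in for the question's synchronous calls, and capping at Environment.ProcessorCount is an assumption, not a recommendation for every workload.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var files = Enumerable.Range(1, 16).Select(i => $"file{i}.txt").ToArray();
int processedCount = 0;

// Cap the number of concurrent iterations; the value here is an assumption.
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

Parallel.ForEach(files, options, file =>
{
    Thread.Sleep(20); // stand-in for the synchronous file, web service and DB work
    // Shared state must be updated in a thread-safe way.
    Interlocked.Increment(ref processedCount);
});

Console.WriteLine($"Processed {processedCount} files");
```

The Interlocked call is there because of the thread-safety point above: a plain `processedCount++` from several threads could lose updates.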


Parallel + async (better than above, depending!)

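Parallel.ForEach(files, async item =>
{
   // Broken: the async lambda becomes async void, so ForEach does not wait for these awaits.
   await ReadTheFileAndComputeAFewThings(item);
   await CallAWebService(item);
   await MakeAFewDbCalls(item);
});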

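FYI: the Parallel + async example above is actually incorrect! Parallel.ForEach itself is not async, so it will not wait for the awaited calls to finish. You will need an async-aware alternative, such as Parallel.ForEachAsync (.NET 6 and later), or do some research into how to build an async version of Parallel.ForEach.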

Also the same comments above apply when using in conjunction.
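For runtimes without an async-aware parallel loop, one common pattern is to throttle with SemaphoreSlim and wait with Task.WhenAll. This is a minimal sketch, not the only way to do it: ProcessFileAsync, the Task.Delay stand-in, and the limit of 4 are all assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var files = Enumerable.Range(1, 20).Select(i => $"file{i}.txt").ToArray();
const int maxConcurrency = 4; // arbitrary limit for this sketch
using var throttle = new SemaphoreSlim(maxConcurrency);

var gate = new object();
int running = 0, peak = 0, done = 0;

// Hypothetical per-file work; Task.Delay stands in for the async web service and DB calls.
async Task ProcessFileAsync(string file)
{
    await throttle.WaitAsync(); // wait for a free slot without blocking a thread
    try
    {
        lock (gate) { running++; peak = Math.Max(peak, running); }
        await Task.Delay(30);
        lock (gate) { running--; done++; }
    }
    finally
    {
        throttle.Release(); // always give the slot back
    }
}

// Start all files, then wait for every one of them to finish.
await Task.WhenAll(files.Select(file => ProcessFileAsync(file)));

Console.WriteLine($"Done: {done}, peak concurrency: {peak}");
```

The semaphore plays the role that MaxDegreeOfParallelism plays in the Parallel APIs: at most `maxConcurrency` files are being processed at any moment.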

Update

Based on a comment: it largely depends on whether ConfigureAwait() has been set, but assuming it has not, the following applies. Also, this will not execute in order, so if CallAWebService depends on ReadTheFileAndComputeAFewThings then things will probably go wrong.

foreach (var file in files)
{
   var jobs = new List<Task>();
   jobs.Add(ReadTheFileAndComputeAFewThings(file));
   jobs.Add(CallAWebService(file));
   jobs.Add(MakeAFewDbCalls(file));
   await Task.WhenAll(jobs); // without await, the loop would not wait for the jobs
}

or...

var jobs = new List<Task>();
foreach (var file in files)
{
   jobs.Add(ReadTheFileAndComputeAFewThings(file));
   jobs.Add(CallAWebService(file));
   jobs.Add(MakeAFewDbCalls(file));
}
await Task.WhenAll(jobs); // wait once, after every job has been started

The difference between the two is that the latter has a lot more tasks in flight, and you will probably run into issues with it regarding context: the enumerator will no longer have the correct "index" to the file, and things break if one call depends on another being completed first.
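If the three steps for one file do depend on each other, a possible middle ground is to keep them sequential inside one async method and run the files concurrently. The sketch below assumes hypothetical names (StepAsync, ProcessFileAsync) and uses Task.Delay in place of real I/O; the log is only there to show the per-file order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Records the order in which steps complete for each file.
var log = new ConcurrentDictionary<string, ConcurrentQueue<string>>();

// Hypothetical step; Task.Delay stands in for real file, web service or DB work.
async Task StepAsync(string file, string step)
{
    await Task.Delay(10);
    log.GetOrAdd(file, _ => new ConcurrentQueue<string>()).Enqueue(step);
}

// Steps for one file run in order, because each await finishes before the next starts.
async Task ProcessFileAsync(string file)
{
    await StepAsync(file, "read");
    await StepAsync(file, "web");
    await StepAsync(file, "db");
}

var files = new[] { "a.txt", "b.txt", "c.txt" };

// Different files overlap with each other; order across files is not guaranteed.
await Task.WhenAll(files.Select(f => ProcessFileAsync(f)));

foreach (var f in files)
    Console.WriteLine($"{f}: {string.Join(" -> ", log[f])}");
```

This keeps the dependency between the calls intact while still overlapping the waiting time of different files.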

Amazing link explaining async... https://learn.microsoft.com/en-us/archive/blogs/benwilli/tasks-are-still-not-threads-and-async-is-not-parallel

3 Comments

You really should do .WhenAll instead of foreach in async version (or at least show one)... there is no useful gains without it.
@AlexeiLevenkov updated based on this... i get what your saying but partly depends on how the code inside the methods looks.
Would love to hear from the downvoter as to how this is not helpful. Mind blown.
