50

Looking for a little advice on leveraging AsParallel() or Parallel.ForEach() to speed this up.

See the method I've got (simplified/bastardized for this example) below.

It takes a list like "US, FR, APAC", where "APAC" is an alias for maybe 50 other "US, FR, JP, IT, GB" etc. countires. The method should take "US, FR, APAC", and convert it to a list of "US", "FR", plus all the countries that are in "APAC".

private IEnumerable<string> Countries (string[] countriesAndAliases)
{
    var countries = new List<string>();

    foreach (var countryOrAlias in countriesAndAliases)
    {
        if (IsCountryNotAlias(countryOrAlias))
        {
            countries.Add(countryOrAlias);
        }
        else 
        {
            foreach (var aliasCountry in AliasCountryLists[countryOrAlias]) 
            {
                countries.Add(aliasCountry);
            }
        }
    }

    return countries.Distinct();
}

Is making this parallelized as simple as changing it to what's below? Is there more nuance to using AsParallel() than this? Should I be using Parallel.ForEach() instead of foreach? What rules of thumb should I use when parallelizing foreach loops?

private IEnumerable<string> Countries (string[] countriesAndAliases)
{
    var countries = new List<string>();

    foreach (var countryOrAlias in countriesAndAliases.AsParallel())
    {
        if (IsCountryNotAlias(countryOrAlias))
        {
            countries.Add(countryOrAlias);
        }
        else 
        {
            foreach (var aliasCountry in AliasCountryLists[countryOrAlias].AsParallel()) 
            {
                countries.Add(aliasCountry);
            }
        }
    }

    return countries.Distinct();
}
1
  • 7
    In reality, no. I'm just trying to use this as an exercise to learn about new parallelization in .NET. :) Commented Sep 23, 2010 at 17:46

4 Answers 4

74

Several points.

writing just countriesAndAliases.AsParallel() is useless. AsParallel() makes part of Linq query that comes after it execute in parallel. Part is empty, so no use at all.

generally you should repace foreach with Parallel.ForEach(). But beware of not thread safe code! You have it. You can't just wrap it into foreach because List<T>.Add is not thread safe itself.

so you should do like this (sorry, i didn't test, but it compiles):

        return countriesAndAliases
            .AsParallel()
            .SelectMany(s => 
                IsCountryNotAlias(s)
                    ? Enumerable.Repeat(s,1)
                    : AliasCountryLists[s]
                ).Distinct();

Edit:

You must be sure about two more things:

  1. IsCountryNotAlias must be thread safe. It would be even better if it is pure function.
  2. No one will modify AliasCountryLists in a meanwhile, because dictionaries are not thread safe. Or use ConcurrentDictionary to be sure.

Useful links that will help you:

Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4

Parallel Programming in .NET 4 Coding Guidelines

When Should I Use Parallel.ForEach? When Should I Use PLINQ?

PS: As you see new parallel features are not as obvious as they look (and feel).

Sign up to request clarification or add additional context in comments.

Comments

14

When using AsParallel(), you need to make sure that your body is thread safe. Unfortunately, the above code will not work. List<T> is not thread safe, so your addition of AsParallel() will cause a race condition.

If, however, you switch your collections to using a collection in System.Collections.Concurrent, such as ConcurrentBag<T>, the above code will most likely work.

3 Comments

"addition of AsParallel()" as it is done in code above will not cause race condition, because in c#'s foreach it will be processed sequentially.
@Andrey: True - he'd have to be using the extension methods on ParallelQuery<T> (or Parallel.ForEach instead) to have any effect.
@ReedCopsey , List<T> is not threadsafe I know , but I already know that plinq divides the work into chunks , and then collects the results ( not like parallel.foreach - Which I ahve to do the work by myself) - so is that code is threadsafe ? i.imgur.com/mcEwWg2.png , I mean - I will always get 10 different numbers ! , so is it threadsafe or not ?
3

I would prefer to use another data structure like a Set for each alias and then use Set union to merge them.

Something like this

public string[] ExpandAliases(string[] countries){
    // Alias definitions
    var apac = new HashSet<string> { "US", "FR", ...};
    ... 

    var aliases = new HashMap<string, Set<string>> { {"APAC": apac}, ... };

    var expanded = new HashSet<string>
    foreach(var country in countries){
        if(aliases.Contains(country)
            expanded.Union(aliases[country]);
        else{
            expanded.Add(country);
    }

    return expanded.ToArray();
}

Note: code should be viewed as pseudo-code.

2 Comments

well, question was about PLINQ and Parallel
It's also about speedup and sometimes the right use of data structures is enough.
1

This seems like an inherently serial operation to me. All you're doing is looping through a list of strings and inserting them into another list. The parallelization libraries are going to do that, plus a bunch of threading and synchronization - it'd probably end up slower.

Also, you should be using a HashSet<string> if you don't want duplicates.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.