7

I've got a method:

IList<string> pages = new List<string>();
foreach (var node in nodes)
{
    try
    {
        string temp = DoSomeComplicatedModificationOnNode(node);
        if (temp.ToLower().Contains(path))
        {
            pages.Add(node.Title);
        }
    }
    catch (Exception)
    {
        continue;
    }
}

DoSomeComplicatedModificationOnNode() gives exception in some cases, that's why the try{} catch block is used - I can skip the items which gives exception. The number of nodes contains several thousands of items, an item has several properties. How can I optimize this loop? I was thinking about Parallel.Foreach, but the following code gives me an error "Missing current principal":

IList<string> pages = new List<string>();
Parallel.ForEach(pageNodes, node =>
{
    try
    {
        string temp = DoSomeComplicatedModificationOnNode(node);
        if (temp.ToLower().Contains(path))
        {
            pages.Add(node.Title);
        }
    }
    catch (Exception)
    {
    }
});
5
  • 5
    Can you copy the whole error message you are getting whit the Parrallel object? Commented Apr 25, 2014 at 9:32
  • 1
    Catching exception isn't a great idea , you should catch the correct exceptions. Commented Apr 25, 2014 at 9:56
  • As I understand code in DoSomeComplicatedModificationOnNode method request server? Commented Apr 25, 2014 at 10:02
  • 1
    What is your problem? As in: if DoSOmeCOmplicatedMdificationnNode is not using a lot of perfdormance, the loop overhad is minimal. Grab a profiler and find out hat is right or wrong. Commented Apr 25, 2014 at 10:04
  • As a sidenote: You probably want to use ToLowerInvariant, not ToLower. Commented Apr 25, 2014 at 15:31

5 Answers 5

11

In C#, generic list are not thread-safe, so you can not add a items in a parallel loop.

I recommend using another class like ConcurrentBag, ConcurrentStack or ConcurrentQueue.

var pages = new ConcurrentBag<string>();
Parallel.ForEach(pageNodes, node =>
{
    try
    {
        string temp = DoSomeComplicatedModificationOnNode(node);
        if (temp.ToLower().Contains(path))
            pages.Add(node.Title);
    }
    catch (Exception)
    {
        throw;
    }
});

Remember that parallel tasks are disordered, if you want an order you will have to use an index in Parallel. List are only thead-save for reading.

System.Threading.Tasks.Parallel.For(0, pageNodes.Count, index =>
{
    string node = pageNodes[index];

    try
    {
        string temp = DoSomeComplicatedModificationOnNode(node);
        if (temp.ToLower().Contains(path))
            pages.Add(MyPage(index, node.Title));
    }
    catch (Exception)
    {
        throw;
    }
});
Sign up to request clarification or add additional context in comments.

1 Comment

If performance is critical avoiding the synchronization and using separate lists for each thread (and merging at the end) could give a nice speedup, but I don't think we can do that reasonably easy with Parallel.ForEach.
3

I would recommend to use PLINQ for such purposes. Parallel LINQ is a parallel implementation of LINQ and has the same set of operations. Code written using PLINQ follows functional style rules - there is no any updates, just mapping of current list in parallel mode. It can increase performance for your case by running a mappers in different threads and then gather result in one single "dataset". Of course it can boost performance only in the case you have CPU with few core (but as usual nowadays we all have few cores).

Here is an example

    private static void Main(string[] args)
    {
        var result =
            GenerateList()
                .AsParallel()
                .Select(MapToString)
                .Where(x => !String.IsNullOrWhiteSpace(x))
                .ToList();

        Console.ReadKey();
    }

    private const string Path = "1";
    private static string MapToString( int node)
    {
        //Console.WriteLine("Thread id: {0}", Thread.CurrentThread.ManagedThreadId);
        try
        {
            string temp = DoSomeComplicatedModificationOnNode(node);
            if (temp.ToLower().Contains(Path))
            {
                return temp;
            }
        }
        catch (Exception)
        {
            return null;
        }

        return null;
    }
    private static IEnumerable<int> GenerateList()
    {
        for (var i=0; i <= 10000; i++)
            yield return i;
    }

    private static string DoSomeComplicatedModificationOnNode(int node)
    {
        return node.ToString(CultureInfo.InvariantCulture);
    }

Comments

2

List<T> is in most cases not thread-safe. Have a look at the thread-safe collections like e.g. ConcurrentBag<T>.

2 Comments

Ignorant comment. The exitence of parallel.foreach has nothing to say for the paralellism of a list. List<T> is not thread safe for inserts, that IS a problem. Parallel.Foreach has many uses - abusing it on a code block that uses a non thread save contaier is not one of them.
It is a valid mistake, but I don't see how using List<T> can cause anything like "missing current principal"
0

The real performance problem is that you are catching the exceptions, just try to notify the result using variables.

For long running methods you should use asyncronous (asyc/await) methods.

Beware with parallel operations (in LINQ user AsParallel()) because your resources are limited and you could be without memory when you least expect it. And your code must be thread-safe (List is not thread-safe).

I would bet that you will not have a better performance than the following code:

var pages = nodes.Select(x => new { Status = DoSomeComplicatedModificationOnNode(x), Node = x })
    .Select(x => x?.Result)
    .Where(x => x.Status.IsCorrect && x.Status.ToLowerInvariant().Contains(path))
    .Select(x => x.Node.Title)
    .ToList();

Or using asyncronous + LINQ:

var pages = nodes.Select(async x =>
{
    return new { Status = await DoSomeComplicatedModificationOnNode(x), Node = x };
})
.Select(x => x?.Result)
.Where(x => x.Status.IsCorrect && x.Status.ToLowerInvariant().Contains(path))
.Select(x => x.Node.Title)
.ToList();

Until the call to ToList the entire query will not be executed.

Comments

0

No, using multiple threads will likely not make your loop any faster. There are three obvious mistakes in your code, that make it run slower.

  1. The pages list will be re-allocated multiple times.
  2. The call to ToLower() creates a temporary string on each run.
  3. The DoSomeComplicatedModificationOnNode might throw.

See fixed version below.

// Give the list an initial capacity (best guess), avoiding re-allocations.
var pages = new List<string>(nodes.Length);
foreach (var node in nodes)
{
    string temp = DoSomeComplicatedModificationOnNode(node, out var error);
    if (error != null)
    {
        continue;
    }

    // IndexOf() allows for allocation-free case insensitive string search.
    if (temp.IndexOf(path, StringComparison.CurrentCultureIgnoreCase) >= 0)
    {
        pages.Add(node.Title);
    }
}

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.