UPDATE

I'm adding an example of partitioner and local-finally usage:

private static long ParallelPartitionerWithLocal(long from, long to)
{
    long result = 0;
    var partitioner = Partitioner.Create(from, to, 
               (to - from)/(Environment.ProcessorCount));

    Parallel.ForEach(partitioner, () => 0L /*local init*/, 
               (range, loopState, subTotal) /*body*/ =>
               {
                   for (var l = range.Item1; l < range.Item2; ++l)
                        subTotal += l;
                   return subTotal;
               }, 
               subTotal /*local finally*/ => Interlocked.Add(ref result, subTotal));

    return result;
}

Because result is shared, we can't change it inside the loop without a lock, so each task updates its own local value and we add it to result just once, via Interlocked, when the task finishes.

The partitioner here is simple: it splits the range into chunks sized by the number of cores.
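
A quick sanity check of how this might be called (the call and bounds below are my assumption; note the upper bound is exclusive, since the partitioner produces half-open ranges):

const long N = 1_000_000;
var sum = ParallelPartitionerWithLocal(1, N + 1);   // upper bound is exclusive

// The parallel sum should match the closed form N * (N + 1) / 2.
Console.WriteLine(sum == N * (N + 1) / 2);          // True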


About the getTargetNode method: first, method names should use PascalCase. Second, less important, but take a look at some changes I made:

public static Node GetTargetNode(Graph graph, string targetLine)
{
    if (string.IsNullOrEmpty(targetLine))
        throw new ArgumentNullException(nameof(targetLine));

    try
    {
        // Verify that the target node is a node in the graph
        return graph.GetNode(targetLine);
    }
    catch (KeyNotFoundException e)
    {
        throw new KeyNotFoundException("Invalid Input: The Target Node, " + targetLine.Trim() + ", in the Source and Target file is not a node in the graph. ", e);
    }
    catch (Exception e)
    {
        throw new Exception("Invalid Input: The Target Node, " + targetLine.Trim() + ", in the Source and Target file is invalid: " + e.Message, e);
    }
}


I saw that you read to the end of the file anyway, so I think (you need to measure it) it will be faster to read the whole file first (line by line, as you do) and then parallelize the work once, instead of constructing the parallelism for every chunk.

Also, if you use Parallel.ForEach, you can avoid the null check on allFileLines[i].
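
A minimal sketch of both points, assuming a hypothetical input path; ProcessLine is a placeholder for your per-line work:

var allFileLines = File.ReadAllLines("input.txt");   // hypothetical path: read everything up front

Parallel.ForEach(allFileLines, line =>
{
    // Every element ReadAllLines returns is a real line, so no null check is needed.
    ProcessLine(line);
});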

Consider using a custom partitioner. You must measure it, but keep in mind that it's sometimes better to have fewer iterations over large chunks of data than many iterations over small chunks.

Following on from the previous point: if the work inside the loop is short, a partitioner is your way to get better performance.
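
For example, a range partitioner over the line array hands each task a chunk of indexes instead of invoking the delegate once per line (ProcessLine is again a placeholder):

var rangePartitioner = Partitioner.Create(0, allFileLines.Length);

Parallel.ForEach(rangePartitioner, range =>
{
    // Each task loops over its own [Item1, Item2) chunk, so the
    // delegate overhead is paid once per chunk, not once per line.
    for (var i = range.Item1; i < range.Item2; i++)
        ProcessLine(allFileLines[i]);
});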

Again, you need to measure it, but it might be faster to collect the strongestPaths into a local list (lock-free) and then aggregate them into the global list, under a lock, when each worker's portion completes.

For this you need to use this overload:

Parallel.ForEach<TSource, TLocal>(
    IEnumerable<TSource> source,
    Func<TLocal> localInit,
    Func<TSource, ParallelLoopState, TLocal, TLocal> body,
    Action<TLocal> localFinally);
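
A sketch of that overload applied to the suggestion above; PathResult, FindStrongestPath, and the list names are my placeholders for your types:

var strongestPaths = new List<PathResult>();   // the shared (global) list
var sync = new object();

Parallel.ForEach(
    allFileLines,
    () => new List<PathResult>(),        // localInit: a fresh list per worker, no locking
    (line, loopState, localPaths) =>     // body: lock-free work on the local list
    {
        localPaths.Add(FindStrongestPath(line));
        return localPaths;
    },
    localPaths =>                        // localFinally: one lock per worker, not per line
    {
        lock (sync)
            strongestPaths.AddRange(localPaths);
    });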

Array.Clear in each loop iteration can also be avoided if you use one big chunk. It's not very time-consuming, but it still has to go over thousands of items and set them to null.

Array.Clear in the finally block: in principle, if you set the array reference to null, the GC will know that all of its items are dead, so the clear is redundant. I don't know whether you kept the clear after measuring it; if so, ignore this comment.
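
In other words (assuming buffer is the array in question):

// Walking every slot to null it out:
Array.Clear(buffer, 0, buffer.Length);

// ...is redundant if you drop the last reference instead; the GC
// then treats the array and everything it pointed to as dead:
buffer = null;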

About exceptions: this may not apply to you, but it's worth mentioning that you can aggregate them inside the loop and decide what to do after the loop completes. Of course, this costs performance if many exceptions occur (because of the thread safety of the ConcurrentQueue).

    var exceptions = new ConcurrentQueue<Exception>();

    Parallel.ForEach(data, _ =>
    {
        try { throw new Exception(); }                   // stand-in for work that may throw
        catch (Exception e) { exceptions.Enqueue(e); }
    });

    if (exceptions.Count > 0)
        throw new AggregateException(exceptions);        // or handle/log them here

In any case, you need to measure every change, because in this kind of work the speed depends on your loop body and on your hardware.

For further reading, look at this series.

And you can find a file-reading benchmark here.
