5

Possible Duplicate:
c# Leaner way of initializing int array

Basically I would like to know if there is more efficient code than what is shown below

    private static int[] GetDefaultSeriesArray(int size, int value)
    {
        int[] result = new int[size];
        for (int i = 0; i < size; i++)
        {
            result[i] = value;
        }
        return result;
    }

where size can vary from 10 to 150000. For small arrays it is not an issue, but there should be a better way to do the above. I am using VS2010 (.NET 4.0).

9
  • you are setting value on all the elements of your array? Commented Dec 20, 2012 at 21:18
  • yes he is setting each element to a non-default int value - and this is probably the fastest way except for micro optimisation (someone will no doubt show some micro optimisation technique) Commented Dec 20, 2012 at 21:19
  • No matter how you do it, initializing an array with non-default values in C# is an O(N) operation. Commented Dec 20, 2012 at 21:19
  • @PaulSullivan You could get non-trivial improvements from effective parallelization. Commented Dec 20, 2012 at 21:20
  • The only micro optimisation is ++i in the for loop Commented Dec 20, 2012 at 21:47

6 Answers

8

C#/CLR does not have a built-in way to initialize an array with non-default values.

Your code is about as efficient as it can get if you measure in operations per item.

You can get potentially faster initialization if you initialize chunks of a huge array in parallel. This approach will need careful tuning due to the non-trivial cost of multithreaded operations (a sketch follows).
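A minimal sketch of that chunked approach, assuming .NET 4's Partitioner from System.Collections.Concurrent (the method name here is just illustrative, and the actual gain depends heavily on array size and core count):

    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    private static int[] GetDefaultSeriesArrayParallel(int size, int value)
    {
        int[] result = new int[size];
        if (size == 0) return result; //Partitioner.Create requires a non-empty range

        //Partitioner.Create(0, size) hands each worker a contiguous
        //[fromInclusive, toExclusive) range, keeping each thread's writes sequential
        Parallel.ForEach(Partitioner.Create(0, size), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                result[i] = value;
            }
        });
        return result;
    }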

Much better results can be obtained by analyzing your needs and potentially removing the initialization altogether. I.e. if the array normally contains a constant value, you can implement some sort of COW (copy-on-write) approach where your object initially has no backing array and simply returns the constant value, and then, on a write to an element, creates a (potentially partial) backing array for the modified segment, as sketched below.
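A minimal sketch of that COW idea (the class is hypothetical and, for simplicity, materializes the whole backing array on the first write rather than per-segment):

    using System;

    //reads return the constant value until the first write
    //forces the backing array to be materialized
    class ConstantBackedArray
    {
        private readonly int _length;
        private readonly int _defaultValue;
        private int[] _backing; //stays null until the first write

        public ConstantBackedArray(int length, int defaultValue)
        {
            _length = length;
            _defaultValue = defaultValue;
        }

        public int this[int index]
        {
            get
            {
                if (index < 0 || index >= _length)
                    throw new ArgumentOutOfRangeException("index");
                return _backing == null ? _defaultValue : _backing[index];
            }
            set
            {
                if (index < 0 || index >= _length)
                    throw new ArgumentOutOfRangeException("index");
                if (_backing == null)
                {
                    //first write: allocate and fill with the default value
                    _backing = new int[_length];
                    for (int i = 0; i < _length; i++)
                        _backing[i] = _defaultValue;
                }
                _backing[index] = value;
            }
        }
    }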

Slower but more compact code (that is potentially easier to read) would be to use Enumerable.Repeat. Note that ToArray will cause a significant amount of memory to be allocated for large arrays (which may also end up with allocations on the LOH) - see High memory consumption with Enumerable.Range?.

    var result = Enumerable.Repeat(value, size).ToArray();

8 Comments

This will be quite a bit less efficient, not more efficient.
@Servy: That is probably why Alexei said "Your code is as efficient as it could get" before giving this answer.
@HonzaBrestan Well, I disagree with that statement. Beyond that, the question asked for a better option. If you don't have a better option then just don't answer, as opposed to providing something you know is worse.
Btw, this has, like Enumerable.Range, the problem that you might end up with an array that is nearly twice as large as necessary, due to the doubling algorithm behind ToArray, since Enumerable.Repeat yields an unknown size. It's a good example of why OP's way is the best, but only with an explanation.
@Servy, would you mind providing an answer that gives array creation/initialization with lower overhead than OP's for loop? I'd happily upvote it... so far you've suggested parallel initialization that would be faster, but to an extent less efficient, as it requires more operations per element...
4

One way that you can improve speed is by utilizing Array.Copy. It works at a lower level at which it bulk-assigns larger sections of memory.

By batching the assignments, you can copy one section of the array onto another section of the same array.

On top of that, the batches themselves can be quite effectively parallelized.

Here is my initial code. On my machine (which only has two cores), with a sample array of 10 million items, I was getting a 15% or so speedup. You'll need to play around with the batch size (try to stay in multiples of your page size to keep it efficient) to tune it to the size of the items that you have. For smaller arrays it'll end up almost identical to your code, as it won't get past filling up the first batch, but it also won't be (noticeably) worse in those cases either.

private const int batchSize = 1048576;
private static int[] GetDefaultSeriesArray2(int size, int value)
{
    int[] result = new int[size];

    //fill the first batch normally
    int end = Math.Min(batchSize, size);
    for (int i = 0; i < end; i++)
    {
        result[i] = value;
    }

    int numBatches = size / batchSize;

    //bulk-copy the already-filled first batch over each remaining full batch;
    //Parallel.For's upper bound is exclusive, so this runs for batches 1..numBatches-1
    Parallel.For(1, numBatches, batch =>
    {
        Array.Copy(result, 0, result, batch * batchSize, batchSize);
    });

    //handle partial leftover batch
    for (int i = numBatches * batchSize; i < size; i++)
    {
        result[i] = value;
    }

    return result;
}
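For comparison, the same Array.Copy idea can also be used serially as a "doubling" fill: seed one element, then repeatedly copy the already-filled prefix onto the unfilled tail. A minimal sketch (the method name is just illustrative):

    private static int[] GetDefaultSeriesArrayDoubling(int size, int value)
    {
        int[] result = new int[size];
        if (size == 0) return result;

        result[0] = value;
        int filled = 1;
        //each pass copies the filled prefix onto the unfilled tail,
        //roughly doubling the initialized region until the array is full
        while (filled < size)
        {
            int count = Math.Min(filled, size - filled);
            Array.Copy(result, 0, result, filled, count);
            filled += count;
        }
        return result;
    }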

2 Comments

+1. Nice suggestion. Did you have a chance to compare what improvement Copy alone (without Parallel) gives?
@AlexeiLevenkov A few simple tests had it close to the same. That's something that needs to be tuned well to get much benefit out of it (you need to have just the right batch size, for example).
1

Another way to improve performance is with a pretty basic technique: loop unrolling.

I have written some code to initialize an array with 20 million items; this is done repeatedly 100 times and an average is calculated. Without unrolling the loop, this takes about 44 ms. With loop unrolling by a factor of 10, the process is finished in 23 ms.

private void Looper()
{
    int repeats = 100;

    ArrayList times = new ArrayList();

    for (int i = 0; i < repeats; i++)
        times.Add(Time());

    Console.WriteLine(GetAverage(times)); //44

    times.Clear();

    for (int i = 0; i < repeats; i++)
        times.Add(TimeUnrolled());

    Console.WriteLine(GetAverage(times)); //22
}

private float GetAverage(ArrayList times)
{
    long total = 0;
    foreach (var item in times)
    {
        total += (long)item;
    }

    //cast before dividing to avoid integer division truncating the average
    return (float)total / times.Count;
}

private long Time()
{
    Stopwatch sw = new Stopwatch();
    int size = 20000000;
    int[] result = new int[size];
    sw.Start();

    for (int i = 0; i < size; i++)
    {
        result[i] = 5;
    }

    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    return sw.ElapsedMilliseconds;
}

private long TimeUnrolled()
{
    Stopwatch sw = new Stopwatch();
    int size = 20000000;
    int[] result = new int[size];
    sw.Start();

    //ten assignments per iteration; assumes size is a multiple of 10
    for (int i = 0; i < size; i += 10)
    {
        result[i] = 5;
        result[i + 1] = 5;
        result[i + 2] = 5;
        result[i + 3] = 5;
        result[i + 4] = 5;
        result[i + 5] = 5;
        result[i + 6] = 5;
        result[i + 7] = 5;
        result[i + 8] = 5;
        result[i + 9] = 5;
    }

    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    return sw.ElapsedMilliseconds;
}

1 Comment

Not the best code, but it shows my point. All in all it's a 50% improvement; not much if you do this once, but it adds up.
0
Enumerable.Repeat(value, size).ToArray();

Comments

0

Reading up, Enumerable.Repeat is 20 times slower than the OP's standard for loop, and the only thing I found which might improve its speed is

private static int[] GetDefaultSeriesArray(int size, int value)
{
    int[] result = new int[size];
    for (int i = 0; i < size; ++i)
    {
        result[i] = value;
    }
    return result;
}

NOTE: the i++ is changed to ++i. i++ copies i, increments i, and returns the original value, while ++i just returns the incremented value.

Comments

-2

As someone already mentioned, you can leverage parallel processing like this:

int[] result = new int[size];
//note: the lambda parameter x is a copy of each element, so this
//assignment never reaches the array itself
Parallel.ForEach(result, x => x = value);
return result;

Sorry, I had no time to do performance testing on this (I don't have VS installed on this machine), but if you can do it and share the results, that would be great.

EDIT: As per the comments, while I still think that in terms of performance they are equivalent, you can try the parallel for loop:

Parallel.For(0, size, i => result[i] = value);

3 Comments

This would not be an effective way of parallelizing this. It's almost certainly going to be worse, as you'll lose memory locality and the unit of work is just too small. If you used a Parallel.For, at least, it'd have a much better shot.
@EDIT: you can always test the performance very easily... then you wouldn't have to guess.
I just ran a few simple test with both of these and the parallel versions took about 5x longer, which was about what I expected.
