46

I hear on MSDN that an array is faster than a collection.

Can you tell me how string[] is faster then List<string>.

5 Answers 5

52

Arrays are a lower level abstraction than collections such as lists. The CLR knows about arrays directly, so there's slightly less work involved in iterating, accessing etc.

However, this should almost never dictate which you actually use. The performance difference will be negligible in most real-world applications. I rarely find it appropriate to use arrays rather than the various generic collection classes, and indeed some consider arrays somewhat harmful. One significant downside is that there's no such thing as an immutable array (other than an empty one)... whereas you can expose read-only collections through an API relatively easily.

Sign up to request clarification or add additional context in comments.

2 Comments

Given that MSDN claims "The List<T> class is the generic equivalent of the ArrayList class. It implements the IList<T> generic interface using an array whose size is dynamically increased as required." Wouldn't this imply that once compiled, the primary difference between List<> and array is that some methods in List<> need to check whether it's time to resize or not?
@mcl: Every access ends up going through a wrapper layer, basically - which doesn't occur with a "raw" array. That adds a small performance hit.
9

An array is not resizable. This means that when it is created one block of memory is allocated, large enough to hold as many elements as you specify.

A List on the other hand is implicitly resizable. Each time you Add an item, the framework may need to allocate more memory to hold the item you just added. This is an expensive operation, so we end up saying "List is slower than array".

Of course this is a very simplified explanation, but hopefully enough to paint the picture.

2 Comments

It is not quite true that "each time you add an item" you might need to resize the underlying array; in particular there are never many such resizes in a row. The List class is carefully designed so that you never get into a situation where multiple consecutive adds cause multiple consecutive resizes. The amortized number of resizes should be proportional to the log of the maximum size attained; the total number of items copied should be proportional to the maximum size.
@EricLippert: I know; that's why I put the "may" in "may need". You can surely tell that avoiding terms like "amortized" and "O(N)" was intentional given the level of the question and the aim of making the answer short and "close enough".
9

The article is from 2004, that means it's about .net 1.1 and there was no generics. Array vs collection performance actually was a problem back then because collection types caused a lot of exta boxing-unboxing operations. But since .net 2.0, where generics was introduced, difference in performance almost gone.

Comments

3

An array is the simplest form of collection, so it's faster than other collections. A List (and many other collections) actually uses an array internally to hold its items.

An array is of course also limited by its simplicity. Most notably you can't change the size of an array. If you want a dynamic collection you would use a List.

3 Comments

Array.Resize<T>() begs to differ on the issue of changing the size of an array. msdn.microsoft.com/en-us/library/bb348051.aspx
@eric: Uh yeah, I read it. The very first thing it says is: "Changes the number of elements of an array to the specified new size." Perhaps your coworkers should summarize the method differently. Or perhaps the difference you refer to is mostly semantic unless one is implementing the library or the compiler. Does List<T> not do approximately the same thing when the list size increases? If we are concerned about performance and have a structure that is changing size a lot, wouldn't we be better served by something besides List<T> and Array?
@mcl: Indeed, a list "resizes" the underlying array (by making a new array and copying). It does so using a double-when-full strategy, so the amortized cost of an Add is O(1) but the worst case cost is O(n), and the worst case memory wastage is around 100% of the list size. Basically, the list trades wasted memory for improved speed. (And I agree, the wording of that page leaves much to be desired. It is very confusing.)
3

List<string> is class with a private member that is a string[]. The MSDN documentation states this fact in several places. The List class is basically a wrapper class around an array that gives the array other functionality.

The answer of which is faster all depends on what you are trying to do with the list/array. For accessing and assigning values to elements, the array is probably negligibly faster since the List is an abstraction of the array (as Jon Skeet has said).

If you intend on having a data structure that grows over time (gets more and more elements), performance (ave. speed) wise the List will start to shine. That is because each time you resize an array to add another element it is an O(n) operation. When you add an element to a List (and the list is already at capacity) the list will double itself in size. I won't get into the nitty gritty details, but basically this means that increasing the size of a List is on average a O(log n) operation. Of course this has drawbacks too (you could have almost twice the amount of memory allocated as you really need if you only go a couple items past its last capacity).

Edit: I got a little mixed up in the paragraph above. As Eric has said below, the number of resizes for a List is O(log n), but the actual cost associated with resizing the array is amortized to O(1).

1 Comment

You are confusing two different things. Increasing the size of a double-when-full list is an amortized O(1) operation on each Add. Work it out; suppose you start with an array of size one and do 33 adds. You'll copy 1 item at 2, 2 at 3, 4 at 5, 8 at 9, 16 at 17 and 32 at 33. The number of such resizes is O(log n); the total number of items copied is 1 + 2 + 4 + 8 + 16 + 32 = 63, and you've done 33 adds for an amortized cost of just under 2 copies per add. That constant bound of 2 does not change no matter how big the list gets.