NSArray enumeration performance examined

One day, I was thinking about NSArray enumeration (also called iteration): since Mac OS X 10.6 and iOS 4, there's the wonderful new world of blocks, and with it came enumerateObjectsUsingBlock:. I assumed it must be slower than fast enumeration (for (object in array) { ... }) due to overhead, but I didn't know for sure and thus decided to actually measure the performance.


Which kinds of enumeration exist?

Basically, we have four kinds of enumeration available (see also Mike Ash's Friday Q&A 2010-04-09: Comparison of Objective-C Enumeration Techniques).

  1. objectAtIndex: enumeration Using a for loop that increments an integer index and fetches each object with [myArray objectAtIndex:index] is the most basic form of enumeration.
    NSUInteger count = [myArray count];
    for (NSUInteger index = 0; index < count; index++) {
        [self doSomethingWith:[myArray objectAtIndex:index]];
    }
    
  2. NSEnumerator This is a form of external iteration: [myArray objectEnumerator] returns an enumerator object whose nextObject method we call in a loop until it returns nil.
    NSEnumerator *enumerator = [myArray objectEnumerator];
    id object;
    while (object = [enumerator nextObject]) {
        [self doSomethingWith:object];
    }
    
  3. NSFastEnumeration The idea behind fast enumeration is to use fast C array access to optimize iteration. Not only is it supposed to be faster than the traditional NSEnumerator, but Objective-C 2.0 also provides a very concise syntax for it.
    id object;
    for (object in myArray) {
        [self doSomethingWith:object];
    }
    
  4. Block enumeration Available since the introduction of blocks, this allows iterating over an array using blocks. Its syntax isn't as nice as fast enumeration, but there is one very interesting feature: concurrent enumeration. If the enumeration order is not important and the jobs can be done in parallel without locking, this can provide a considerable speedup on multi-core systems. More about that in the concurrent enumeration section.
    [myArray enumerateObjectsUsingBlock:^(id object, NSUInteger index, BOOL *stop) {
        [self doSomethingWith:object];
    }];
    [myArray enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(id object, NSUInteger index, BOOL *stop) {
        [self doSomethingWith:object];
    }];
    
Linear enumeration

First, let's discuss linear enumeration: one item after the other.

Graphs

[Graphs: NSArray enumeration performance on Mac OS X and iOS, each on a logarithmic and a linear scale]

Conclusions

Somewhat surprisingly, NSEnumerator is even slower than using objectAtIndex:. This is true on both Mac OS X and iOS. I suspect this is because the enumerator checks on each iteration whether the array was modified. Unsurprisingly, fast enumeration lives up to its name and is the fastest solution.

For small arrays, block enumeration is a bit slower than objectAtIndex:, but as the array grows it becomes almost as fast as fast enumeration.

The difference between fast enumeration and NSEnumerator is already quite big with a million entries: on the iPhone 4S, the former took about 0.037 seconds while the latter took about 0.140 seconds. That's a factor of about 3.7.

An oddity

The very first time an NSArray is allocated in a program, and the very first time an enumerator is requested with objectEnumerator, it takes an unusually long time to complete. For example, allocating an array with one element had a median of 415 nanoseconds on my 2007 MacBook Pro 17". But the very first time an array was allocated, it took more than 500,000 nanoseconds, sometimes even up to 1,000,000 nanoseconds! The same goes for requesting the enumerator: instead of the median of 673 nanoseconds, it also took more than 500,000 nanoseconds the first time.

I can only guess at the reason, but I suspect lazy loading is to blame. You probably won't notice this in a real application, because by the time your code runs, Cocoa or Cocoa Touch has most likely already created an array.
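
If you want to keep that one-time cost out of your own measurements, a simple warm-up before any timed runs works around it (this is my own suggestion, not something taken from the test application):

NSArray *warmup = [NSArray arrayWithObject:@"warmup"];   // trigger NSArray's lazy one-time setup
[[warmup objectEnumerator] nextObject];                  // ...and the enumerator's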

Concurrent enumeration

Block enumeration offers the option to enumerate the objects concurrently, if possible. This means the workload is potentially spread across several CPU cores. Not every operation performed while enumerating can be parallelized, so concurrent enumeration only makes sense if no locking is required within the block: either because each operation really is absolutely independent, or because atomic operations (like OSAtomicAdd32 and friends) can be used.
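
For example, here's a minimal sketch that sums all values of an array of NSNumbers concurrently (the array contents are an assumption on my part); OSAtomicAdd32 keeps the shared counter consistent without a lock:

#include <libkern/OSAtomic.h>

__block volatile int32_t total = 0;
[myArray enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(id object, NSUInteger index, BOOL *stop) {
    // Each invocation may run on a different core; the atomic add avoids
    // the locking that would otherwise make concurrent enumeration pointless.
    OSAtomicAdd32([object intValue], &total);
}];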

So, how well does it perform compared to the other enumeration methods?

The graphs

[Graphs: NSArray enumeration performance with concurrency on Mac OS X and iOS, each on a logarithmic and a linear scale]

Conclusions

For small numbers of elements, concurrent enumeration is by far the slowest method. This probably has to do with the additional work needed to prepare the array for concurrent access and to start the threads (I don't know whether GCD or "traditional" threading is used, and it doesn't matter; it's an implementation detail we shouldn't need to care about).

However, if the array is large enough, concurrent enumeration suddenly becomes the fastest method, just as expected. On the iPhone 4S with a million entries, concurrent enumeration took about 0.024 seconds, while fast enumeration took 0.036 seconds. By contrast, NSEnumerator took 0.139 seconds for the same array! That's already a pretty big difference, a factor of 5.7.

On my office 2011 iMac 24" with a quad-core Core i7 CPU, the million entries were enumerated concurrently in 0.0016 seconds. Fast enumeration of the same array took 0.0044 seconds, and NSEnumerator 0.0093 seconds. That's a factor of 5.8, which is pretty close to the iPhone 4S result. I was expecting a bigger difference here, though. On my 2007 MacBook Pro with its dual-core Core 2 Duo CPU, the factor was "just" 3.7.

The threshold at which concurrent enumeration became useful was somewhere between 10,000 and 50,000 elements in my tests. With fewer elements, just go with normal block enumeration.
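
One way to put that threshold into practice is to pick the enumeration options based on the element count; the cutoff below is just taken from my measurements and will vary between machines:

// Only pay the setup cost of concurrent enumeration for large arrays.
NSEnumerationOptions options = ([myArray count] >= 10000) ? NSEnumerationConcurrent : 0;
[myArray enumerateObjectsWithOptions:options usingBlock:^(id object, NSUInteger index, BOOL *stop) {
    [self doSomethingWith:object];
}];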

Allocation

I also wanted to know whether the performance is any different depending on how the array was created. I tested two different methods:

  • Create a C array referencing the object instances, then build the NSArray from it using initWithObjects:count:.
  • Create an NSMutableArray and add the objects one by one using addObject:.

There's no difference when iterating, but there is a difference when allocating: the initWithObjects:count: method is faster. With a very large number of objects, this difference can become significant. Here's an example of how to create an array of NSNumbers this way:

NSArray *generateArrayMalloc(NSUInteger numEntries) {
    id *entries;
    NSArray *result;

    // Build a plain C array of the objects first.
    entries = malloc(sizeof(id) * numEntries);
    for (NSUInteger i = 0; i < numEntries; i++) {
        entries[i] = [NSNumber numberWithUnsignedInteger:i];
    }

    // arrayWithObjects:count: is the autoreleasing counterpart of initWithObjects:count:.
    result = [NSArray arrayWithObjects:entries count:numEntries];

    free(entries);
    return result;
}
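
For comparison, the NSMutableArray variant from the list above could look like this (the function name and the capacity hint are my own choices):

NSArray *generateArrayMutable(NSUInteger numEntries) {
    // Build the array incrementally with addObject:; as noted above, this
    // turned out to be slower to allocate than initWithObjects:count:.
    NSMutableArray *result = [NSMutableArray arrayWithCapacity:numEntries];
    for (NSUInteger i = 0; i < numEntries; i++) {
        [result addObject:[NSNumber numberWithUnsignedInteger:i]];
    }
    return result;
}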
[Graphs: NSArray allocation performance on Mac OS X and iOS, each on a logarithmic and a linear scale]

How did I measure?

You can download the test application to see how I've measured. Basically, I measure how long it takes to iterate over an array without doing any other work, and repeat that 1,000 times. The graphs show the median for each array size. Compilation was done with optimizations turned off (-O0). For iOS, testing was done on an iPhone 4S. For Mac OS X, I used my 2007 MacBook Pro 17" and my office 2011 iMac 24". The Mac OS X graphs show the iMac's results; the MacBook Pro graphs looked similar, just slower.
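
The actual measurement code is in the test application; as a rough illustration of the approach, a timing loop for fast enumeration might look something like this (my own sketch using mach_absolute_time, with made-up names):

#import <Foundation/Foundation.h>
#include <mach/mach_time.h>
#include <stdlib.h>

static int compareUInt64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

// Time 1000 passes of fast enumeration over myArray and return the median
// duration of a single pass, in nanoseconds.
uint64_t medianFastEnumerationNanos(NSArray *myArray) {
    mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);

    enum { kRuns = 1000 };
    uint64_t samples[kRuns];
    for (NSUInteger run = 0; run < kRuns; run++) {
        uint64_t start = mach_absolute_time();
        for (id object in myArray) {
            (void)object;   // iterate without doing any real work
        }
        uint64_t elapsed = mach_absolute_time() - start;
        samples[run] = elapsed * timebase.numer / timebase.denom;   // convert to nanoseconds
    }
    qsort(samples, kRuns, sizeof(uint64_t), compareUInt64);
    return samples[kRuns / 2];
}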