One day, I was thinking about NSArray enumeration (also called iteration): since Mac OS X 10.6 and iOS 4, there's the wonderful new world of blocks, and with it came enumerateObjectsUsingBlock:
. I assumed it must be slower than fast enumeration (for (object in array) { ... }
) due to overhead, but I didn't know for sure and thus decided to actually measure the performance.
Which kinds of enumeration do exist?
Basically, we have four kinds of enumeration available (see also Mike Ash's Friday Q&A 2010-04-09: Comparison of Objective-C Enumeration Techniques).
-
objectAtIndex: enumeration
Using a
for
loop which increases an integer and querying the object using[myArray objectAtIndex:index]
is the most basic form of enumeration.NSUInteger count = [myArray count]; for (NSUInteger index = 0; index < count ; index++) { [self doSomethingWith:[myArray objectAtIndex:index]]; }
-
NSEnumerator
This is a form of external iteration:
[myArray objectEnumerator]
returns an object. This object has a methodnextObject
that we can call in a loop until it returnsnil
.NSEnumerator *enumerator = [myArray objectEnumerator]; id object; while (object = [enumerator nextObject]) { [self doSomethingWith:object]; }
-
NSFastEnumerator
The idea behind fast enumeration is to use fast C array access to optimize iteration. Not only is it supposed to be faster than traditional
NSEnumerator
, but Objective-C 2.0 also provides a very concise syntax.id object; for (object in myArray) { [self doSomethingWith:object]; }
-
Block enumeration
Available since the introduction of blocks, this allows to iterate an array with blocks. Its syntax isn't as nice as fast enumeration, but there is one very interesting feature: concurrent enumeration. If enumeration order is not important and the jobs can be done in parallel without locking, this can provide a considerable speedup on a multi-core system. More about that in the concurrent enumeration section.
[myArray enumerateObjectsUsingBlock:^(id object, NSUInteger index, BOOL *stop) { [self doSomethingWith:object]; }]; [myArray enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(id obj, NSUInteger idx, BOOL *stop) { [self doSomethingWith:object]; }];
First, let's discuss linear enumeration: one item after the other.
Graphs
Conclusions
Somewhat surprisingly, NSEnumerator
is even slower than using objectAtIndex:
. This is true for both Mac OS X and iOS. I suspect this is due to the enumerator checking whether the array was modified with each iteration. Somewhat unsurprisingly, fast enumeration holds up to its name and is the fastest solution.
For small arrays, block enumeration is a bit slower than objectAtIndex:
, but with more elements in the array the performance becomes almost as fast as fast enumeration.
The difference between fast enumeration and NSEnumeration
is already quite big with a million entries: on the iPhone 4S, the former took about 0.037 seconds while the later took about 0.140 seconds. That's already a factor of 3.7.
An oddity
The very first time an NSArray
is allocated in a program and the very first time an enumerator is requested with objectEnumerator
it takes an unusual long time to complete. For example, to allocate an array with one element the median was 415 nanoseconds on my 2007 MacBook Pro 17". But the very first time an array was allocated it take more than 500,000 nanoseconds, sometimes even up to 1,000,000 nanoseconds! The same with querying the enumerator: instead of the median of 673 nanoseconds it took also took more than 500,000 nanoseconds.
I can only guess about the reason, but I suspect lazy loading is to blame. You probably won't notice this in a real application because by the time your code is running Cocoa or Cocoa Touch probably already has created an array.
Concurrent enumerationBlock enumeration offers the option to enumerate the objects concurrently, if possible. This means the work load is potentially spread over several CPU cores. Not every operation to be done while enumerating can be parallelized, so concurrent enumeration only makes sense if no locking is required within the block: either because each operation really is absolutely independent or because atomic operations can be used (like OSAtomicAdd32
and friends).
So, how well does it perform compared to the other enumeration methods?
The graphs
Conclusions
For small number of elements, the concurrent enumeration is by far the slowest method. This probably has to do with the additional work that needs to be done preparing the array for concurrent access and starting the threads (I don't know whether GCD or "traditional" threading is used and it doesn't matter; it's an implementation detail we shouldn't need to care about).
However, if the array is large enough all of a sudden concurrent enumeration becomes the fastest method, just as expected. On the iPhone 4S and a million entries, it took about 0.024 seconds to enumerate with concurrent enumeration, but 0.036 seconds with fast enumeration. By contrast, NSEnumeration
took 0.139 seconds for the same array! That's already a pretty big difference, a factor of 5.7.
On my office 2011 iMac 24" with a Core i7 quad-core CPU, the million entries where enumerated concurrently in 0.0016 seconds. Fast enumeration of the same array took 0.0044 seconds and NSEnumeration
0.0093 seconds. That's a factor of 5.8 which is pretty close to the iPhone 4S results. I was expecting a bigger difference here, though. On my 2007 MacBook Pro with Core2 Duo dual-core CPU, the factor was "just" 3.7 here.
The threshold when concurrent enumeration became useful was somewhere between 10,000 and 50,000 elements in my tests. With less elements, just go with normal block iteration.
AllocationI also wanted to know whether the performance is any different depending on how the array was created. I tested two different methods:
- Create a C array which references the object instances and create the array using
initWithObjects:count:
. - Create a
NSMutableArray
and subsequently add objects usingaddObject:
.
There's no difference when iterating, but there is a difference when allocating: the initWithObjects:count:
method is faster. With a very large number of objects, this difference can become significant. Here's an example on how to create an array with NSNumber
s:
NSArray *generateArrayMalloc(NSUInteger numEntries) {
id *entries;
NSArray *result;
entries = malloc(sizeof(id) * numEntries);
for (NSUInteger i = 0; i < numEntries; i++) {
entries[i] = [NSNumber numberWithUnsignedInt:i];
}
result = [NSArray arrayWithObjects:entries count:numEntries];
free(entries);
return result;
}
How did I measure?
You can download the test application to see how I've measured. Basically I'm measuring how long it takes to iterate an array without actually doing anything else, and repeat that 1000 times. For the graph, the median for each array size was taken. Compilation was done with optimization turned off (-O0
). For iOS, testing was done with an iPhone 4S. For Mac OS X, I used my 2007 MacBook Pro 17" and my office 2011 iMac 24". The graphs for Mac OS X show the results of the iMac, but on the MacBook Pro the graphs looked similar, it just was slower.