libdispatch, the technology behind Apple’s much-touted Grand Central Dispatch (GCD) in OS X 10.6, has been ported to FreeBSD and is planned to be included by default in FreeBSD 8.1. This is Good News™ because widespread adoption of this library means a big win for system- and application-level responsiveness across the board. No more beach balls or hourglasses!
The reason behind the big push for GCD and libdispatch is to shift the responsibility for managing threads and their assorted overhead, execution and pitfalls from applications to the operating system. Though reduced control makes a seasoned developer scoff and dismiss new frameworks as scaffolding for greenhorns, a pervasive approach to multi-tasking is increasingly essential in this age of multi-core processors as even the best-written applications do not have full insight into everything else happening on the system and hence cannot deliver the best possible performance.
Why is a pervasive approach necessary, you might ask? Let’s say that your application has a task that can be sub-divided into half a dozen independent units of work. If this program spawns three or four threads on an eight-core machine (not uncommon with the popularity of the Intel i7 platform, let alone the 16 cores available on the high-end Mac Pro!), is this too few or too many threads? The correct answer is that there is no answer: it depends on what else is happening on the system!
If seven of those eight cores are saturated transcoding your home movies to H.264 for perusal on your Apple TV, your three or four threads will only have one core to fight over amongst themselves and your performance will grind to a halt on account of context-switching and resource management. Does this mean we should only spawn as many threads as there are cores available? We might naïvely say yes, but what if that movie finishes transcoding? We’ll then have seven idle cores and be using only a small fraction of the computational power available to us.
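For what it’s worth, the naïve heuristic looks something like the sketch below. The function name naive_worker_count is made up for illustration, and sysconf(_SC_NPROCESSORS_ONLN) is a widely available extension (OS X, FreeBSD, Linux) rather than strict POSIX. Note what the call cannot tell us: how busy those cores are.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: one worker per online core, capped at the number of
 * independent tasks. It knows nothing about the load on those cores. */
long naive_worker_count(long tasks)
{
    assert(tasks > 0);

    long cores = sysconf(_SC_NPROCESSORS_ONLN); /* -1 if the query fails */
    if (cores < 1)
        cores = 1;                              /* fall back to a single worker */
    return tasks < cores ? tasks : cores;
}
```

With eight cores online, naive_worker_count(6) returns 6 whether the machine is idle or seven of its cores are flat out transcoding, which is exactly the problem.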
We can now say with reasonable authority that the optimal number of threads to spawn can only be decided with any degree of accuracy by a globally-aware entity, also known as GCD. It will spawn as many or as few threads as are required to complete enqueued tasks. Not only will this allow us to take advantage of thread-pooling (a feature to be expected in any modern OS), but we can avoid the overhead of managing our own threads. On OS X, a thread requires over 512 KiB of overhead just to manage its state. Create 100 threads and you’ve got yourself 50 MiB of data doing very little for you. Compare this baggage to that of GCD’s queues, which lay claim to a meagre 256 bytes of overhead each. Because queues are such featherweights, developers are encouraged to create as many as they need. If your application would use 150 simultaneous tasks, create 150 queues. Not only will GCD dole these out to the appropriate number of cores, it will not require further development to take advantage of the 32- or 64-core processors we might see down the track.
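To put those two figures side by side, here is the arithmetic as a small C sketch. The constants are simply the numbers quoted above (treat them as ballpark values), and the helper names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Figures quoted in the text; treat them as ballpark numbers. */
#define THREAD_OVERHEAD_BYTES ((size_t)512 * 1024) /* 512 KiB per thread */
#define QUEUE_OVERHEAD_BYTES  ((size_t)256)        /* 256 bytes per queue */

/* Total bookkeeping cost in bytes of `count` items at `per_item` bytes each. */
size_t overhead_bytes(size_t count, size_t per_item)
{
    assert(per_item == 0 || count <= SIZE_MAX / per_item); /* no overflow */
    return count * per_item;
}

/* Bytes to KiB, for display. */
double to_kib(size_t bytes)
{
    return (double)bytes / 1024.0;
}
```

One hundred threads come to 52,428,800 bytes (51,200 KiB, or 50 MiB), while 150 queues come to 38,400 bytes, a mere 37.5 KiB.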
Managing threads by hand, on the other hand, sounds very rigid and, worse yet, a nightmare to keep track of. Spawning a thread is not technically difficult (quite the contrary), but it’s a pain in the proverbial behind as we get distracted from our exciting application logic by god-awful task-management, mutexes, locks and callbacks. Often we just won’t bother if there’s a task (say, writing a file to disk or looking up the address of a hostname) that usually takes only a few seconds. But what if the file write blocks, or the network stalls? We’re caught off-guard (as these circumstances are hard to reproduce, let alone emulate) and the application locks up for an indeterminate period.
GCD tells us it doesn’t have to be like this. Say we have an application that counts the number of times the word “fruitbowl” appears in a given document. Typically, we don’t work with gargantuan recipes, so this procedure should execute well within the blink of an eye. It’d be silly to spawn a thread (and risk making a mistake) for something so trivial. We could just use code along the following lines:
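- (IBAction)countFruitBowls:(id)sender
{
    // Everything here runs on the main thread, so the UI waits until counting is done.
    NSNumber *numberOfFruitBowls = [document reasonablyInvolvedFruitbowlCountingMethod];
    [documentModel setNumber:numberOfFruitBowls];
    [documentView setNeedsDisplay:YES];
    [numberOfFruitBowls release]; // assumes the counting method hands back an object we own
}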
Keep in mind that if this code is connected to the push of a button in the UI, it will run on the main thread. Also keep in mind that if the user does open a gargantuan recipe (or, worse yet, a cookbook!), this procedure might take much longer than an eye-blink: say, 15 seconds? Of course, this is an atypical situation. Do we really want to add all the extra thread-handling and callback code to this otherwise simple routine? We’d make analysis of the code and bug-hunting a nightmare as the application logic would be drowning in thread-plumbing.
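For a taste of that plumbing, here is a minimal sketch in plain C with POSIX threads. It is an illustration only, not code from any real application: struct count_job and every function name below are hypothetical, and the word counting is a crude stand-in for the real analysis.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical context object: inputs, result and completion state. */
struct count_job {
    const char     *text;    /* document to scan; must outlive the job */
    const char     *word;    /* word to look for */
    size_t          result;  /* written by the worker thread */
    int             done;    /* non-zero once result is valid; guarded by lock */
    pthread_mutex_t lock;
    pthread_t       thread;
};

/* Count non-overlapping occurrences of word in text. */
size_t count_word(const char *text, const char *word)
{
    size_t len = strlen(word), n = 0;
    if (len == 0)
        return 0;
    for (const char *p = strstr(text, word); p != NULL; p = strstr(p + len, word))
        n++;
    return n;
}

/* Thread entry point: do the work, then publish the result under the lock. */
void *count_worker(void *arg)
{
    struct count_job *job = arg;
    size_t n = count_word(job->text, job->word);

    pthread_mutex_lock(&job->lock);
    job->result = n;
    job->done = 1;
    pthread_mutex_unlock(&job->lock);
    return NULL;
}

/* Start counting on a new thread. Returns 0 on success, an error number otherwise. */
int count_job_start(struct count_job *job, const char *text, const char *word)
{
    job->text = text;
    job->word = word;
    job->result = 0;
    job->done = 0;

    int rc = pthread_mutex_init(&job->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_create(&job->thread, NULL, count_worker, job);
    if (rc != 0)
        pthread_mutex_destroy(&job->lock);
    return rc;
}

/* Non-blocking check, e.g. from a UI timer. Returns 1 and fills *out when done. */
int count_job_poll(struct count_job *job, size_t *out)
{
    pthread_mutex_lock(&job->lock);
    int done = job->done;
    if (done)
        *out = job->result;
    pthread_mutex_unlock(&job->lock);
    return done;
}

/* Wait for the worker, release its resources and return the count. */
size_t count_job_finish(struct count_job *job)
{
    int rc = pthread_join(job->thread, NULL);
    assert(rc == 0);
    (void)rc; /* silences unused-variable warnings when NDEBUG is defined */
    pthread_mutex_destroy(&job->lock);
    return job->result;
}
```

That is a context struct, a lock, an entry point and three bookkeeping functions, and we still haven’t told the UI when to call count_job_poll(). All of it exists to move one method call off the main thread.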
What if I were to tell you that GCD allowed us to shuffle the fruitbowl-counting business into the background by adding just two lines of code to the existing function? No global variables, no context objects, no mutexes and no callbacks. Sounds like a deal? Alright then, feast your eyes:
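- (IBAction)countFruitBowls:(id)sender
{
    // Count on a background queue so the main thread stays responsive.
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSNumber *numberOfFruitBowls = [document reasonablyInvolvedFruitbowlCountingMethod];
        // Hop back to the main queue for anything that touches the UI.
        dispatch_async(dispatch_get_main_queue(), ^{
            [documentModel setNumber:numberOfFruitBowls];
            [documentView setNeedsDisplay:YES];
            [numberOfFruitBowls release]; // assumes the counting method hands back an object we own
        });
    });
}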
There’s a lot of functionality packed into those two lines of code, but we don’t care! That’s not true, we’re developers and this stuff gets us randy, but we don’t want to have to micromanage this functionality every single time. Up until now, we’ve been talking in very abstract and vague terms like “units of work”, without specifying how these are implemented. The give-away is in the second argument to the two dispatch_async() calls: blocks. (You did read the links above, didn’t you? If not, do your homework at Wikipedia and get back here.) Blocks are very much like closures in higher-level languages in that they can capture the surrounding state (or at least copies of the appropriate stack variables), which allows us to avoid painful refactoring and other manipulations of our code to fit around the threading API.
That said, the best part of this code is yet to be discussed: how it detects the completion of the background work and then serves up the result to the main thread. In the synchronous code above, the analysis is performed and the application’s UI is updated in the desired order. Magically, this is still the case in the asynchronous code. The outer dispatch_async() call puts a task on a global GCD queue; that task is the block passed as the second argument. The block itself contains another call to dispatch_async(), this time putting a task on the main queue, which is serial, to update the UI. Because the inner call is only reached once the counting has finished, the original order is preserved. There is therefore no need for special-purpose notifications to the main thread: the run loop processes blocks on the main queue just like any other input.
Surprisingly (or should that be amazingly?), it is just as easy to modify a serial implementation of many independent tasks and parallelise it. That’s right, a parallel for-loop. Compare this:
for (i = 0; i < n; i++) {
    result[i] = analyse(dataset, i);
}
final = summary(result, n);
to this:
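// dispatch_apply() hands each index to the block as a size_t and returns once every iteration is done.
dispatch_apply(n, dispatch_get_global_queue(0, 0), ^(size_t i) {
    result[i] = analyse(dataset, i);
});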
final = summary(result, n);
Nice, isn’t it? No additional variables, no need to stress about the appropriate number of threads and no extra work checking to see if all the tasks have completed. That said, observant readers will note the addition of a parameter for the above block: blocks can function (’scuse the pun) just like function pointers. We can’t use “i++” in place of the parameter, as stack variables from the enclosing scope are captured as const copies when the block is created. If this sounds too limiting, Apple has got you covered there too, with the addition of the __block storage type modifier.
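Here is a minimal sketch of that capture rule. The blocks branch needs a compiler with the blocks extension switched on (the predefined __BLOCKS__ macro signals this; other compilers may need -fblocks and a blocks runtime). The #else branch is only a hand-written stand-in with the same observable behaviour, so the file still builds elsewhere; it is not what the compiler literally generates. The function name add_step_twice is hypothetical.

```c
#include <assert.h>

/* Hypothetical demo: add `step` to a running total twice and return the total. */
int add_step_twice(int step)
{
    const int original = step;
#if defined(__BLOCKS__)
    __block int total = 0;        /* shared with the block, so the block may write to it */
    void (^bump)(void) = ^{
        total += step;            /* step is a const copy taken when the block was created;
                                     writing `step++` here would not compile */
    };
    step = 1000;                  /* too late: the block already holds the old value */
    bump();
    bump();
#else
    /* Stand-in for compilers without blocks: a context struct plays the
     * role of the captured state (a copy of step, a pointer to total). */
    int total = 0;
    struct { const int step; int *total; } ctx = { step, &total };
    step = 1000;                  /* does not affect the copy held in ctx */
    *ctx.total += ctx.step;
    *ctx.total += ctx.step;
#endif
    assert(total == 2 * original); /* the later write to step was never seen */
    return total;
}
```

Calling add_step_twice(3) returns 6 in either branch: the total is writable because it is marked __block (or reached through a pointer), while step stays frozen at the value it had when the block was created.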
In case you’re wondering about the title of this article, one of Apple’s slogans for Grand Central Dispatch is “islands of serialization (sic) in a sea of concurrency.” It quite nicely sums up the reality of making concurrency more accessible to typical desktop applications. Those islands of serialisation are what isolate developers from race conditions, deadlocks, abandoned mutexes and the other thorny problems of simultaneous data access.
Likewise, thanks to the ability to provide anonymous, in-line functions in the form of the blocks C extension (supported at the time of writing by the fantastic Clang compiler and by Apple’s build of GCC), it is easier to identify units of work that would be better executed off the main thread, even if they are made up of several sequential or otherwise partially interdependent tasks. GCD makes it easy to break off the entire unit of work while maintaining the existing order and dependencies between subtasks.
As a developer, I’m extremely excited by Grand Central Dispatch. I’m also excited to see it embraced by FreeBSD, which I hope is a precursor to Linux support. I think asking for Windows support is a bit too much at this stage.
As a user, you should be excited, too. Responsive, performant applications are the future. It’s about time developers started making use of the multiple cores available to us and GCD will make this easier than ever.