Making the Case for Multi-Threaded Applications
I have listened to a lot of people say that multi-threaded programming is dangerous and can be avoided with careful planning. I can see their argument: a properly constructed finite-state machine with sufficiently small processing chunks can appear to be processing many things at once on a single thread. But that's an illusion, and a bad one at that.
First, writing good multi-threaded code is simpler than writing the equivalent finite-state machine code. I know; I've done both. You can do the finite-state machine, and it's great on old hardware or single-core machines, but it's no longer necessary. Most machines now are multi-core, and they gain more cores with each generation. No. You need to know how to write multi-threaded code in today's high-performance systems. No two ways about it.
Second, there are times when the overhead of 'polling' in the finite-state machine construct overwhelms the processing that really needs to get done. There's a reason we moved from polled I/O to interrupt-driven I/O - one is much more efficient than the other, but it's harder to code. Suck it up, Boys! Code like you're worth the money they're paying you! Write the code that should be written - not the easy stuff - the hard stuff.
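Here's a minimal sketch of that difference, with made-up names and nothing taken from any particular package: the polled worker burns cycles checking for work whether or not any has arrived, while the blocking worker sleeps until it's woken up.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

std::queue<int> work;
std::mutex      mtx;
std::condition_variable cv;

// Polled: the finite-state-machine style. Check, maybe do a chunk, check again.
void polled_worker() {
    for (;;) {
        std::lock_guard<std::mutex> lock(mtx);
        if (!work.empty()) work.pop();           // "process" a chunk
        // ...then fall through and poll again, whether or not anything arrived
    }
}

// Interrupt-style: block until notified that there is something to do.
void blocking_worker() {
    for (;;) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [] { return !work.empty(); });
        work.pop();                              // "process" the item that woke us
    }
}
```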
I've been working with a package that has a central object repository and a cache in each API instance. This is smart in that it lets the client grab something and hold on to it across many requests, so there's no continual re-fetching when the data doesn't change. Nice. The problem is, they made the API cache single-threaded. This means that for a client to get any updates, it has to ask for them.
Normally, this isn't a horrible problem, but if asking means a full re-fetch (as opposed to getting just the updates), then the more things there are in the cache, the worse the performance gets. Remember - this is single-threaded, so they aren't going to let you do anything until the re-fetching is done.
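Roughly what the client lives with today, sketched with invented names (Repository, ApiCache, refresh()) rather than the package's real API: one synchronous refresh that walks the whole cache, and nothing else happens on that instance until it returns.

```cpp
#include <map>
#include <string>

// Stand-in for the real round trip to the central repository.
struct Repository {
    std::string fetch(const std::string& key) { return "value-of-" + key; }
};

class ApiCache {
    Repository&                        repo_;
    std::map<std::string, std::string> objects_;
public:
    explicit ApiCache(Repository& repo) : repo_(repo) {}

    // Single-threaded full refresh: the cost grows with the size of the cache,
    // and the caller is stuck here until every item has been re-fetched.
    void refresh() {
        for (auto& entry : objects_)
            entry.second = repo_.fetch(entry.first);   // re-fetch even unchanged items
    }
};
```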
What they should have done is update the cache in a background thread with a simple "complete replacement" when an update arrives. Keep the bulk of the API single-threaded, but as updates flow in, let the background thread build a hidden copy of the object, then briefly lock the API thread, swap the objects, and unlock it. This way, the cache stays consistent with the data in the central object repository.
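Here's a rough sketch of that "hidden copy and swap" idea, assuming a hypothetical Cache type and a stubbed-out fetch_snapshot() in place of the real repository call. The API thread is only ever blocked for the length of a pointer assignment.

```cpp
#include <chrono>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>

using Cache = std::map<std::string, std::string>;     // stand-in for the cached objects

std::shared_ptr<const Cache> live_cache = std::make_shared<Cache>();
std::mutex                   swap_mtx;

// Stand-in for the slow call that pulls the latest state from the repository.
std::shared_ptr<const Cache> fetch_snapshot() {
    return std::make_shared<Cache>();
}

// Background updater: build the replacement off to the side, then swap it in.
void updater() {
    for (;;) {
        auto fresh = fetch_snapshot();                 // slow part, no lock held
        {
            std::lock_guard<std::mutex> lock(swap_mtx);
            live_cache = fresh;                        // the "complete replacement"
        }
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

// API side: grab the current snapshot and carry on, never waiting on a re-fetch.
std::shared_ptr<const Cache> current() {
    std::lock_guard<std::mutex> lock(swap_mtx);
    return live_cache;
}
```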
Why is this important?
Because let's say I get an object and modify it. I've got a new copy, and I save it to the central object repository. Now another user does the same thing: they get a copy, modify it, and save it, and now my copy is stale. This is bad, because if I don't ask for updates, I may keep editing my copy and try to save those changes. Well... that's not going to work, because my 'before' copy is no longer what the central object repository has. That means it's going to throw an exception.
But what about my edits? They have to go. No choice. I have to re-fetch, and then start over again.
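To make that failure concrete, here's a hedged illustration of the stale-copy rejection. All the names (Repo, Doc, StaleCopy) are invented for the example; the real package presumably does something equivalent when the 'before' copy doesn't match.

```cpp
#include <stdexcept>
#include <string>

struct Doc { int version; std::string body; };

struct StaleCopy : std::runtime_error {
    StaleCopy() : std::runtime_error("copy is stale: re-fetch and retry") {}
};

struct Repo {
    Doc current{1, "original"};
    Doc fetch() const { return current; }
    void save(const Doc& before, const Doc& after) {
        if (before.version != current.version) throw StaleCopy();  // someone saved first
        current = after;
        current.version = before.version + 1;
    }
};

// The dance the client is forced into: edit, try to save, and if the save
// throws, toss the edits, re-fetch, and do the work all over again.
void edit_with_retry(Repo& repo) {
    for (;;) {
        Doc before = repo.fetch();
        Doc after  = before;
        after.body = "my edits";                      // work that may have to be redone
        try { repo.save(before, after); return; }
        catch (const StaleCopy&) { /* start over with a fresh copy */ }
    }
}
```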
Without even a simple flag like "re-fetch me", which could easily be added to the API under the covers, I have no idea that my work will be in vain. But it will be.
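Even something as small as this would do it (again, invented names): an atomic flag the API sets under the covers when it hears about a change, so the client at least knows a re-fetch is needed before it wastes the effort.

```cpp
#include <atomic>

class ApiCache {
    std::atomic<bool> stale_{false};
public:
    void mark_stale()          { stale_.store(true);  }  // set when a change notice arrives
    bool needs_refetch() const { return stale_.load(); }
    void refreshed()           { stale_.store(false); }  // cleared after a successful re-fetch
};
```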
This is the case for multi-threading that I'm making. There are times when you need to split the problem across multiple threads and dedicate one to the simple task of keeping the cache consistent. It's simple, surgical, and not hard to do, and the resulting simplification of the client code would be dramatic.
So learn how to write multi-threaded code. Many times, it's the simplest way to get what you need done. And in the era of 60-core boxes, you're going to need it to get the most out of any box.