The Worst Bug I’ve Ever Written
So we finally found the problem with the deadlock in the code… and it wasn't a deadlock at all. It was an infinite loop. Holy Cow! This was without a doubt the hardest bug I've ever had to find. And it was all my fault. A silly typo, and I spent days trying to find it. Thankfully, my co-worker was able to ask a few questions and get us on the right track, and we found it, but it was nasty to track down, and it was a single character long.
Incredible.
In general, when setting a value with atomic operations, you need a loop where you read the existing value, try to CAS in the new one, and if that fails, try it all again. This typically looks like this:
uint32_t now = mValue; while (!__sync_bool_compare_and_swap(&mValue, now, aValue)) { now = mValue; }
where aValue is the new value of the ivar mValue. And this will work. But if you have a typo in the code, say like this:
uint32_t now = mValue; while (!__sync_bool_compare_and_swap(&mValue, now, aValue)) { now = aValue; }
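then after the first failed CAS, now holds aValue instead of what is actually in mValue, so every retry compares against the wrong value. You are stuck in an infinite loop until some other thread happens to set mValue to the very value you're trying to set. It was a disaster.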
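For what it's worth, one way to make this particular typo harder to write is to let the library refresh the expected value for you. This is only a sketch, and it assumes a C++11 compiler with std::atomic is available, which is not how the code above was written. The names set_value and demo are made up for illustration. On failure, compare_exchange_weak copies the current value into its first argument, so there is no hand-written re-read to get wrong.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical setter, for illustration only (not the original code).
void set_value(std::atomic<uint32_t>& value, uint32_t aValue) {
    uint32_t now = value.load();
    // If the exchange fails, compare_exchange_weak stores the value it
    // actually found into `now`, so the next attempt uses fresh data.
    while (!value.compare_exchange_weak(now, aValue)) {
        // Nothing to do here: `now` was already refreshed by the failed call.
    }
}

// Small single-threaded sanity check of the setter.
uint32_t demo() {
    std::atomic<uint32_t> value(5);
    set_value(value, 9);
    assert(value.load() == 9);
    return value.load();
}
```

For a plain unconditional set like this one, value.store(aValue) would do the same job without a loop. The CAS form earns its keep when the new value depends on the old one.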
When I changed the a to m, we were pulling in the correct values, and things were working just fine. Also, the single-threaded tests I had would never have caught this, because with one thread the CAS never fails in the first place, so the retry branch never runs. It takes two threads hitting the same atomic at the same time with different values. Amazing.
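Here is a minimal sketch of the kind of two-thread test that exercises the retry branch. The class name AtomicU32, its set and get methods, and the hammer function are all hypothetical stand-ins, not the real code. It assumes GCC or Clang (for the __sync builtins) and C++11 threads. With the corrected loop both threads finish. With the typo, one of them would most likely spin forever once the other stops writing.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical wrapper around the loop from the post; names are illustrative.
class AtomicU32 {
public:
    explicit AtomicU32(uint32_t v = 0) : mValue(v) {}

    void set(uint32_t aValue) {
        uint32_t now = mValue;
        while (!__sync_bool_compare_and_swap(&mValue, now, aValue)) {
            now = mValue;  // re-read the current value, NOT aValue
        }
    }

    uint32_t get() {
        // Atomically adding 0 returns the current value.
        return __sync_fetch_and_add(&mValue, 0);
    }

private:
    // Assumption: volatile, so each plain read goes back to memory.
    volatile uint32_t mValue;
};

// Two threads set different values on the same object, which forces
// CAS failures and therefore runs the retry branch.
uint32_t hammer(uint32_t iterations) {
    AtomicU32 shared(0);
    std::thread a([&] { for (uint32_t i = 0; i < iterations; ++i) shared.set(1); });
    std::thread b([&] { for (uint32_t i = 0; i < iterations; ++i) shared.set(2); });
    a.join();
    b.join();
    uint32_t result = shared.get();
    assert(result == 1 || result == 2);  // whichever thread wrote last
    return result;
}
```

Calling hammer(100000) from a test is enough to get contention on most multi-core machines, though how often the CAS actually fails depends on scheduling.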
There… I feel worlds better because we found it. I know the code is better, and I know why. I know we won't have these same issues, and I know why. What a relief.