Archive for September, 2011

Gave a Talk on Boost ASIO

Monday, September 12th, 2011

I just got back from giving a talk on boost ASIO to a group of developers at The Shop. It was a "lunch talk" where the place buys lunch and they listen to a talk from someone - and today that was me. I was asked to present and give this talk by a developer that felt there were a lot of folks in his group that needed to understand that using a library for socket communications was a good thing, and not a bad one.

I spent a while making slides the other day, and today I ran through them. I finished at 12:59 - one minute short of the end time. Quite nicely done. I tried to stay to the theme that I was asked to talk to, and I think there may have actually been one or two folks that learned something. Maybe.

It's hard to try and teach while people are eating… and don't respond… Very hard indeed. But I got through it, and I'm glad it's done.

The Worst Bug I’ve Ever Written

Friday, September 9th, 2011

So we finally found the problem with the deadlock in the code… and it wasn't a deadlock at all. It was an infinite loop. Holy Cow! This was without a doubt the hardest bug I've ever had to find. And it was all my fault. Silly typo, and I spent days trying to find it. Thankfully, my co-worker was able to ask a few questions and get us on the right track, and we were able to find it, but it was nasty to find, and it was a single character long.

Incredible.

In general, when setting a value with atomic operations, you need a loop where you look at the existing value, try to CAS in the new one, and if you fail, try it all again. This typically looks like this:

  uint32_t      now = mValue;
  while (!__sync_bool_compare_and_swap(&mValue, now, aValue)) {
    now = mValue;
  }

where aValue is the new value of the ivar mValue. And this will work. But if you have a typo in the code, say like this:

  uint32_t      now = mValue;
  while (!__sync_bool_compare_and_swap(&mValue, now, aValue)) {
    now = aValue;
  }

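then after the first failure every retry compares mValue against aValue instead of against what's really there, and you spin in an infinite loop until some other thread happens to set mValue to the very value you're trying to write. It was a disaster.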

When I changed the a to m, we were pulling in the correct values, and things were working just fine. Also, single-threaded tests that I had would not have seen this as they would not have failed in the first place. It requires two threads hitting the same atomic at the same time with different values. Amazing.
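For what it's worth, here's a minimal sketch of the same retry loop using C++11's std::atomic rather than the GCC __sync builtins from our code. The function name swapIn() is made up for illustration, and the "stale" starting value is just a way to force the failure path in a single thread. The nice part is that compare_exchange_weak() writes the current value back into now on failure, so there's no reload line to get wrong.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically replace 'value' with 'aValue' and return
// the value that was replaced. 'now' is the caller's belief about the
// current contents, and it may be stale (as it would be if another thread
// had just written to 'value').
uint32_t swapIn(std::atomic<uint32_t> & value, uint32_t now, uint32_t aValue)
{
  // On failure, compare_exchange_weak() stores what is really in 'value'
  // into 'now', so the loop body has nothing left to do before retrying.
  while (!value.compare_exchange_weak(now, aValue)) {
  }
  return now;
}
```

Calling swapIn(v, 99, 7) on an atomic that really holds 5 fails the first compare, picks up the 5, and succeeds on a later pass. That's the path my single-threaded tests never hit.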

There… I feel worlds better because we found it. I know the code is better, and I know why. I know we won't have these same issues, and I know why. What a relief.

Intel’s Threaded Building Blocks – Nice Open Source Library

Friday, September 9th, 2011

For a little while now - starting really in the quiet times, I've been using Intel's TBB Library, and I have to say, it's pretty impressive. There are several things in the library, but the biggies of note are the concurrent vector, the concurrent hash map, and the read/write spinlock. I've used these in my work over the last month or two with great success.

When I initially started looking at TBB, I seem to remember that building and using it was a pain. But maybe it's changed, or maybe that was just a bad first impression. In either case, a co-worker decided to get the code in-house and installed on a few boxes, and it's as valuable for its target as boost is for its own. There are a lot of things in this library, but with 4.0 announced this past week, they've added the Memory Pool (we're using TCMalloc from Google, but have heard there are better out there), full GCC Atomics Support (as opposed to the gcc-specific semantics), and new unordered sets and priority queue. Very nice stuff indeed!

Last evening, I took out a boost::unordered_map and boost::detail::spinlock and replaced them with a single tbb::concurrent_hash_map. The difference in code wasn't all that great, but I have to say their accessor is a bit odd, and could probably have been replaced by an iterator of some sort. However, I can understand why they keep the two different.

For example, when I wanted to add a key/value to a boost::unordered_map, I'd simply do something like this:

  boost::unordered_map<std::string, uint32_t>    names;
  …
  names["joe"] = 32;

but that's not possible in TBB's concurrent_hash_map. Instead, you need to do something like the following:

  typedef tbb::concurrent_hash_map<std::string, uint32_t> ages_t;
  ages_t      names;
  …
  ages_t::accessor  a;
  if (!names.find(a, "joe")) {
    names.insert(a, "joe");
  }
  a->second = 32;

The find() is used to set the ages_t::accessor to the right pair, and lock it for updating. If it returns false, then there is no pair, and we need to insert() one in the map. Again, this will lock that pair for updating. Either way, the accessor is now pointing to the pair, and has it locked. From here, it's just a matter of setting the 'value' part of the pair.

Again, not really simple, but not terribly hard, either. The advantage is amazing. Inserts are not as fast, obviously, but the lookups are very quick and we don't have to worry about using a read/write lock on this map, or a simple mutex, to keep things from getting scrambled.

There are lots of decent tools out there, but a lot more really crummy ones. I'm glad we found TBB and boost. They are going to make things a lot easier to work with.

Hammering on Threading Problems is Tedious Work

Wednesday, September 7th, 2011

Today I'm dealing with a lot of issues regarding performance and threading. It's non-trivial, even for as much of it as I've done, as it's all a balance of safety, speed, and need. It's just plain not easy, but hey… if it were, I wouldn't get paid to do it.

Today I ran across an interesting issue… found this code:

  const calc::results_t & Option::getResults() const
  {
    using namespace boost::detail;
    spinlock::scoped_lock  lock((spinlock &)mResultsMutex);
    return mResults;
  }

Now I know that the intended purpose of this code was to make sure that no one altered the mResults value while a copy was made on the calling stack and returned to the caller. But that's not at all what's happening, is it?

Have a look at the return type - a reference. That means that we're passing back a reference to the mResults value, and not a copy at all. This means that the scoped lock is totally useless in this context. Better to just remove it as there are places in the code that expect to get a reference:

  const calc::results_t & Option::getResults() const
  {
    return mResults;
  }

This is far cleaner, and just as "protected" from multiple thread access - which is to say it isn't.
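If the intent really was to hand back a snapshot, the lock only does its job when the function returns by value. Here's a minimal sketch of that, assuming std::mutex in place of boost::detail::spinlock and a std::vector<double> standing in for calc::results_t. The names getResultsCopy() and setResults() are hypothetical, not what's in the real class.

```cpp
#include <mutex>
#include <vector>

// Stand-in for calc::results_t (an assumption for this sketch).
typedef std::vector<double> results_t;

class Option {
  public:
    // Returns a copy. The return value is built from mResults before
    // 'lock' is destroyed, so the copy is made while the mutex is held.
    results_t getResultsCopy() const
    {
      std::lock_guard<std::mutex>  lock(mResultsMutex);
      return mResults;
    }

    // Writers take the same mutex, otherwise the lock above is pointless.
    void setResults(const results_t & aResults)
    {
      std::lock_guard<std::mutex>  lock(mResultsMutex);
      mResults = aResults;
    }

  private:
    // 'mutable' lets a const method lock it without casting away const.
    mutable std::mutex  mResultsMutex;
    results_t           mResults;
};
```

The cost is a copy on every call, which is exactly why the callers that expect a reference can't just be switched over.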

I'm cleaning up things and trying to track down why I'm getting a deadlock, but it's not at all simple. It's not as easy as looking at the code - if it were, I'd have solved it, but some things are just too well hidden to pop out on a simple inspection. Sometimes, it takes a smoking gun to really point out the problem you have.

So I'm looking for the smoking gun. Hope I find it soon.

Nasty Locking Issue with Boost Scoped Spin Locks

Tuesday, September 6th, 2011

I'm typically a huge fan of the boost libraries. They have saved me a ton of time in the last year, and they are about as rock-solid as I've seen open source software be. But today I've been fighting a threading issue that's really been vexing. I'm not 100% positive this afternoon, but the evidence is certainly pointing to a problem with the use of the boost::detail::spinlock::scoped_lock in very tight loop situations like the following:

  size_t MessageSource::getCountOfListeners()
  {
    boost::detail::spinlock::scoped_lock     lock(mMutex);
    return mListeners.size();
  }

where mListeners is just a boost::unordered_set containing pointers of some kind. The problem seems to be coming from the use of this as an argument in a function like the following:

  cLog.info("[process] %d kids", getCountOfListeners());

Now I've clearly simplified things a bit, but I'm guessing that I'd have had a lot better luck had I done the following:

  size_t MessageSource::getCountOfListeners()
  {
    mMutex.lock();
    size_t   sz = mListeners.size();
    mMutex.unlock();
    return sz;
  }

where I'm explicitly locking, getting the value, and unlocking the mutex. But there were a lot of places in the code where this was occurring, and rather than go that route, I chose to try the Intel TBB concurrent_hash_map where the key is the pointer and the value is just a dummy uint8_t. This compiles and runs just fine, with the added benefit that I was able to remove all the locks:

  size_t MessageSource::getCountOfListeners()
  {
    return mListeners.size();
  }

In this implementation, the hash map handles the size for me, and I don't have to worry about the locking or the scoped lock's lifetime. I believe this is going to make a huge difference tomorrow, but we'll have to wait and see. It's certainly in the area of the problem.

Upgraded to Mac OS X 10.7 Lion

Tuesday, September 6th, 2011

A few weeks ago (yeah, I know, I've been busy) I got Lion (10.7.1) from the Mac App Store and installed it on my main MacBook Pro. The upgrade took longer than I really expected - the downloading was not fast at all. But I will say, it was smooth. Very smooth.

The big issue I found with Mac OS X Lion is that Colloquy 2.3 wasn't really working properly. Thankfully, all the other apps that I depend on day-to-day were working fine. I still have the problem that Twitterrific does not work when the video card is changed and it tries to "pop up", but hey, it's a small price to pay, and it's not just Twitterrific - it's all the pop-up Twitter clients I've tried. Kind of disappointing.

Anyway, Colloquy had a new build that does support Lion, and it's available from the Colloquy Downloads Folder. Go there, get the latest, or at least 2.4, and you're in business. Kind of surprised that they aren't more responsive as Lion has been out there for a while, but at least they have something that works for me.

Other than that, Lion is working just fine and the look and feel of the OS is very nice. I especially like the vanishing scroll bars. On Terminal.app, I've been asking for those for ages, and I now have them. Good. Fantastic.

OK, It’s Time to Start Setting Priorities

Tuesday, September 6th, 2011

I know it's been ages since I've really posted much here, and I know the exact reason - I'm working like a dog. I have plenty to write about, but I simply have no time. Well, I need to correct that situation. I really do. I need to make time, and make time to get back to what I was doing - writing about the issues I was facing on a day-to-day basis.

So I'm going to try. Try harder, that is.