Archive for the ‘Coding’ Category

The Much Maligned Pointer

Wednesday, November 24th, 2010

There are books about the subject. There are people who swear they are the Devil's handiwork. They are the source of much debate, and in my opinion, they are much maligned. They are pointers.

Pointers in C and C++ code are as important as any other language construct. They are an essential tool in the arsenal of a software developer, and they have to be mastered. Yes, in the wrong hands pointers can be very dangerous. But so can knives, guns, and explosives. That doesn't mean you should start digging out tons of rock with shovels because explosives are "dangerous". It means you have to make sure you have skilled technicians who understand how to handle the tools of the trade safely.

Same with pointers. You have to be skilled and trained in their use, but that doesn't mean you have to spend a decade learning how to use them. You just need to keep an eye on them and have strict rules for their usage and lifecycle.

For example, if you're using pointers on a queue, make sure that your method comments indicate when you're taking ownership of the pointer, and when you're passing ownership to the caller. Then it's clear. Simple.
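Here is a minimal sketch of what those method comments might look like. The `MessageQueue` class is hypothetical and not from any real code base; the point is only that every method states who owns the pointer once it returns.

```cpp
#include <deque>
#include <string>

// Hypothetical queue of heap-allocated messages whose method comments
// spell out exactly who owns each pointer.
class MessageQueue {
public:
  MessageQueue() {}
  // Copying would let two queues delete the same pointers, so forbid it.
  MessageQueue(const MessageQueue &) = delete;
  MessageQueue &operator=(const MessageQueue &) = delete;

  // Anything still queued belongs to the queue, so free it here.
  ~MessageQueue() {
    for (std::string *msg : mItems) {
      delete msg;
    }
  }

  // TAKES OWNERSHIP of 'msg': the caller must not use or delete it afterwards.
  void push(std::string *msg) { mItems.push_back(msg); }

  // PASSES OWNERSHIP to the caller, who must delete the result.
  // Returns nullptr when the queue is empty.
  std::string *pop() {
    if (mItems.empty()) {
      return nullptr;
    }
    std::string *msg = mItems.front();
    mItems.pop_front();
    return msg;
  }

private:
  std::deque<std::string *> mItems;
};
```

With the rules written down, a leak or a double delete becomes a visible violation of a comment rather than a guess about intent.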

Pointers are really no different from any other resource you have to track: pooled database connections, threads... all of these offer the same level of difficulty as the lowly pointer. But all are essential tools of the skilled software developer.

Fear not the pointer. It is your friend.

Very Non-obvious Memory Leak (cont.)

Tuesday, November 23rd, 2010

Today I spent nearly the entire day trying to find the remaining memory leak(s) in my ticker plant. Again, only a few plants were being affected, and again I focused on the code that only they have. Once again, this was a complete waste of time: the leak wasn't in the exclusive code but was a very odd little bug in the shared code.

When I'd struck out after several hours of testing, I decided to start at one end and sweep the code with a fine-tooth comb. Starting at the incoming UDP feed, I cut the rest of the app off and checked the memory usage. Stable. Good. Now let's add in the next step. Leak? Ha... let's fix him. Continue until we have the leak spotted.

The first problem may not have been a leak, but it was an unpredictable memory usage pattern. The UDP datagrams came in on the boost asio socket, and I copied each datagram into a std::string and paired it with the arrival time (microseconds since epoch) in a simple std::pair. This pair was then pushed onto a FIFO. Something like this:

  typedef std::pair<uint64_t, std::string> TaggedDatagram;
 
  CircularFIFO< TaggedDatagram, 100000 >   mStaging;

where the CircularFIFO is a single-producer, single-consumer, lockless, circular FIFO buffer for fast pushes and pops of the data coming off the wire.
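The post doesn't show the CircularFIFO class itself. As a rough idea of how a single-producer, single-consumer lockless ring is commonly built, here is a sketch using C++11 std::atomic (which postdates this post). The name `SpscFifo` and its `push()`/`pop()` interface are assumptions, not the original class.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Sketch of a single-producer, single-consumer lockless ring buffer.
// Exactly one thread may call push() and exactly one other thread pop().
template <typename T, std::size_t Size>
class SpscFifo {
public:
  // Producer only. Returns false (and stores nothing) when the ring is full.
  bool push(const T &item) {
    const std::size_t tail = mTail.load(std::memory_order_relaxed);
    const std::size_t next = increment(tail);
    if (next == mHead.load(std::memory_order_acquire)) {
      return false;                                  // full
    }
    mData[tail] = item;
    mTail.store(next, std::memory_order_release);    // publish the slot
    return true;
  }

  // Consumer only. Returns false when the ring is empty.
  bool pop(T &item) {
    const std::size_t head = mHead.load(std::memory_order_relaxed);
    if (head == mTail.load(std::memory_order_acquire)) {
      return false;                                  // empty
    }
    item = mData[head];
    mHead.store(increment(head), std::memory_order_release);  // free the slot
    return true;
  }

private:
  // One slot is always left empty so "full" and "empty" look different.
  static constexpr std::size_t Capacity = Size + 1;
  static std::size_t increment(std::size_t i) { return (i + 1) % Capacity; }

  std::array<T, Capacity> mData;
  std::atomic<std::size_t> mHead{0};   // next slot to read (consumer)
  std::atomic<std::size_t> mTail{0};   // next slot to write (producer)
};
```

With 100000 slots of a pair holding a std::string, the buffer itself runs to several megabytes, so it belongs on the heap or in a long-lived object rather than on a thread's stack.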

The problem with this design is that we have to create the std::string every time anyway, and the storage of this structure is very unpredictable. What I decided to do was move the string itself onto the heap and keep only a pointer to it in the structure:

  typedef std::pair<uint64_t, std::string *> TaggedDatagram;
 
  CircularFIFO< TaggedDatagram, 100000 >   mStaging;

Now each entry is basically two 64-bit values (the timestamp and the pointer), and the heap reclaims each string's memory as soon as the consumer deletes it. This was a nice improvement, but it wasn't the final problem.
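A small sketch of both ends of this layout. It assumes a std::chrono clock for the microsecond tag (the post doesn't say how the time was taken), and `tagDatagram()` and `consumeDatagram()` are hypothetical helpers. On a 64-bit build the pair is 16 bytes: one uint64_t and one pointer.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

typedef std::pair<uint64_t, std::string *> TaggedDatagram;

// Microseconds since the epoch, used as the arrival-time tag.
uint64_t nowMicros() {
  using namespace std::chrono;
  return static_cast<uint64_t>(
      duration_cast<microseconds>(system_clock::now().time_since_epoch()).count());
}

// Producer side: copy the raw datagram bytes into a new heap string.
// The returned pair OWNS that string until a consumer deletes it.
TaggedDatagram tagDatagram(const char *data, std::size_t len) {
  return TaggedDatagram(nowMicros(), new std::string(data, len));
}

// Consumer side: take ownership, use the payload, then free it.
std::size_t consumeDatagram(TaggedDatagram &dg) {
  std::size_t bytes = (dg.second == NULL) ? 0 : dg.second->size();
  // ... decode the message here ...
  delete dg.second;    // the string's memory goes back to the heap here
  dg.second = NULL;    // leave no dangling pointer behind in the entry
  return bytes;
}
```

The trade-off is one heap allocation per datagram, in exchange for fixed-size entries in the FIFO and a clear point at which each payload is released.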

It was the end of the day before I found it, but thank goodness I did. It turns out that the ZeroMQ send() method was the culprit. ZeroMQ has normally been very good to me, so why these messages caused problems, I have no idea, but it's that one method and nothing else.

I know they are working on a new version (2.1) with the latest OpenPGM included, and that will be nice to see. Tomorrow morning, when they are all online, I'll ask what the story is on the 2.1 release. Until then, I'll deal with the smaller, but still annoying, leaks.

Whew!

[11/24] UPDATE: I talked to the ZeroMQ guys this morning and they say the release of 2.1 is scheduled for this week. Nice. I'll get it early next week and try it out.

Very Non-obvious Memory Leak

Monday, November 22nd, 2010

Today has been spent trying to track down a very non-obvious memory leak in my ticker plant code. I've been watching it run over the last few days - fixing little things as I see them, and each time the app runs longer and better. Good... we're moving in the right direction.

But it's odd that a few of my ticker plants have a problem with a growing memory footprint. Very odd indeed. So I started digging into these exchange feeds. My first mistake was to ignore all common functionality - after all, if it's common with the exchange feeds that aren't leaking, then it can't be that code. Right?

Wrong.

What I found was that I needed to be exceptionally careful even when using the compare-and-swap atomic operations. Two threads on two CPUs can be working on that one variable at the same time, and if the value lives only in their caches, there might be a moment when the caches are updated and main memory isn't. This could cause me to "lose" a message and leak memory.
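The post doesn't show the code in question, so this is only an illustration of a different, well-known way CAS code can "lose" an update: ignoring a failed compare-and-swap instead of retrying it. The sketch uses C++11 std::atomic, and `Node` and `push()` are hypothetical names.

```cpp
#include <atomic>

struct Node {
  int   value;
  Node *next;
};

// BROKEN pattern, for contrast:
//   Node *old = head.load();
//   node->next = old;
//   head.compare_exchange_strong(old, node);   // result ignored
// If another thread moved 'head' in between, the CAS fails and 'node'
// is silently dropped: a lost message and a leak.

// Correct push onto a lock-free singly linked list (Treiber-style).
// On failure, compare_exchange_weak reloads 'expected' with the current
// head, so we re-link the node and try again until it sticks.
void push(std::atomic<Node *> &head, Node *node) {
  Node *expected = head.load(std::memory_order_relaxed);
  do {
    node->next = expected;
  } while (!head.compare_exchange_weak(expected, node,
                                       std::memory_order_release,
                                       std::memory_order_relaxed));
}
```

The retry loop is what makes the operation safe under contention; a single CAS attempt only tells you whether you happened to win the race.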

What I did was put a simple boost spinlock mutex on the value and then we got a lot more stability. That's good.
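The post doesn't say which Boost spinlock class was used, so this sketch uses a stand-in built on C++11 `std::atomic_flag`; `SpinLock` and `GuardedCounter` are hypothetical names. The idea is the same: every read or write of the shared value happens while holding the lock.

```cpp
#include <atomic>
#include <mutex>

// Minimal spinlock: lock() busy-waits until it wins the flag, unlock()
// releases it. It provides lock()/unlock(), so std::lock_guard accepts it.
class SpinLock {
public:
  void lock() {
    while (mFlag.test_and_set(std::memory_order_acquire)) {
      // spin: another thread holds the lock
    }
  }
  void unlock() { mFlag.clear(std::memory_order_release); }

private:
  std::atomic_flag mFlag = ATOMIC_FLAG_INIT;
};

// A value that is only ever read or changed while holding its spinlock.
class GuardedCounter {
public:
  void increment() {
    std::lock_guard<SpinLock> guard(mLock);
    ++mValue;
  }
  long value() {
    std::lock_guard<SpinLock> guard(mLock);
    return mValue;
  }

private:
  SpinLock mLock;
  long     mValue = 0;
};
```

Spinning only makes sense when the critical section is a handful of instructions, as it is here; for anything longer a regular mutex that puts the waiting thread to sleep is usually the better choice.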

Unfortunately, that's not the end of the story... but it's the end of the day. A long day for a little solution. Tomorrow I'll have to see what the remaining problems are.

Making the Broker Client Direct Dial Aware

Friday, November 19th, 2010

Yesterday I worked on the Broker's C++ Service adapter to allow for these direct dial connections: I added a boost asio acceptor listening on an ephemeral port and put that port in the registration message to the Broker. That looked pretty good, so today I worked on adding the client side of the protocol.

I went with a simple scheme - if the user explicitly asks to locate() a service, then we'll ask the Broker for the direct dial location, and save it. Then, for every connection to that service, we'll have a pool of connections to use that connect directly to the service and do not go through the Broker.
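A sketch of that scheme under stated assumptions: `Location`, `Connection`, and `DirectDialPool` are hypothetical stand-ins (a real client would hold a socket rather than just an address), and nothing here is thread-safe. The location is cached once by `locate()`, and every later `acquire()` either reuses an idle connection or dials that cached location directly.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: where the Broker says a service lives.
struct Location {
  std::string host;
  uint16_t    port;
};

// Hypothetical: a direct connection to that service.
struct Connection {
  Location where;   // a real client would own a connected socket here
};

class DirectDialPool {
public:
  // Remember the direct dial location the Broker handed back.
  void locate(const std::string &service, const Location &loc) {
    mLocations[service] = loc;
  }

  // Reuse an idle connection, or open a new one to the cached location.
  // Returns nullptr if locate() was never called for this service.
  std::unique_ptr<Connection> acquire(const std::string &service) {
    auto loc = mLocations.find(service);
    if (loc == mLocations.end()) {
      return nullptr;
    }
    std::vector<std::unique_ptr<Connection>> &idle = mIdle[service];
    if (!idle.empty()) {
      std::unique_ptr<Connection> conn = std::move(idle.back());
      idle.pop_back();
      return conn;
    }
    return std::unique_ptr<Connection>(new Connection{loc->second});
  }

  // Hand a connection back for the next caller.
  void release(const std::string &service, std::unique_ptr<Connection> conn) {
    mIdle[service].push_back(std::move(conn));
  }

private:
  std::map<std::string, Location> mLocations;
  std::map<std::string, std::vector<std::unique_ptr<Connection>>> mIdle;
};
```

Because every pooled connection goes to the single cached location, this is exactly the shape that sidesteps any load balancing the Broker might want to do.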

There's a hitch - if the Broker is trying to load balance my calls, then I'm going to mess him up as I am creating a pool on the one and only location he's giving me. This isn't great, but the alternative is to not have any pooling of connections and ask the Broker for the location each time, and then create a connection and then throw it away.

It's possible, but I decided against it. I think there's more value in this pooling, but we'll see. Maybe I'm all wet and the real value is in asking the Broker each time. We'll have to see how it plays out in the usage patterns.

Google Chrome dev 9.0.587.0 is Out

Friday, November 19th, 2010

This morning I noticed that Google Chrome dev 9.0.587.0 was out, and the release notes point to quite a few nice fixes:

  • GPU Related Fixes
  • Crash Fixes
  • Instant Fixes
  • [r65953] Move click-to-play to about:flags. (Issue: 62091)

Not a lot of detail about what each of these means, but I'm encouraged by the GPU fixes as it's possible we're getting closer to that elusive super-browser where things are really exceptionally fast. Here's hoping...

Adding New Broker Service Access Points

Thursday, November 18th, 2010

I was talking with the developers of the Broker today and we realized that it was about time to add in a dial direct access mode for the services that register themselves with the Broker. Up to now, all services would register themselves with the Broker via a single socket connection, and then multiplex that socket with many channels, each identified by a unique 16-byte ID. This is all well and good, but should there be a problem on one of these channels that's non-recoverable, the entire socket will be dropped, and with it all the channels. Not ideal.
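To make the multiplexing concrete, here is a sketch of demultiplexing by a 16-byte channel ID. The frame layout (ID first, payload after) and the `ChannelDemux` class are assumptions for illustration; the post doesn't describe the Broker's wire format.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <functional>
#include <map>
#include <string>

typedef std::array<uint8_t, 16> ChannelID;
typedef std::function<void(const std::string &)> ChannelHandler;

// Hypothetical demultiplexer: every channel shares one socket, and each
// frame is routed to its channel by the 16-byte ID at the front.
class ChannelDemux {
public:
  void add(const ChannelID &id, ChannelHandler handler) { mChannels[id] = handler; }
  void drop(const ChannelID &id) { mChannels.erase(id); }

  // Assumed frame layout: 16-byte channel ID, then the payload.
  // Returns false for short frames or unknown channels.
  bool dispatch(const std::string &frame) {
    ChannelID id;
    if (frame.size() < id.size()) {
      return false;
    }
    std::memcpy(id.data(), frame.data(), id.size());
    auto it = mChannels.find(id);
    if (it == mChannels.end()) {
      return false;
    }
    it->second(frame.substr(id.size()));
    return true;
  }

private:
  std::map<ChannelID, ChannelHandler> mChannels;   // std::array supports operator<
};
```

Since all channels funnel through the one socket feeding `dispatch()`, anything that forces that socket closed takes every channel in the map down with it.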

The solution is to have a service open up an ephemeral port and send that port to the Broker in the registration message, so that the Broker can use that additional listener as a different way to establish connections to the service. Think of it as "dial direct, and save".

In boost asio, this is pretty simple. In the service handler code I added a listen() method so that calling this would create a new acceptor, bind it, and start it listening. Additionally, I'd pull out the ephemeral port from the acceptor so I could later send it to the Broker.

  bool MMDServiceHandler::listen()
  {
    bool    error = false;
 
    // first, let's make sure we have what we need... a Boss...
    if (!error && (mBoss == NULL)) {
      error = true;
      cLog.error("[listen] the MMDService is missing... please check on this.");
    }
 
    // create an acceptor if we don't already have one - plain 'new' throws
    // std::bad_alloc on failure rather than returning NULL, so catch that
    if (!error && (mAcceptor == NULL)) {
      try {
        mAcceptor = new boost::asio::ip::tcp::acceptor(mBoss->mIOService);
      } catch (std::bad_alloc &) {
        error = true;
        cLog.error("[listen] unable to create acceptor - very bad news.");
      }
    }
 
    // if we're OK, then fire up the acceptor for listening - pretty basic
    if (!error) {
      using namespace boost::asio::ip;
      try {
        // make the 'endpoint' in boost terms - IPv4 and a random port
        tcp::endpoint ep(tcp::v4(), 0);
        // open up the acceptor and set SO_REUSEADDR to enabled
        mAcceptor->open(ep.protocol());
        mAcceptor->set_option(tcp::acceptor::reuse_address(true));
        // ...now bind the bad boy and start listening
        mAcceptor->bind(ep);
        mAcceptor->listen();
        // 'ep' still says port 0 - ask the acceptor which port we really got
        tcp::endpoint lep = mAcceptor->local_endpoint();
        mPort = lep.port();
        // create a new MMDClientProxy to populate when we get hit
        MMDClientProxy  *proxy = new MMDClientProxy(this, mBoss->mIOService);
        // now fire off the async accept on this guy
        mAcceptor->async_accept(proxy->getSocket(),
                boost::bind(&MMDServiceHandler::handleIncomingConnection, this,
                    boost::asio::placeholders::error, proxy));
      } catch (std::exception &) {
        // these throwing overloads report failure with
        // boost::system::system_error, not with a return code
        error = true;
        cLog.error("[listen] unable to set up the listening acceptor.");
      }
    }
 
    return !error;
  }

The only really tricky point was that boost didn't update the endpoint I created with port zero to reflect the port that was actually bound. I needed to go into the acceptor, get its local endpoint, and then get the port from that. Took me a few minutes to figure that one out - there's not a lot of documentation on that point.
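The same behavior shows up one level down, in the BSD sockets API that asio sits on top of on Linux: binding to port 0 leaves your own address structure untouched, and `getsockname()` reports the port the kernel actually chose. A minimal POSIX sketch; `listenOnEphemeralPort()` is a hypothetical helper name.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Opens a listening TCP socket on an ephemeral port. Returns the port the
// kernel picked (and the socket in 'fd'), or 0 on failure with fd = -1.
uint16_t listenOnEphemeralPort(int &fd) {
  fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return 0;
  }
  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port = htons(0);                 // 0 = "kernel, you choose"
  if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) < 0 ||
      listen(fd, SOMAXCONN) < 0) {
    close(fd);
    fd = -1;
    return 0;
  }
  // 'addr' still says port 0 here; ask the kernel what was really bound.
  socklen_t len = sizeof(addr);
  if (getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) < 0) {
    close(fd);
    fd = -1;
    return 0;
  }
  return ntohs(addr.sin_port);
}
```

This is the piece of information that ends up in the registration message: the locally bound port, read back after the bind rather than taken from the endpoint you asked for.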

With this, and the existing code in the handler to handle the multiplexed channel connections, it wasn't too bad to make this new client proxy a fully functioning "portal" to the service. Pretty nice.

Now we need to update the Broker to vend this information, and then the client code has to be updated to read it and make the direct connection. Pretty neat stuff.

Sweet Little Monitoring Tool – dstat

Wednesday, November 17th, 2010

I was talking to a few folks today and mentioned that I wanted to be able to snapshot the CPU usage on my ticker plant box every few seconds during the day so I could make some intelligent sizing decisions as to what can live on one box, and they mentioned dstat.

This is one sweet little monitoring tool.

I've set up an alias to run the following:

  dstat -t -c -m -n --output dstat_data.csv 5

and on the console I'll get the lovely color output, but to the file dstat_data.csv I'll get a nice importable data set. Sweet!

This is why I love building and running code on Linux - it's just so darn neat!

The Retry Loop – Painful to Get Right

Wednesday, November 17th, 2010

I've been hardening my ticker plant code for the last several days, and today it's been all about logging more intermediate results and putting in retry loops. The problem I find with retry loops is that if you didn't plan for them in the first place, making them fit in nicely can be kind of tricky. If your method is designed to return when it can't succeed, then the retry loop needs to live completely within that method.

But what if there are exceptions? Well... you have to try/catch around the code, and then if the failure persists, do you rethrow the last one? Maybe you throw a new exception with the last one contained within? Not obvious.
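One way to answer the "which exception do I throw?" question is a small generic helper that keeps the most recent failure and rethrows it once the attempts run out. This is a sketch using C++11 `std::exception_ptr`, not the code from the ticker plant; `retry()` is a hypothetical name.

```cpp
#include <exception>
#include <stdexcept>

// Calls 'op' up to 'attempts' times and returns its result on the first
// success. If every attempt throws, rethrows the exception from the LAST
// attempt, so the caller sees the most recent failure.
template <typename Op>
auto retry(int attempts, Op op) -> decltype(op()) {
  std::exception_ptr last;
  for (int i = 0; i < attempts; ++i) {
    try {
      return op();
    } catch (...) {
      last = std::current_exception();   // remember it and try again
    }
  }
  if (!last) {
    throw std::invalid_argument("retry: attempts must be positive");
  }
  std::rethrow_exception(last);
}
```

If you'd rather throw a new exception with the last one contained within, `std::throw_with_nested()` from `<exception>` does that wrapping. Note also what nesting does to the numbers: `retry(3, ...)` wrapped around a call that itself does `retry(3, ...)` makes up to 3 × 3 = 9 attempts on a persistent failure.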

Then there's the really layered code where you're in the middle of a call stack, and you're trying to retry, but some of the methods you call already retry, and some don't. So do you "double retry" on some to cover the others? It's not at all easy to do.

Oh don't get me wrong... you can do it... it's just not easy.

I've been hammering away at a lot of these in my code this morning, and I'm finding fewer and fewer places where I need to go back into the code. It's getting close, that's for sure. It's just not easy.

But then, if it really were, everyone would be able to do it.

iTerm2 Alpha 13 is Out

Wednesday, November 17th, 2010

I noticed this morning that iTerm2 Alpha13 is out with a ton of really neat features. The searching, the playback - Holy Cow! These guys are taking the terminal to places I never even thought about. What an amazing piece of work. And it's slick. Very slick.

Excellent work.

Write… Compile… Run… What a Dream

Wednesday, November 17th, 2010

I was coding this morning - adding a new synchronous calling scheme to my existing asynchronous library, and really enjoying the capabilities of boost's asio library. I knew what I wanted to write, and I had the basics of the skeleton there for the asynchronous versions, but there were a few differences that arose simply because it was synchronous.
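Asio offers blocking versions of most operations (for example `boost::asio::write()` alongside `async_write()`), and the post doesn't say which route was taken. A different, common way to layer a synchronous call over a callback-based asynchronous one is to block on a promise that the completion callback fulfils. In this sketch `asyncEcho()` and `syncEcho()` are hypothetical, with a plain thread standing in for the real async operation.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <thread>

// Hypothetical asynchronous call: the work runs on another thread and the
// result is delivered to a completion callback, much like an asio handler.
void asyncEcho(const std::string &request,
               std::function<void(const std::string &)> onDone) {
  std::thread([request, onDone]() { onDone("echo: " + request); }).detach();
}

// Synchronous wrapper: start the async call, then block until the
// completion callback fulfils the promise. The promise lives in a
// shared_ptr so it stays alive until the callback has fully finished.
std::string syncEcho(const std::string &request) {
  std::shared_ptr<std::promise<std::string>> done =
      std::make_shared<std::promise<std::string>>();
  std::future<std::string> reply = done->get_future();
  asyncEcho(request, [done](const std::string &r) { done->set_value(r); });
  return reply.get();   // blocks until set_value() has been called
}
```

One of the differences that comes with being synchronous: if the callback never fires, `get()` waits forever, so a production version would more likely use `wait_for()` with a timeout.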

So I was writing this up, got done, compiled it and ran the test app to verify that I hadn't changed any of the behavior, and BINGO! It just worked.

Now, this wasn't cracking the atom - it was only about 100 lines of code - but still... it's pretty rare that I can write, compile, and run code and have it just work. It was a very fun feeling to have. Just type it in and run it - it's pretty wild.

Good times...