What's it all about, Alfie?

Archive for the ‘Coding’ Category

SNMP vs. IRC – Complexity Over Simplicity

Monday, November 29th, 2010

In the past, I've used IRC in my applications to great utility. I created a simple framework that allowed me to have each application instance "pose as" a "user" on IRC, and when the application was running, you could see this in the chat rooms, and you could interact with it by as complex, or as simple, a means as you, the application designer, wanted. The protocol is simple, it's fast, and there's very little administration to the system.

SNMP is not so simple, but it's far more common in the monitoring and control of applications. The question is really: Is the complication worth it? I'm not at all certain that it is.

Several months ago, the decision was made to use Jabber as opposed to IRC. I went along with it as that's the right thing to do. But it's not the choice I'd have made. The ircd is simple, fast, and with a little additional code, you can make it log everything. From that point, you don't have any concerns about compliance, and you're free to use it as needed.

It seems that the powers that be are re-evaluating the Jabber solution as it's a little more complex in its implementation, and the goal here is not to have something really complex; it's to have something really easy. So I'm hoping that I'll get to finish my IRC client based on the boost asio work. If so, it should be exceptionally fast and easy to use. Both are the hallmarks of a great utility.

I hope I get to see it come to pass.

The complexity of SNMP just seems like massive overkill.

Posted in Coding, Cube Life

Tricking a Tricky Threading Problem

Wednesday, November 24th, 2010

This afternoon I've been tracking down a good solution to a nasty threading problem. This part of my ticker plants is the UDP receiver and it tries to get the UDP datagrams off the socket and into a buffer as fast as possible. To that end, I've got a single-consumer, single-producer, lockless FIFO queue that should be thread-safe as the 'head' and 'tail' are volatile and there's only one thread messing with one of these guys at a time.

But that's just the theory. Here's what the code looks like:

  template<typename Element, uint32_t Size>
  bool CircularFIFO<Element, Size>::push( Element & item )
  {
    uint32_t  nextTail = increment(tail);
    if (nextTail != head) {
      array[tail] = item;
      tail = nextTail;
      return true;
    }
 
    // queue was full
    return false;
  }
 
 
  template<typename Element, uint32_t Size>
  bool CircularFIFO<Element, Size>::pop( Element & item )
  {
    if (head == tail) {
      // empty queue
      return false;
    }
 
    item = array[head];
    head = increment(head);
    return true;
  }

Here's what happens: I'll be running just fine, and then a call to pop() will return true, but the value it hands back (a pointer) won't be a valid one. This presents a real problem. If it comes back as NULL, that's easy to deal with. The problem happens when it comes back as junk.

Ideally, it wouldn't return a NULL or junk, but coding for that has turned out to be harder than I thought. First, I can just check for a NULL or what I think of as "junk" data, and not delete that pointer, but what happens when it returns "junk" that's not fitting my pattern of "junk"? Well... I'll delete it and BAM! SegFault.

Not easy.

I believe the problem is one of compiler preference. The data in the class is defined as:

  volatile uint32_t head;
  volatile uint32_t tail;
  Element   array[Capacity];

where the lack of the volatile keyword on the array is the big deal here. What I need to do is to make the data look like:

  volatile uint32_t head;
  volatile uint32_t tail;
  volatile Element   array[Capacity];

and then correct all the castings in the code to make them work properly.

I've got something to compile that looks like:

  template<typename Element, uint32_t Size>
  bool CircularFIFO<Element, Size>::push( Element & item )
  {
    uint32_t  nextTail = increment(tail);
    if (nextTail != head) {
      array[tail] = *((volatile Element *)(void *)&item);
      tail = nextTail;
      return true;
    }
 
    // queue was full
    return false;
  }
 
 
  template<typename Element, uint32_t Size>
  bool CircularFIFO<Element, Size>::pop( Element & item )
  {
    if (head == tail) {
      // empty queue
      return false;
    }
 
    item = *const_cast<Element *>(&(array[head]));
    head = increment(head);
    return true;
  }

We'll have to see how this runs.
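One caveat on the volatile approach: in C++, volatile keeps the compiler from optimizing away the reads and writes, but it makes no promise about the order in which another core sees the write to array[tail] and the write to tail. That gap is one possible way for pop() to read a slot before its contents have arrived. A minimal sketch of the same single-producer, single-consumer queue using std::atomic with acquire/release ordering follows. It assumes a C++11 compiler, the class name AtomicCircularFIFO is made up for illustration, and it is not the code the ticker plant runs.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical variant of the CircularFIFO above, using C++11 atomics
// instead of volatile. Safe for exactly one producer and one consumer.
template <typename Element, uint32_t Size>
class AtomicCircularFIFO {
public:
  AtomicCircularFIFO() : head(0), tail(0) {}

  // Called only by the single producer thread.
  bool push(const Element& item) {
    const uint32_t t = tail.load(std::memory_order_relaxed);
    const uint32_t nextTail = increment(t);
    if (nextTail == head.load(std::memory_order_acquire)) {
      return false;  // queue was full
    }
    array[t] = item;
    // release: the element write above is visible before the new tail is
    tail.store(nextTail, std::memory_order_release);
    return true;
  }

  // Called only by the single consumer thread.
  bool pop(Element& item) {
    const uint32_t h = head.load(std::memory_order_relaxed);
    if (h == tail.load(std::memory_order_acquire)) {
      return false;  // empty queue
    }
    item = array[h];
    // release: the slot is handed back only after it has been copied out
    head.store(increment(h), std::memory_order_release);
    return true;
  }

private:
  static uint32_t increment(uint32_t idx) { return (idx + 1) % Size; }

  std::atomic<uint32_t> head;  // next slot to read; written by the consumer
  std::atomic<uint32_t> tail;  // next slot to write; written by the producer
  Element array[Size];
};
```

As with the original, one slot always stays empty so that "full" and "empty" can be told apart, which means the queue holds at most Size - 1 elements.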

Posted in Coding, Cube Life

The Much Maligned Pointer

Wednesday, November 24th, 2010

There are books about the subject. There are people that swear they are the Devil's handiwork. They are the source of much debate, and in my opinion, they are much maligned. They are pointers.

Pointers in C and C++ code are as important as any other language construct. They are an essential tool in the arsenal of a software developer, and they have to be mastered. Yes, if put in the wrong hands pointers can be a very dangerous thing. But so can knives, guns, and explosives. But that doesn't mean that you want to start digging out tons of rock with shovels because explosives are "dangerous". It means that you have to make sure you have skilled technicians that understand the way in which to safely handle the tools of the trade.

Same with pointers. You have to be skilled and trained in their use, but that doesn't mean you have to spend a decade learning how to use them. You just need to keep an eye on them and remember to have strict rules in their usage and lifecycle.

For example, if you're using pointers on a queue, make sure that your method comments indicate when you're taking ownership of the pointer, and when you're passing ownership to the caller. Then it's clear. Simple.
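To make that concrete, here is a minimal sketch of a queue whose method comments spell out who owns what. It swaps in C++11's std::unique_ptr so the compiler enforces the same rules the comments state; with raw pointers the comments alone would carry that contract. The MessageQueue class and its method names are made up for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical queue of heap-allocated messages with documented ownership.
class MessageQueue {
public:
  // Takes ownership of 'msg'; the caller must not use it afterwards.
  void push(std::unique_ptr<std::string> msg) {
    items.push_back(std::move(msg));
  }

  // Passes ownership of the oldest message to the caller.
  // Returns an empty pointer when the queue is empty.
  std::unique_ptr<std::string> pop() {
    if (items.empty()) {
      return std::unique_ptr<std::string>();
    }
    std::unique_ptr<std::string> front = std::move(items.front());
    items.pop_front();
    return front;
  }

  std::size_t size() const { return items.size(); }

private:
  std::deque<std::unique_ptr<std::string> > items;
};
```

Whatever pop() hands back is freed when the caller's unique_ptr goes out of scope, so there is exactly one owner at every step.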

Pointers are really no different than any resource you have to track - pooled database connections, threads, all these offer the same level of difficulty as the lowly pointer. But all are essential tools of the skilled software developer.

Fear not the pointer. It is your friend.

Posted in Coding, Cube Life

Very Non-obvious Memory Leak (cont.)

Tuesday, November 23rd, 2010

Today I spent nearly the entire day trying to find the remaining memory leak(s) on my ticker plant. Again, only a few were being affected and again, I focused on the code they exclusively have. Once again, this was a complete waste of time as it wasn't in the exclusive code but a very odd little bug in the shared code.

When I'd struck out after several hours of testing, I decided to start at one end and sweep the code with a fine-tooth comb. Starting at the incoming UDP feed, I cut the rest of the app off and checked the memory usage. Stable. Good. Now let's add in the next step. Leak? Ha... let's fix him. Continue until we have the leak spotted.

The first problem may not have been a leak, but it was an unreliable memory usage pattern. The UDP datagrams came in the boost asio socket and I placed the datagram into a std::string with the time as microseconds since epoch into a simple std::pair. This was then placed into a simple std::deque. Something like this:

  typedef std::pair<uint64_t, std::string> TaggedDatagram;
 
  CircularFIFO< TaggedDatagram, 100000 >   mStaging;

where the CircularFIFO is a single-producer, single-consumer, lockless, circular FIFO buffer for fast pushes and pops of the data coming off the wire.

The problem with this design is that we have to create the std::string every time anyway, and the storage of this structure is very unpredictable. What I decided to do was to switch from the stack to the heap and change the structure:

  typedef std::pair<uint64_t, std::string *> TaggedDatagram;
 
  CircularFIFO< TaggedDatagram, 100000 >   mStaging;

Now it's basically two 64-bit ints and the heap will reclaim the memory as needed. This was a nice addition, but it wasn't the final problem.

It was at the end of the day, but thank goodness that I found it. It turns out that the ZeroMQ send() method was the culprit. Normally, ZeroMQ has been very nice for me. Why these messages caused problems, I have no idea, but it's the one method and nothing else.

I know they are working on a new version (2.1) with the latest OpenPGM included, and that will be nice to see. Tomorrow morning when they are all online, I'll ask what the story is on the release of 2.1. Until then, I'll deal with the smaller, but still annoying, leaks.

Whew!

[11/24] UPDATE: I talked to the ZeroMQ guys this morning and they say the release of 2.1 is scheduled for this week. Nice. I'll get it early next week and try it out.

Posted in Coding, Cube Life

Very Non-obvious Memory Leak

Monday, November 22nd, 2010

Today has been spent trying to track down a very non-obvious memory leak in my ticker plant code. I've been watching it run over the last few days - fixing little things as I see them, and each time the app runs longer and better. Good... we're moving in the right direction.

But it's odd that a few of my ticker plants have a problem with a growing memory footprint. Very odd indeed. So I started digging into these exchange feeds. My first mistake was to ignore all common functionality - after all, if it's common with the exchange feeds that aren't leaking, then it can't be that code. Right?

Wrong.

What I found was that I needed to be exceptionally careful even when using the compare-and-swap atomic operations. It's possible that two threads, on two CPUs, are doing their own thing on that one variable, and if it's only in their cache, it's possible that there might be a time when the caches are updated and the main memory isn't. This could cause me to "lose" a message, and leak memory.

What I did was put a simple boost spinlock mutex on the value and then we got a lot more stability. That's good.
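For illustration, a minimal sketch of guarding a shared value with a spinlock. It uses C++11's std::atomic_flag as a stand-in for the boost spinlock, and the GuardedCounter class is hypothetical; the actual value being protected in the ticker plant isn't shown in this post.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Minimal spinlock built on std::atomic_flag (a stand-in for the boost one).
class SpinLock {
public:
  SpinLock() { flag.clear(); }
  void lock() {
    // spin until this thread is the one that flips the flag from clear to set
    while (flag.test_and_set(std::memory_order_acquire)) { }
  }
  void unlock() { flag.clear(std::memory_order_release); }
private:
  std::atomic_flag flag;
};

// Hypothetical guarded value: every read and every read-modify-write
// happens while holding the lock, so no update can be lost.
class GuardedCounter {
public:
  GuardedCounter() : value(0) {}

  // Adds 'delta' and returns the new total.
  uint64_t add(uint64_t delta) {
    std::lock_guard<SpinLock> guard(lock);
    value += delta;
    return value;
  }

  uint64_t get() {
    std::lock_guard<SpinLock> guard(lock);
    return value;
  }

private:
  SpinLock lock;
  uint64_t value;
};
```

A spinlock like this only makes sense when the critical section is a handful of instructions; anything longer and the waiting thread burns CPU for nothing.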

Unfortunately, that's not the end of the story... but it's the end of the day. A long day for a little solution. Tomorrow I'll have to see what the remaining problems are.

Posted in Coding, Cube Life

Making the Broker Client Direct Dial Aware

Friday, November 19th, 2010

Yesterday I worked on the Broker's C++ Service adapter to allow for these direct dial connections with the addition of the boost asio acceptor listening on an ephemeral port and putting that in the registration message to the Broker. That looked to be pretty good, and so today I worked on adding the client-side of the protocol.

I went with a simple scheme - if the user explicitly asks to locate() a service, then we'll ask the Broker for the direct dial location, and save it. Then, for every connection to that service, we'll have a pool of connections to use that connect directly to the service and do not go through the Broker.

There's a hitch - if the Broker is trying to load balance my calls, then I'm going to mess him up as I am creating a pool on the one and only location he's giving me. This isn't great, but the alternative is to not have any pooling of connections and ask the Broker for the location each time, and then create a connection and then throw it away.

It's possible, but I decided against it. I think there's more value in this pooling, but we'll see. Maybe I'm all wet and the real value is in asking the Broker each time. We'll have to see how it plays out in the usage patterns.
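The pooling scheme can be sketched roughly as follows. Every name here (Endpoint, Connection, DirectDialPool and its methods) is made up for illustration, and the "connection" is just a record of where it points, not a real socket; the real client code isn't shown in this post.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: none of these names come from the Broker code.
struct Endpoint {
  std::string host;
  unsigned short port;
};

struct Connection {
  Endpoint target;  // where this (pretend) socket is connected
};

class DirectDialPool {
public:
  DirectDialPool() : dialed(0) {}

  // Remember the direct-dial location the Broker gave us for a service.
  void located(const std::string& service, const Endpoint& where) {
    locations[service] = where;
  }

  // True once a location has been saved for this service.
  bool isDirect(const std::string& service) const {
    return locations.count(service) != 0;
  }

  // Reuse an idle connection if there is one, otherwise "dial" a new one.
  Connection checkout(const std::string& service) {
    std::vector<Connection>& idle = pools[service];
    if (!idle.empty()) {
      Connection c = idle.back();
      idle.pop_back();
      return c;
    }
    Connection fresh;
    fresh.target = locations.at(service);  // throws if never located
    ++dialed;
    return fresh;
  }

  // Hand a connection back so the next caller can reuse it.
  void checkin(const std::string& service, const Connection& c) {
    pools[service].push_back(c);
  }

  std::size_t dialCount() const { return dialed; }

private:
  std::map<std::string, Endpoint> locations;
  std::map<std::string, std::vector<Connection> > pools;
  std::size_t dialed;
};
```

The trade-off discussed above is visible in located(): the saved location is never refreshed, so any load balancing the Broker might do only takes effect the first time a service is looked up.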

Posted in Coding, Cube Life

Google Chrome dev 9.0.587.0 is Out

Friday, November 19th, 2010

This morning I noticed that Google Chrome dev 9.0.587.0 was out, and the release notes point to quite a few nice fixes:

GPU Related Fixes

Crash Fixes

Instant Fixes

[r65953] Move click-to-play to about:flags. (Issue: 62091)

Not a lot of detail about what each of these means, but I'm encouraged by the GPU fixes as it's possible we're getting closer to that elusive super-browser where things are really exceptionally fast. Here's hoping...

Posted in Coding, Everything Else, Open Source Software

Adding New Broker Service Access Points

Thursday, November 18th, 2010

I was talking with the developers of the Broker today and we realized that it was about time to add in a dial direct access mode for the services that register themselves with the Broker. Up to now, all services would register themselves with the Broker via a single socket connection, and then multiplex that socket with many channels, each identified by a unique 16-byte ID. This is all well and good, but should there be a problem on one of these channels that's non-recoverable, the entire socket will be dropped, and with it all the channels. Not ideal.

The solution is to have a service open up an ephemeral port, and then send that data to the Broker in the registration message so that the Broker can use that additional listener as a different way to establish connections to the service. Think of it as "dial direct, and save".

In boost asio, this is pretty simple. In the service handler code I added a listen() method so that calling this would create a new acceptor, bind it, and start it listening. Additionally, I'd pull out the ephemeral port from the acceptor so I could later send it to the Broker.

  bool MMDServiceHandler::listen()
  {
    bool    error = false;
 
    // first, let's make sure we have what we need... a Boss...
    if (!error && (mBoss == NULL)) {
      error = true;
      cLog.error("[listen] the MMDService is missing... please check on this.");
    }
 
    // create an acceptor if we don't already have one
    if (!error && (mAcceptor == NULL)) {
      mAcceptor = new boost::asio::ip::tcp::acceptor(mBoss->mIOService);
      if (mAcceptor == NULL) {
        error = true;
        cLog.error("[listen] unable to create acceptor - very bad news.");
      }
    }
 
    // if we're OK, then fire up the acceptor for listening - pretty basic
    if (!error) {
      using namespace boost::asio::ip;
      // make the 'endpoint' in boost terms - IPv4 and a random port
      tcp::endpoint ep(tcp::v4(), 0);
      // open up the acceptor and set SO_REUSEADDR to enabled
      mAcceptor->open(ep.protocol());
      mAcceptor->set_option(tcp::acceptor::reuse_address(true));
      // ...now bind the bad boy and start listening
      mAcceptor->bind(ep);
      mAcceptor->listen();
      // get the port so we can log what we've done
      tcp::endpoint lep = mAcceptor->local_endpoint();
      mPort = lep.port();
      // create a new MMDClientProxy to populate when we get hit
      MMDCLientProxy  *proxy = new MMDCLientProxy(this, mAcceptor->io_service());
      // now fire off the async accept on this guy
      mAcceptor->async_accept(proxy->getSocket(),
              boost::bind(&MMDServiceHandler::handleIncomingConnection, this,
                  boost::asio::placeholders::error, proxy));
    }
 
    return !error;
  }

The only real tricky point was that the endpoint I created with the port of zero wasn't modified by boost for the port number that was actually used. I needed to go into the acceptor, get its local endpoint, and then from it, get the port. Took me a few minutes to figure that one out - not a lot of documentation on that point.

With this, and the existing code in the handler to handle the multiplexed channel connections, it wasn't too bad to make this new client proxy a fully functioning "portal" to the service. Pretty nice.

Now we need to update the Broker to vend this information, and then the client code has to be updated to read it and make the direct connection. Pretty neat stuff.

Posted in Coding, Cube Life

Sweet Little Monitoring Tool – dstat

Wednesday, November 17th, 2010

I was talking to a few folks today and mentioned that I wanted to be able to snapshot the CPU usage on my ticker plant box every few seconds during the day so I could make some intelligent sizing decisions as to what can live on one box, and they mentioned dstat.

This is one sweet little monitoring tool.

I've set up an alias to run the following:

  dstat -t -c -m -n --output dstat_data.csv 5

and on the console I'll get the lovely color output, but to the file dstat_data.csv I'll get a nice importable data set. Sweet!

This is why I love building and running code on Linux - it's just so darn neat!

Posted in Coding, Cube Life

The Retry Loop – Painful to Get Right

Wednesday, November 17th, 2010

I've been hardening my ticker plant code for the last several days, and today it's been all about logging more intermediate results and putting in retry loops. The problem I find with retry loops is that if you didn't plan for them in the first place, making them fit in nicely can be kind of tricky. If you plan on your method returning if it can't succeed, then your retry loop needs to be within that method completely.

But what if there are exceptions? Well... you have to try/catch around the code, and then if you get a continued failure, then you have to throw the last one? Maybe you throw a new exception with the last one contained within? Not obvious.
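One way the "new exception with the last one contained within" option could look, as a minimal sketch assuming a C++11 compiler: std::throw_with_nested wraps the failure from the final attempt inside a new exception. The retry() helper and its parameters are made up for illustration; they are not from the ticker plant code.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: calls 'op' until it succeeds or 'maxAttempts' is
// used up, and returns op's result on the first success. It always makes
// at least one attempt. Only failures derived from std::exception are
// retried; anything else propagates immediately.
template <typename Op>
auto retry(int maxAttempts, Op op) -> decltype(op()) {
  for (int attempt = 1; ; ++attempt) {
    try {
      return op();
    } catch (const std::exception&) {
      if (attempt >= maxAttempts) {
        // out of retries: throw a new exception with the last one nested inside
        std::throw_with_nested(std::runtime_error(
            "gave up after " + std::to_string(attempt) + " attempts"));
      }
      // otherwise loop around and try again
    }
  }
}
```

A caller that cares about the root cause can catch the outer exception and call std::rethrow_if_nested() on it to get back the failure from the last attempt. This sketch does nothing about the layered "double retry" problem; a caller still has to know which of the methods it wraps already retry on their own.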

Then there's the really layered code where you're in the middle of a call stack, and you're trying to retry, but some of the methods you call already retry, and some don't. So do you "double retry" on some to cover the others? It's not at all easy to do.

Oh don't get me wrong... you can do it... it's just not easy.

I've been hammering away at a lot of these in my code this morning and I'm getting fewer and fewer instances of needing to get back in the code. It's getting close, that's for sure. It's just not easy.

But then, if it really were, everyone would be able to do it.

Posted in Coding, Cube Life