Archive for November, 2010

Finding Subtle Bugs Takes Time

Tuesday, November 30th, 2010

I've been lucky today - I finished a good chunk of code this morning and now I'm able to watch my ticker plants run. It's a funny thing: finding the subtle bugs takes time. You have to watch the code run, ask questions, monitor log files - all these things take time, and it's that phase of polishing an app that's very rewarding.

If you have a distributed app you need to monitor the load and rebalance things as needed. That takes more time.

I've been lucky this morning to be able to take this time and really study what's happening. I've found a few bugs, and those would not have been easy to find if I were in a hurry, as they weren't serious crashing issues. Still, they needed to be fixed.

It's nice to have a little time to watch and monitor the work you've done. It's really very rewarding.

MarsEdit 3.1.4 is Out

Tuesday, November 30th, 2010

MarsEdit 3

This morning I got a tweet from MarsEdit that 3.1.4 was out with a fix for a "major memory meltdown" that was causing a significant memory leak in MarsEdit. Daniel seems to have taken Bill B's advice and looked for the retain/release mismatch on the WebKit windows and was able to track it down. Very nice work.

I was noticing this but wrote it off to the same memory growth in Safari and Unison - all of these guys are using WebKit, and they all grow to enormous sizes, but that's just the way of things for now. Hopefully this will trigger Apple and Panic to double-check the memory usage as well. Who knows? Maybe they'll all be trimmed up soon.

ZeroMQ is Nearing Release of 2.1

Tuesday, November 30th, 2010

ZeroMQ

I've found a singular problem with ZeroMQ, and noted in the IRC chat conversations that this should be fixed in the soon-to-be-released 2.1. It's a simple memory leak with the sending of messages. My code is pretty simple: I get the payload for the message, I get the ZMQ socket it needs to be sent on, and then I simply make a ZMQ message and send it. That's about as simple as you can get:

  1.  if (aTopic.first() != NULL) {
  2.    try {
  3.      // lock up this socket while we send the data out...
  4.      boost::detail::spinlock::scoped_lock lock(aTopic.second());
  5.      // make a ZMQ message of the right size
  6.      zmq::message_t msg(aPayload.size());
  7.      // ...copy in the data we need from the payload
  8.      memcpy(msg.data(), aPayload.data(), aPayload.size());
  9.      // ...and WHOOSH! out it goes
  10.     aTopic.first()->send(msg);
  11.   } catch (std::exception & e) {
  12.     error = true;
  13.     cLog.error("[sendToZMQ] trying to send the data got an "
  14.                "exception: %s", e.what());
  15.   }
  16. }

If I comment out line 10 - the send(), the memory doesn't grow any faster than I might expect based on the cached messages. But leave it in and the memory grows and grows. More interestingly, it's different for the different kinds of payloads I send. Very odd.

Anyway... the ZeroMQ guys said they planned on having a release last week, but it seems things happened, and that's OK with me - this is important. I need to keep the memory under control.

I Love Christmas Music

Tuesday, November 30th, 2010

Christmas Tree

The first work day after Thanksgiving is always a great day for me. This year I was a day late, but I got there... it's Christmas Music Time! Yup, that's when I pull out the Christmas music playlist on my iPhone and get to listening to some of the greatest music of the whole year. It's also the time of year when we all are a little nicer, a little kinder, and everyone tries to put on their best face. It's a great time of year.

I remember being in a choir room of a rather big church in Indianapolis when I was in high school, and there was a picture on the wall with a quote:

For the common things of everyday,
God gave man speech in a common way.

For higher things men think and feel,
God gave man poets, their words to reveal.

But for heights and depths no words can reach,
God gave man music, the soul’s own speech. - Anonymous

Every time I read it I tear up. It's perfect.

CKit’s IRC Protocol Implemented in Boost ASIO

Monday, November 29th, 2010

Boost C++ Libraries

For the last day and a half I've been working on re-writing my C++ IRC Client code to use boost's asio, and it's been pretty interesting. I will say there are a lot of plusses to using the boost socket functionality - even over my own socket library (imagine that!).

First, it's got the complete asynchronous mechanism for sending and receiving - you just have to love that. Also, they have done a wonderful job in making it all very rational and sane. Asynchronous methods perform as you'd expect, and have remarkably similar signatures to their synchronous counterparts. It makes writing with the classes very simple. Clearly, a lot of thought has gone into this stuff, and that's really nice to see.

Secondly, there's no need for all the threads that I had in my old code, primarily because a single thread running the io_service can handle all the async operations. This really is a great timesaver. You can easily have multiple threads sending out chats to the IRC server with the async writer. Very slick.

Finally, the resulting code size is much smaller. That's more of a consequence of the other two, but the payoff can't be overstated. Less code means less maintenance, less cost, and less time. You just can't beat that.

So I have it all done, but I haven't been able to test it yet as we don't have an IRC server up and running - yet. It's been discussed, and maybe today they'll see that I've gotten code ready, and they'll put one up. It's not hard - just takes a little time, that's all. But the benefits will be enormous. I look forward to getting the server going, testing my code, and integrating this into my TickerPlants and other libraries. It's just an amazingly powerful tool for support and problem solving.

SNMP vs. IRC – Complexity Over Simplicity

Monday, November 29th, 2010

In the past, I've used IRC in my applications to great utility. I created a simple framework that allowed me to have each application instance "pose as" a "user" on IRC, and when the application was running, you could see this in the chat rooms, and you could interact with it by as complex, or as simple, a means as you, the application designer, wanted. The protocol is simple, it's fast, and there's very little administration to the system.
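
To give a sense of how little there is to the wire protocol, here is a minimal sketch of the few text lines a client sends to register, join a room, and say something. The command formats are the standard IRC ones (RFC 1459/2812); the helper names ircRegister, ircJoin, and ircSay are hypothetical and are not taken from the framework described above.

```cpp
#include <string>

// Hypothetical helpers: each one builds raw IRC text to write to the socket.
// Every IRC message is a plain-text line terminated by CR-LF.

// Registration is a NICK line followed by a USER line. The two middle
// USER parameters are placeholders here ("0" and "*", as in the
// RFC 2812 examples).
std::string ircRegister(const std::string & nick, const std::string & realName)
{
  return "NICK " + nick + "\r\n" +
         "USER " + nick + " 0 * :" + realName + "\r\n";
}

// Join a chat room, e.g. "#ops".
std::string ircJoin(const std::string & channel)
{
  return "JOIN " + channel + "\r\n";
}

// Say something to a room or a user; the text follows the ':'.
std::string ircSay(const std::string & target, const std::string & text)
{
  return "PRIVMSG " + target + " :" + text + "\r\n";
}
```

An application instance posing as a user would write ircRegister(...) once after connecting, then ircJoin(...), and from there every status message is a single ircSay(...) line.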

SNMP is not so simple, but it's far more common in the monitoring and control of applications. The question is really: Is the complication worth it? I'm not at all certain that it is.

Several months ago, the decision was made to use Jabber as opposed to IRC. I went along with it as that's the right thing to do. But it's not the choice I'd have made. The ircd is simple, fast, and with a little additional code, you can make it log everything. From that point, you don't have any concerns about compliance, and you're free to use it as needed.

It seems that the powers that be are re-evaluating the Jabber solution as it's a little more complex in its implementation, and the goal here is not to have something really complex, it's to have something really easy. So I'm hoping that I'll get to finish my IRC client based on the boost asio work. If so, it should be exceptionally fast and easy to use. Both are the hallmarks of a great utility.

I hope I get to see it come to pass.

The complexity of SNMP just seems like massive overkill.

Sony Selects GNUstep as Development Platform

Monday, November 29th, 2010

GNUstep

I read a lot about this over the weekend, and the more I read the more it made me smile.

The foundation upon which this project is base comes from the GNUstep community, whose origin dates back to the OpenStep standard developed by NeXT Computer Inc (now Apple Computer Inc.). While Apple has continued to update their specification in the form of Cocoa and Mac OS X, the GNUstep branch of the tree has diverged considerably.

Yeah... they've diverged only in that Apple's Cocoa has moved forward and GNUstep is still using the OPENSTEP guide as its reference point. And there's nothing wrong with that. OPENSTEP is fantastic... it's just that Apple has decided to keep moving and GNUstep hasn't. Sony isn't content to sit still either:

We depart somewhat from the GNUstep adherence in that our goal is to thoroughly modernize the framework and optimize it to target modern consumer electronic (CE) devices. These modern conveniences include such features as touch displays and 3D graphics.

I use WindowMaker as my X11 desktop of choice. It's all about GNUstep. I think it's fantastic that Sony is picking this up. It means that GNUstep is going to be getting a much needed shot in the arm with all the added interest. Really great news.

NeXT was right. Took a long time, but they were right. Fantastic.

Tricking a Tricky Threading Problem

Wednesday, November 24th, 2010

This afternoon I've been tracking down a good solution to a nasty threading problem. This part of my ticker plants is the UDP receiver and it tries to get the UDP datagrams off the socket and into a buffer as fast as possible. To that end, I've got a single-consumer, single-producer, lockless FIFO queue that should be thread-safe as the 'head' and 'tail' are volatile and there's only one thread messing with one of these guys at a time.

But that's just the theory. Here's what the code looks like:

  template<typename Element, uint32_t Size>
  bool CircularFIFO<Element, Size>::push( Element & item )
  {
    uint32_t  nextTail = increment(tail);
    if (nextTail != head) {
      array[tail] = item;
      tail = nextTail;
      return true;
    }
 
    // queue was full
    return false;
  }
 
 
  template<typename Element, uint32_t Size>
  bool CircularFIFO<Element, Size>::pop( Element & item )
  {
    if (head == tail) {
      // empty queue
      return false;
    }
 
    item = array[head];
    head = increment(head);
    return true;
  }

Here's what happens: I'll be running just fine, and then a call to pop() will return true, which says the value it handed back (a pointer) is good, when it isn't. This presents a real problem. If it returns a NULL, that's easy to deal with. The problem happens when it returns junk.

Ideally, it wouldn't return a NULL or junk, but coding for that has turned out to be harder than I thought. First, I can just check for a NULL or what I think of as "junk" data, and not delete that pointer, but what happens when it returns "junk" that's not fitting my pattern of "junk"? Well... I'll delete it and BAM! SegFault.

Not easy.

I believe the problem is one of compiler preference. The data in the class is defined as:

  volatile uint32_t head;
  volatile uint32_t tail;
  Element   array[Capacity];

where the lack of the volatile keyword on the array is the big deal here. What I need to do is to make the data look like:

  volatile uint32_t head;
  volatile uint32_t tail;
  volatile Element   array[Capacity];

and then correct all the castings in the code to make them work properly.

I've got something to compile that looks like:

  template<typename Element, uint32_t Size>
  bool CircularFIFO<Element, Size>::push( Element & item )
  {
    uint32_t  nextTail = increment(tail);
    if (nextTail != head) {
      array[tail] = *((volatile Element *)(void *)&item);
      tail = nextTail;
      return true;
    }
 
    // queue was full
    return false;
  }
 
 
  template<typename Element, uint32_t Size>
  bool CircularFIFO<Element, Size>::pop( Element & item )
  {
    if (head == tail) {
      // empty queue
      return false;
    }
 
    item = *const_cast<Element *>(&(array[head]));
    head = increment(head);
    return true;
  }

We'll have to see how this runs.
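
One caveat worth keeping in mind: in standard C++, volatile by itself gives no guarantee about the order in which one thread's writes become visible to another thread. As a point of comparison, here is a minimal sketch of the same single-producer, single-consumer idea using std::atomic with acquire/release ordering. It assumes a C++11 compiler, which the code above does not, and the class name AtomicFIFO is hypothetical, not part of the ticker plant code.

```cpp
#include <atomic>
#include <stdint.h>

// Hypothetical SPSC ring buffer. One slot is always left empty so that
// "full" and "empty" can be told apart, so it holds at most Size-1 items.
template <typename Element, uint32_t Size>
class AtomicFIFO {
public:
  AtomicFIFO() : head(0), tail(0) {}

  // Call from the producer thread only.
  bool push(const Element & item)
  {
    const uint32_t t = tail.load(std::memory_order_relaxed);
    const uint32_t next = increment(t);
    if (next == head.load(std::memory_order_acquire)) {
      return false;  // queue was full
    }
    array[t] = item;
    // release: the element write above is published before the new tail
    tail.store(next, std::memory_order_release);
    return true;
  }

  // Call from the consumer thread only.
  bool pop(Element & item)
  {
    const uint32_t h = head.load(std::memory_order_relaxed);
    if (h == tail.load(std::memory_order_acquire)) {
      return false;  // empty queue
    }
    item = array[h];
    // release: the slot is handed back only after it has been read
    head.store(increment(h), std::memory_order_release);
    return true;
  }

private:
  static uint32_t increment(uint32_t idx) { return (idx + 1) % Size; }

  std::atomic<uint32_t> head;
  std::atomic<uint32_t> tail;
  Element               array[Size];
};
```

The point of the release store on tail is that the consumer, once it sees the new tail through its acquire load, is also guaranteed to see the element that was written before it. That is the guarantee a pop() returning true needs in order to hand back a good pointer.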

The Much Maligned Pointer

Wednesday, November 24th, 2010

There are books about the subject. There are people that swear they are the Devil's handiwork. They are the source of much debate, and in my opinion, they are much maligned. They are pointers.

Pointers in C and C++ code are as important as any other language construct. They are an essential tool in the arsenal of a software developer, and they have to be mastered. Yes, if put in the wrong hands pointers can be a very dangerous thing. But so can knives, guns, and explosives. But that doesn't mean that you want to start digging out tons of rock with shovels because explosives are "dangerous". It means that you have to make sure you have skilled technicians that understand the way in which to safely handle the tools of the trade.

Same with pointers. You have to be skilled and trained in their use, but that doesn't mean you have to spend a decade learning how to use them. You just need to keep an eye on them and remember to have strict rules in their usage and lifecycle.

For example, if you're using pointers on a queue, make sure that your method comments indicate when you're taking ownership of the pointer, and when you're passing ownership to the caller. Then it's clear. Simple.
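
A minimal sketch of what such ownership comments might look like on a simple pointer queue. The MessageQueue class and its methods are hypothetical, made up purely for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical queue of heap-allocated strings with documented ownership.
class MessageQueue {
public:
  MessageQueue() {}

  // The queue owns whatever is still in it when it is destroyed.
  ~MessageQueue()
  {
    for (std::deque<std::string *>::iterator it = items.begin();
         it != items.end(); ++it) {
      delete *it;
    }
  }

  // Takes ownership of 'msg': the caller must NOT delete it after this
  // call. A NULL pointer is ignored.
  void push(std::string * msg)
  {
    if (msg != NULL) {
      items.push_back(msg);
    }
  }

  // Passes ownership to the caller: the caller MUST delete the returned
  // pointer. Returns NULL if the queue is empty.
  std::string * pop()
  {
    if (items.empty()) {
      return NULL;
    }
    std::string * msg = items.front();
    items.pop_front();
    return msg;
  }

private:
  // Copying would leave two queues owning the same pointers, so it is
  // declared private and left undefined to disable it.
  MessageQueue(const MessageQueue &);
  MessageQueue & operator=(const MessageQueue &);

  std::deque<std::string *> items;
};
```

With the rules written on the methods, a caller never has to guess: what goes into push() is no longer theirs, and what comes out of pop() is theirs to delete.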

Pointers are really no different than any resource you have to track - pooled database connections, threads, all these offer the same level of difficulty as the lowly pointer. But all are essential tools of the skilled software developer.

Fear not the pointer. It is your friend.

Very Non-obvious Memory Leak (cont.)

Tuesday, November 23rd, 2010

Today I spent nearly the entire day trying to find the remaining memory leak(s) on my ticker plant. Again, only a few were being affected and again, I focused on the code they exclusively have. Once again, this was a complete waste of time as it wasn't in the exclusive code but a very odd little bug in the shared code.

When I'd struck out after several hours of testing, I decided to start at one end and sweep the code with a fine-tooth comb. Starting at the incoming UDP feed, I cut the rest of the app off and checked the memory usage. Stable. Good. Now let's add in the next step. Leak? Ha... let's fix him. Continue until we have the leak spotted.

The first problem may not have been a leak, but it was an unreliable memory usage pattern. The UDP datagrams came in on the boost asio socket and I placed each datagram into a std::string, paired with the time as microseconds since epoch in a simple std::pair. This was then placed into a simple queue. Something like this:

  typedef std::pair<uint64_t, std::string> TaggedDatagram;
 
  CircularFIFO< TaggedDatagram, 100000 >   mStaging;

where the CircularFIFO is a single-producer, single-consumer, lockless, circular FIFO buffer for fast pushes and pops of the data coming off the wire.

The problem with this design is that we have to create the std::string every time anyway, and the storage of this structure is very unpredictable. What I decided to do was to switch from the stack to the heap and change the structure:

  typedef std::pair<uint64_t, std::string *> TaggedDatagram;
 
  CircularFIFO< TaggedDatagram, 100000 >   mStaging;

Now it's basically two 64-bit ints and the heap will reclaim the memory as needed. This was a nice addition, but it wasn't the final problem.
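
To make the "two 64-bit ints" point concrete, here is a minimal sketch of tagging a datagram and handing the heap-allocated string through the pair. It assumes a 64-bit platform where a pointer is 8 bytes, and the helper names tagDatagram and consume are hypothetical, not the ticker plant's own.

```cpp
#include <stdint.h>
#include <cstddef>
#include <string>
#include <utility>

typedef std::pair<uint64_t, std::string *> TaggedDatagram;

// Producer side: copy the bytes off the wire into a heap-allocated string
// and tag it with the arrival time (microseconds since epoch).
// The pair does not free anything itself: whoever pops it must delete
// the string it points to.
TaggedDatagram tagDatagram(uint64_t usecSinceEpoch,
                           const char * data, std::size_t len)
{
  return TaggedDatagram(usecSinceEpoch, new std::string(data, len));
}

// Consumer side: use the payload, then release it. Returns the number
// of payload bytes that were processed.
std::size_t consume(TaggedDatagram & dg)
{
  std::size_t bytes = 0;
  if (dg.second != NULL) {
    bytes = dg.second->size();
    delete dg.second;
    dg.second = NULL;   // guard against a double delete
  }
  return bytes;
}
```

Each slot in the queue is then a fixed 16 bytes on such a platform, no matter how large the datagram is, and the variable-sized part lives on the heap only for as long as the datagram is in flight.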

It was at the end of the day, but thank goodness that I found it. It turns out that the ZeroMQ send() method was the culprit. Normally, ZeroMQ has been very nice for me. Why these messages caused problems, I have no idea, but it's the one method and nothing else.

I know they are working on a new version (2.1) with the latest OpenPGM included, and that will be nice to see. Tomorrow morning when they are all online, I'll ask what the story is on the release of 2.1. Until then, I'll deal with the smaller, but still annoying leaks.

Whew!

[11/24] UPDATE: I talked to the ZeroMQ guys this morning and they say the release of 2.1 is scheduled for this week. Nice. I'll get it early next week and try it out.