Archive for the ‘Coding’ Category

Starting the Crush: T-20 Days and Counting

Monday, March 21st, 2011

Today starts the four-week push to get a new greeks engine out at The Shop. I'm not at all sure it's even possible, but I'll give it everything I've got for the four weeks to see if it can be done. There are a lot of things that have to happen in the right order, and pretty quickly. I've been told they'll all get done, but I have my doubts - it's just a lot of things that have to happen just so.

Already this morning I'm spending too much time getting the next release of ZeroMQ built and checked out. It should be a pretty simple thing, but it's taking far longer than the previous versions did. Still, I have to get this done, and then I have to get SQLAPI++ working with the FreeTDS drivers we have on the boxes - all in order to load up the data from the MS SQL Server databases that hold it all.

Yup, it's going to be tough to get it all done, but I'm willing to give it a try.

Google Chrome dev 11.0.696.14 is Out

Friday, March 18th, 2011

This morning I noticed that Google Chrome dev 11.0.696.14 is out, and it seems to have several updates for a minor point release. OK with me - I'm glad to see the continual improvement of the dev channel. I was also interested in the about:gpu page, and tried it on my MacBook Pro - very neat. Glad to see the GPU getting used in rendering.

Updated Git to 1.7.4.1 on My MacBook Pro

Thursday, March 17th, 2011

gitLogo.gif

This afternoon I realized that my Mac Mini at home had the 64-bit version of git installed from the Mac OS X git installer, and that was a big mistake as it's a simple Core Duo CPU which can only run in 32-bit mode on Snow Leopard. I had the 64-bit version for my MacBook Pro and my iMac, but forgot to get the 32-bit version for the Mac Mini. Bummer.

Clearly, it'd be great if they had a universal build so I didn't have to deal with this, but hey... they build it and I can just keep things straight. My bad. Plain and simple. So I got the most recent builds for x86_64 and i386, and put the 64-bit package on my MacBook Pro, and tonight I'll install the 32-bit version on my Mac Mini. There. It'll work.

It's nice to see:

  $ git --version
  git version 1.7.4.1

Nice. Love it when things "just work".

[3/18] UPDATE: this morning I installed the latest updates on my Mac Mini (i386) and on my iMac (x86_64) so all boxes are up to date and working fine.

Interesting Erlang C++ Library

Thursday, March 17th, 2011

erlang

I was looking on the web today for a C++ library that would wrap the erlang interface (ei), which appears very full-featured and yet somewhat cumbersome for a C++ codebase. I looked at several until I found one that looked to be full-featured, well thought out, and complete: eixx on GitHub. This looks to be very nice. I haven't used it - yet - but it could be just the thing if we go forward with the idea of having each service connect into the distributed erlang network and use that as the way to talk to the services and serve up data.

The real advantage with this scheme is that we no longer have to deal with async I/O on the sockets for the client or the server. It's all encapsulated in the erlang runtime/library and that takes all the responsibility from our hands and puts it in the hands of the guys who made the language. Very nice.

Anyway... it'll be interesting when we move on this. I'm actually looking forward to it.

Lots of Meetings – Tiny Developing

Wednesday, March 16th, 2011

Today has been a mixed bag... I did get a little development done on the ticker plants - just a few little things to polish up - but I also had hour-long design meetings (not bad), and then multi-hour debugging meetings where there was no hope of actually finding the problem. The developers had so little experience developing on linux that a successful test was never in the cards; they really just needed to understand the proper way to edit/compile/test code.

Ringmaster

The design meeting was really kind of interesting. We have The Broker, and it's been an erlang process, and a Java process, but it's always been centralized. I mentioned today that I wondered why it wasn't using Distributed Erlang, letting the entire brokerage system sit on all the machines and handle everything in a more distributed manner. Let erlang handle the registration and the message passing. We can use JInterface (the Java library) and ei (the C library) to make our server applications appear as distributed erlang nodes. This makes it much easier to do all the things the Broker does.

Sure, the clients can still use sockets to connect to the Broker - or should I say a Broker, since we can have one running on each server in the server room. A client (defined as a process that isn't a node in the distributed erlang system) can then use the socket interface to connect to a Broker and work as it always has, but the wrinkle is that each Broker is only really brokering its own clients' traffic. This gets rid of a lot of the problems we had faced in the past - and went to great lengths to try and solve.

So much gets easier if the main components of the system are all distributed erlang nodes. Very nice solution to the problem.

bug.gif

The debugging session was only slightly productive, and ultimately disappointing. The code was compiled on CentOS5 and run on Ubuntu 10.04.1 - I'm not at all surprised that things broke. Far too different a version of libc, gcc - everything, really. We couldn't even get the code to build on Ubuntu. They need to step back, build the code on the box they are going to run it on, and then move forward.

It's slow going sometimes.

Google Chrome dev 11.0.696.12 is Out – With a New Image!

Wednesday, March 16th, 2011

Google Chrome

This morning I saw that Google Chrome dev 11.0.696.12 is out, and it's got a good number of fixes - mostly UI components and front-facing issues, but that's OK too. Glad to see it. One thing I didn't expect was a brand new icon - seems the Googlers are going for a more geometric look and less of the shiny plastic look.

Good for them. I like it.

Wild Socket Problem – Possibly Bonded NIC Issue?

Tuesday, March 15th, 2011

Ubuntu Tux

Focused on an interesting problem today. In the last few weeks, I've done a lot of re-writing on the UDP receiver in my ticker plant to make it better, faster, etc. And one of the things I've noticed is that I was accumulating, but not logging, dropped messages from the exchange. Now this is a serious issue because I'm looking at both the A and B sides from the exchange - they are meant to be fault-tolerant pairs so that should you lose a datagram on one, the other has it and you can get it there. So losing packets on both is significant.

Made more significant by the way I'm losing them. Let's say I start one of my apps that listens to a set of UDP multicast feeds. This guy gets started and it's running just fine. In another shell on the same box, I start another application that listens to a different set of UDP channels. As this second application is starting - the first app starts dropping packets! Within a few seconds, everything stabilizes, both applications are fine, and neither app is dropping anything.

If I then stop the second app - the first app drops a few packets! Again, within a second or so, it's all stable again and nothing more is dropped.

From this, I have a few observations and a theory.

  • It is not in the process space - two apps share nothing but the OS and hardware. So it's not "within" either process.
  • It is socket related - because I lose packets on both the A and B channels, it's not the failure of a single multicast channel.
  • It is load related - the more load there is on the first and second apps, the worse the drops.

My theory is that it's the way the bonded interface is configured. Specifically, I believe it's set up to automatically rebalance the load between the two sockets, and in so doing, changing the load causes some of the sockets to be shifted from one physical NIC to another, and the packets are dropped.

It certainly makes sense. The question is: can I affect the configuration in a meaningful way? I looked at the modes for bonding NICs in Ubuntu, and depending on how they have it set up, I might just have to live with it. If so, at least I know where it's coming from.
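If it is the bonding mode, it's easy enough to check - this is a sketch of what I'd look at (the interface name bond0 and the file locations are the stock Linux bonding driver's, and may well differ on our boxes):

```shell
# Show which bonding mode the box is actually running:
grep "Bonding Mode" /proc/net/bonding/bond0

# A mode like balance-alb (adaptive load balancing) re-assigns flows
# between the physical NICs as the load changes - which would fit the
# drops I'm seeing when a second listener starts or stops. Pinning the
# bond to active-backup trades throughput for a stable NIC assignment,
# e.g. in /etc/modprobe.d/bonding.conf:
#
#   options bonding mode=1 miimon=100
```

Of course, whether the Unix Admins would be willing to change the mode is another question entirely.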

UPDATE: the core issue is that I can't specify the NIC for boost asio to use for reception of the UDP traffic. If I try to bind to the NIC's address, I get nothing. If I bind to "0.0.0.0", then I get data but the problems persist. It's an annoying limitation with boost asio UDP, but it's a limitation, and we'll have to deal with it. Crud.

UPDATE: the only option I found was in the joining of the multicast channel. It turns out that you can tell boost which local address to join the multicast group on. This takes the form of something like:

  socket->set_option(multicast::join_group(
                              address_v4::from_string(aChannel.first),
                              address_v4::from_string("10.2.2.8"));

where the second address is the address of the NIC you want to listen on. It works only marginally for me, and that's a drag, but it's a possibility if I need it. It's not boost's problem.

[4:20pm] UPDATE: I found out that it's the Intel NIC drivers! A guy in The Shop ran across this for his work a little bit ago, and found the solution in updated drivers for the Intel 10GbE NICs. I've talked to the Unix Admins, and they are building a patch for my boxes. This is fantastic news!

Fun Use of Boost Threads in Monitoring Thread

Monday, March 14th, 2011

Boost C++ Libraries

I was having a bit of a problem with The Broker today, it seemed. It appeared that when I saved my state to the Broker's configuration service, I was hanging, and the monitoring thread that fired off this save was hung. I got the guys to restart the Broker and things seemed OK, but I decided to take advantage of one of the really neat things of boost threads, and fire off the call in a separate thread so if it stalls, the monitoring thread doesn't.

The old code looks like this:

  if (secs - mLastMessageSaved >= 300) {
    saveMessagesToConfigSvc();
    mLastMessageSaved = secs;
  }

becomes:

  if (secs - mLastMessageSaved >= 300) {
    using namespace boost;
    thread   go = thread(&TickerPlant::saveMessagesToConfigSvc, this);
    mLastMessageSaved = secs;
  }

and now saveMessagesToConfigSvc() is called by the separate thread, and as soon as the method returns, the thread exits and is cleaned up. (Note that the member-function pointer is passed without parentheses, with this as the first argument.) Exceedingly sweet!

OK... this is what boost threads are all about, but in comparison to Java threads, or something that takes a little more scaffolding, this is elegant to the extreme. Just add a few constructs to the line and it's done. You can't get much simpler than that. Very nice.

Wonderful Solution to a Locking Problem – Merging the Streams

Friday, March 11th, 2011

GeneralDev.jpg

The past couple of days have been about speeding up the processing of the exchange data through my feed system. Because the code for the decoding of the messages is fixed, specifically for the big OPRA feeds (using OPRA's FAST decoder), most of this is accomplished by re-organizing the data flow and data structures. One of the things I had done a while back was to have the two channels of an exchange feed put their datagrams in separate queues, and then have one thread empty both so as to remove the need for locking on the sequence number arbitration code.

The problem was that this required twice the time to process the data through the decoder because both the A and B sides went through the same thread. This can get to be really quite nasty. For example, on an OPRA feed, it takes about 40 usec to decode an average datagram, and at 50,000 datagrams/sec that's 2 seconds of decoding work per second of data for one side alone - and this design had to do double that. Nasty. Lots of buffering.

The solution is to have one thread per datagram stream. That immediately cuts the processing time in half. The problem is that we then need to lock for the sequence number arbitration. Nasty. Then I had a flash - merge the data!

First, tag one of the channels as primary, and have it control the arbitration. Every other channel decodes its datagrams, but then instead of having that thread send them out, it puts the decoded messages into a queue that the primary will process as soon as it's done with its own datagram. The arbitration is very fast because it's as simple as checking the sequence number and a few flags - it's the decoding that takes the time. With one of the FIFO queues, we can have multiple non-primary channels, and have the primary take the results off and send them out.

Even more importantly, the primary can be the primary feed line of the exchange, and that makes things even better as the secondary feed is really only needed when there's a failure of the primary. What we've done then, is to make it more like the "normal" feed with a "backup" just in case.

Very neat.

Google Chrome dev 11.0.696.3 is Out

Friday, March 11th, 2011

This morning I noticed that Google Chrome dev 11.0.696.3 was released with a few issues addressed. Nothing major, but it's nice to see the attention to detail by the builders.