Archive for March, 2011

Getting to Have a Little Fun

Thursday, March 31st, 2011

Today Liza and the kids are coming down to meet me at the office and spend a "City Day" downtown. There are a few places everyone wants to go, and then it'll just be lunch, and a lot of walking around. It's a great way to spend a nice day, and while it's a little chilly, it's not bad.

Looking forward to it.

Creating the Broker’s Login Service

Wednesday, March 30th, 2011

Most of today has been spent working on the latest rewrite of The Broker - but this guy is the distributed erlang version where the service registry, distribution, routing, etc. are all handled by erlang. This guy will then be a little node on each server that uses or provides services, and it'll connect into a web of other such nodes and share its state with all of them so that we can load balance around the net and provide some nice fault-tolerance.

Today I spent a lot of time working on the login service. There's a lot there because I have to get the LDAP authentication going and then deal with the mongo database - all from erlang. There's a good start, but I'm still very new to erlang, and this is just plain slow going.

Tricky Timing Bug with Inherited Threads

Wednesday, March 30th, 2011

This afternoon a co-worker pointed out a problem with a threaded app he was testing. It was using a data feed component I'd written, and it was causing seg faults when it was shutting down. He was having a hard time figuring out why it was crashing, and I was having a hard time figuring out what was causing the state to be reset.

The thread model I was using is a simple class that runs a process() method over and over catching exceptions, etc. until the process() method returns a "stop" flag. Pretty simple stuff. There's the ability for users of the thread object to tell it to stop:

  Thread::setTimeToDie(true);

and the next time it's ready to call process() it bails out and stops. Pretty simple. But it wasn't acting that way in the tests.

I subclassed this Thread for my data feed class, and when I detected that the parent thread was to stop, I stopped some processing sub-threads. The structure is pretty simple - the main thread was handling supervision, a boost ASIO thread was handling the incoming data, and the processing sub-threads were taking that raw data and converting it (two steps) to be used downstream. What was supposed to happen was that the sub-threads would detect when the parent Thread was to die, and would then die themselves.

What appeared to be happening was that the sub-threads weren't getting the message. Or at least not getting it in time. Very odd. If I told the Thread to stop, the sub-threads didn't stop. If I told the Thread to stop, and then did a little shut-down processing, and finally told the Thread to stop again, then things shut down.

It appeared that there's a nasty timing problem here, and I didn't want to leave it at this, but there's no more time today. For now, it's working, but very oddly.

[3/31] UPDATE: Interesting point... this morning, on my walk to the train, I saw it. When the Thread stops, it resets the "time to die" flag to false. The sub-threads weren't seeing it because they weren't checking fast enough. The main thread died, reset its "die" flag, and the sub-threads just didn't think there was any reason to stop. The fix was easy - don't reset the flag until the thread is started the next time. Easy fix.
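Here's a minimal sketch of that fix, assuming C++11 std::thread and std::atomic rather than whatever the original Thread class was built on. The Worker class and the subThreadSeesStop() helper are hypothetical names made up for illustration; only setTimeToDie() comes from the code above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the Thread class described above.
class Worker {
public:
    void start() {
        assert(!thread_.joinable());   // don't start twice without a join
        // The fix: clear the flag here, at the start of the next run,
        // instead of clearing it when the previous run exits.
        timeToDie_.store(false);
        thread_ = std::thread(&Worker::run, this);
    }
    void setTimeToDie(bool flag) { timeToDie_.store(flag); }
    bool timeToDie() const { return timeToDie_.load(); }
    void join() { if (thread_.joinable()) thread_.join(); }

private:
    void run() {
        // Stand-in for calling process() over and over until told to stop.
        while (!timeToDie_.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        // The buggy version reset the flag to false right here, so a
        // slow-polling sub-thread could miss the stop request entirely.
    }

    std::atomic<bool> timeToDie_{false};
    std::thread thread_;
};

// A sub-thread that polls the parent's flag far more slowly than the
// parent shuts down. It still sees the stop request, because the flag
// stays set after the parent exits.
bool subThreadSeesStop() {
    Worker parent;
    parent.start();
    std::atomic<bool> sawStop{false};
    std::thread sub([&parent, &sawStop] {
        for (int i = 0; i < 200 && !sawStop.load(); ++i) {
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
            if (parent.timeToDie()) sawStop.store(true);
        }
    });
    parent.setTimeToDie(true);
    parent.join();   // the parent is long gone before the sub-thread polls
    sub.join();
    return sawStop.load();
}
```

With the reset moved into start(), the order in which the parent and the sub-threads notice the flag no longer matters.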

Finishing up on The Broker

Tuesday, March 29th, 2011

Today I finished up my part of The Broker - the service registry and the service load (for load balancing) and they were pretty simple modules. That's a good thing because I'm new to erlang, and the simpler the better. Even so, I was introduced to the erlang crash dumps, and picked my way through a few. Not easy, but once you know what to look for, it's not too bad. Certainly no worse than gdb.

Then I started work on converting the C++ code I had to use the new Broker. It was mostly a gutting as the code I had added for the direct dial was now useless, and that simplified the code quite a bit. Then I needed to add in the ability to send in the load numbers for each service, and I was in business.

I updated my Broker tests - the client and the service, and they now work nicely as well. It's a really good day. Lots of nice progress.

On the Disadvantages of Being a Lead Developer

Monday, March 28th, 2011

There are a lot of things to like about being a lead developer - you get to have a significant hand in the design of the work you're doing. You get to pick what you want to do, and farm out the rest to others. But there's a dark, ugly side as well - people. Yes, those devices that can't be debugged no matter how long you use gdb - people. And I'm reminded of it today in a very unpleasant way.

The lead on a project is expected to help the junior guys along. Help them learn the craft as it were. They will pick up habits, and the goal is that they pick up the good habits of the lead and then become self-sufficient in the workplace. Kind of like kids. Being in the middle of that cycle at home gives me keen insight into what's happening to me here at work.

I've got a clingy 5-year-old. At home we called it "the five-foot rule". The kids would never be more than five linear feet from us no matter what we were doing, where we were doing it, no matter how big the house was. They were right there. Always. In their early years we were everything - playmate, audience, judge, everything. The same is true at work, it seems.

He's telling me about the status of his work. He's telling me he'll be done soon. When he's done. That he needs to be busy. It's like a clingy 5-year-old. I can appreciate that he's trying very hard to be productive and contribute, but he also doesn't realize - like a 5-year-old doesn't realize - that there are completely different expectations for him than for me. He's supposed to learn, listen, and if he's really doing his job, be as light a load on me as possible.

But that's not how it's working out.

I'm going to have to slog through this for a while longer. I'm not sure how much longer, but a while at least. If he doesn't get better, then he's the learning disabled kid, and will continue to be a burden on me as long as he's here. That's not going to fly. But assuming he's going to get better, I just have to give him time. It's tough sometimes.

Building the Next Broker

Monday, March 28th, 2011

I've been talking about The Broker for a while, and it's been through at least five iterations/revisions/re-writes that I can remember, but we're in our sixth now. What I've seen, as I hit it the hardest of almost anyone, is that the java version built on Netty is just too complex. The design is reasonable, but even that's complex because of the direct dial interface we put on The Broker to remove the massive load some clients can generate. It just wasn't working. So we tried a new approach.

I mentioned distributed erlang, and asked why we didn't build the original erlang Broker using that. It seemed to handle all the registration/location service issues. It also handled the problem of a centralized service in general. When one client makes a request to one service, it's only those two nodes that will see the load - not anyone else. It's a really nice idea - extremely simple, and therefore, probably the best.

So my boss started on this next version. He and I worked on it today and we're down to just a few things that need to be finished: the service registry and the service load (for load balancing). I'll do those tomorrow and we'll be feature complete.

The beauty of this design is that we can add as many native services to the Broker as erlang modules in the distribution package. Each one is very simple, and is loaded on demand from the erlang runtime. This places us in the wonderful position of being able to add in the login service, the configuration service, and any other service that we want in the native erlang. Because this new design places a Broker node on each box running a service, this means that for the most used services, like login and configuration, we won't even go off-box to get the response.

This is going to have a huge impact on the reliability of the Broker. It's going to be able to be stopped/started on each node, as needed, it'll connect into the grid and share its state, and will be able to handle all the routine services on its own. This is a major win for me. I think it's going to make it great.
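To make the registry-plus-load idea concrete, here's a rough sketch in C++ (not the erlang we're actually writing) of a per-node view of the registry that picks the least-loaded provider for a service. Everything in it - the ServiceRegistry class, reportLoad(), removeNode(), pick(), and load as a plain number where lower is better - is an assumption for illustration, not the Broker's real interface.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <string>

// Hypothetical registry: service name -> (node name -> last reported load).
class ServiceRegistry {
public:
    // Register a node as a provider of a service, or update its load.
    void reportLoad(const std::string& service, const std::string& node,
                    double load) {
        assert(load >= 0.0);   // assumption: load is a non-negative number
        providers_[service][node] = load;
    }

    // Forget a node everywhere, e.g. when it drops out of the grid.
    void removeNode(const std::string& node) {
        for (auto& entry : providers_) {
            entry.second.erase(node);
        }
    }

    // Return the least-loaded node for a service, or "" if there is none.
    std::string pick(const std::string& service) const {
        auto svc = providers_.find(service);
        if (svc == providers_.end()) return "";
        std::string best;
        double bestLoad = std::numeric_limits<double>::infinity();
        for (const auto& provider : svc->second) {
            if (provider.second < bestLoad) {
                bestLoad = provider.second;
                best = provider.first;
            }
        }
        return best;
    }

private:
    std::map<std::string, std::map<std::string, double>> providers_;
};
```

In the real design, each node would keep its own copy of this state and share updates with the others, so a client can pick a provider without going through any central box.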

First Week of Crush Down – Not Looking Very Promising

Friday, March 25th, 2011

Well... the first week of the four-week "delivery cycle" of the new Greek Engine has come and gone, and I have to say, if we get done with it in the next three weeks, I'll be surprised. Maybe not off by much, but there's so much to do, and with two new guys on the team, one very junior, it's a lot of work on my part just to keep them from imploding.

On the other hand, I have to say that things are coming together. They are taking shape in the code, and things are getting written - just not fast enough, and the hardware was always going to be an issue. There's just not enough time to get all the hardware purchased, installed, built and running. It'll be close, if they hurry, but I've already found out that I'm going to be delayed at least a week on some hardware coming from another data center. Doesn't sound like much, but that's 25% slippage.

But progress has been made. We have better integration between the two projects - my ticker plant and the greek engine, and we at least know the sections of code we need to build in order to make this thing work. That's a big help. It's just a matter of time.

Google Chrome dev 12.0.712.0 is Out

Friday, March 25th, 2011

This morning I saw the expected update of Google Chrome dev to 12.0.712.0, marking the move to 12.x. I say this was expected because they just announced that the 10.x series was in beta (or release), and that meant that they needed to "up the voltage" a bit on the 'dev' channel. The third number is all that seems to matter, and there are still a few nice things to find in this update.

What seems to be missing are any new features that they might want to add after Firefox 4 went final a week ago. I've played with Firefox 4, and it's OK, but the rendering is the biggest issue, and while that's not quite right, I'll stick with Chrome. But Firefox is smooth, and the "Spaces"-like support is very cool and keeps the tab count to a minimum. Very nice indeed.

Note: the old icon is back, too. Seems "plastic" now.

Getting SQLAPI++ Hitting MS SQL Server on Ubuntu 10.04.1

Thursday, March 24th, 2011

One of my very favorite database libraries is SQLAPI++ because it's very well designed, works with a huge number of databases, is thread-safe, and doesn't require you to link in the underlying database libraries when you build your code. It's nice. So when it came time to hit a database from linux here at The Shop, I naturally turned to an old friend. But there were some ugly truths lurking there for me, and I had to spend quite a bit of time getting things sorted out.

First off, SQLAPI++ on linux doesn't talk to MS SQL Server through any libraries like FreeTDS. Nope, you have to go through iODBC. Ick. Thankfully, FreeTDS has the ability to work with iODBC, but getting things set up and tested was a pain, so here's what I had to do.

Get everything installed:

  • iodbc
  • libiodbc2
  • sqsh

then start to configure things.

Get FreeTDS going by editing /etc/freetds/freetds.conf to look something like this:

  [devSQL]
     host = dbhost
     port = 1433
     tds version = 7.0

Now you can put in the default .sqshrc and prove that you can talk to the database with sqsh:

  $ sqsh -S devSQL -U me -P secret
  1>

Success. That's a good first step. Now let's configure ODBC. Edit /etc/odbc.ini to look something like:

  [ODBC Data Sources]
  devSQL = Devel SQL Database

  [devSQL]
  Description = Development database
  Driver      = /usr/lib/odbc/libtdsodbc.so
  Trace       = No
  Server      = dbhost.yoyo.net
  Database    = master
  Port        = 1433
  TDS_Version = 8.0

and then edit /etc/odbcinst.ini to look something like this:

  [FreeTDS]
  Description = TDS driver (Sybase/MS SQL)
  Driver      = /usr/lib/odbc/libtdsodbc.so
  Setup       = /usr/lib/odbc/libtdsS.so
  CPTimeout   =
  CPReuse     =

At this point, we can run the iODBC admin utility to see that it was set up properly:

  $ iodbcadm-gtk &

and the driver should be right there. Use this to "Test" the connection, and you should see that it's there and working.

Finally, use the SQLAPI++ test client to see that everything is working there as well:

  $ test64
  1.    Oracle
  2.    SQL Server
  3.    DB2
  4.    Informix
  5.    Sybase
  6.    InterBase
  7.    SQLBase
  8.    MySQL
  9.    PostrgeSQL
  0.    ODBC
  0
  Client version: unknown before connection
  Database name (connection string):    devSQL
  User name:   me
  Password:    secret
  Server: Microsoft SQL Server Release 10.00.2714
  Server version: 10.0
  Client version: 0.82
  $

Sometimes STL Really Impresses Me

Wednesday, March 23rd, 2011

I was talking to a developer today and we got to talking about the fact that the STL map spec says that its iterators are not invalidated on insert or removal - except for an iterator that's on the removed object. So it got me to thinking - what happens when you invalidate an iterator?

So I wrote this code:

  #include <map>
  #include <string>
  #include <exception>
  #include <iostream>
 
  using namespace std;
 
  int main(void) {
    map<string, string> map_test;
    map<string, string>::iterator iter_map_test;
 
    map_test["AAAAA"] = "11111";
    map_test["BBBBB"] = "22222";
    map_test["CCCCC"] = "33333";
 
    iter_map_test = map_test.find("BBBBB");
 
    map_test.erase("BBBBB");
 
    try {
      string value = (*iter_map_test).second;
      cout << "got : " << value << endl;
      ++iter_map_test;
      cout << "next: " << (*iter_map_test).second << endl;
    } catch ( exception & e ) {
      cout << e.what() << endl;
    } catch ( ... ) {
      cout << "generic exception." << endl;
    }
    return(0);
  }

and the results are amazing (to me):

  $ g++ maptest.cpp -o maptest
  $ maptest
  got : 
  next: 11111
  $

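So the iterator appears to reset itself when invalidated. That's very interesting! It's not what I expected, and I'm not going to rely on it: using an erased iterator is undefined behavior, so this output is just what this particular g++ build happened to do. Still, it's nice to see that there was something non-fatal about the process. And the part the spec does promise, that the other iterators stay valid, seems right on the money.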

Clever dudes.
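For the record, here's a small sketch of the two things the spec does let you count on, written separately from the test above. It assumes a C++11 compiler, where map::erase(iterator) hands back the next valid iterator (on older compilers the usual idiom is erase(it++)). The eraseByValue() and survivorAfterErase() helpers are names I made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::string, std::string> StringMap;

// Erase every entry whose value equals `doomed` while iterating, and
// return how many entries were removed.
std::size_t eraseByValue(StringMap& m, const std::string& doomed) {
    std::size_t removed = 0;
    for (StringMap::iterator it = m.begin(); it != m.end(); ) {
        if (it->second == doomed) {
            it = m.erase(it);   // C++11: returns the iterator after the erased one
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}

// An iterator to an element that is NOT erased stays valid across the erase.
std::string survivorAfterErase() {
    StringMap m;
    m["AAAAA"] = "11111";
    m["BBBBB"] = "22222";
    m["CCCCC"] = "33333";
    StringMap::iterator keep = m.find("AAAAA");
    assert(keep != m.end());
    m.erase("BBBBB");           // invalidates only iterators to "BBBBB"
    return keep->second;        // still safe to use
}
```

The difference from the test program is that nothing here ever touches the iterator of the element that was removed.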