Archive for May, 2007

Crummy Unix Admins

Thursday, May 31st, 2007

I was chatting with a friend of mine this morning and he told me that the unix admins at his place unilaterally decided to place machines on a weekly reboot list rather than figure out a problem with the automounter. While I understand the need for triage, and that dealing with a production problem in the short-term is different from dealing with it in the long-term, it's sad to see highly-paid unix admins stopping with the short-term fix ("reboot it!") when the real solution may be difficult to find, but is the right thing to do.

I had a recent run-in with a unix admin in the Far East who thought that I needed to log out each night because excessively long logins consume inodes that aren't returned to the system except on logout. This was supposedly filling up the /var partition on a box. Now, I'm no dummy... and while the /var partition might have been full, I can't imagine how the length of a login has anything to do with it. I can imagine processes that generate a lot of messages for /var/log/messages might be an issue, but that's got nothing to do with a single login.

It's these kinds of "excuses" that make it hard to defend my industry and career to others not in this industry. I'm sure these kinds of folks are everywhere - Dilbert has legions of fans that claim that the same thing was happening to them just the other day. But there's a pride in your profession that it's hard to maintain when you realize that your profession is really no better than Dilbert's.

HTML and Page Styles

Wednesday, May 30th, 2007

I've spent more time than I'd have thought possible getting a page in IE to look like the same page in Firefox. I'm sure this is no surprise to anyone doing web design - HTML is not really going to give you what you want, you've got to use CSS and then you get into the same realm that you were in back in the days of nroff/troff and LaTeX. What's wrong with these people?

When I did my thesis, it was not on a Mac - it was on the VAXen at Purdue. Additionally, there were standards you had to meet with the style and layout of your thesis - all the way down to section naming conventions (7.2.1, etc). I understand why Purdue wanted to do this - so all theses from there had the same look to them. Reasonable. To do this, there were massive macros and included files for troff and then you could run it through the Versatec for "laser-like" printing long before there were laser printers. You then had to cut and tape/paste your illustrations on separate pages so that it looked like it had all been done 'professionally'. This involved knowing the font sizes of everything, equations were a pain, figures needed to be planned out at the end of each chapter, it was a mess. But it was all we had.

Today, you'd use Word, or OpenOffice, or something, and include the images from some drawing program, and then ship it to a simple color laser and you're done.

So why, oh why, are we back to the days of specifying the intricate details of the rendering of text? The web, and specifically the web browsers, need to take a page from the word processing advances of the last 25 years and get with it. There needs to be a way to make all of this a lot simpler. Oh sure, if it's in the browser, then there are those that say Microsoft will make the rendering in IE of Word-built documents better, nicer, faster, etc. But that misses the point of the web these days. Today, it's not about static pages, it's about dynamic pages, and for that, you're not going to be able to use Word. You're still going to need Perl, PHP, Tomcat, WebObjects, etc. to pull in the data, and get it ready to show to the user. What I'm complaining about is the presentation markup that we have to use.

There are those that say it's all a matter of standards - get the right set of CSS files and everything is easily done from that. OK, there's something to that, but that's no different, really, than my grad school days. I was lucky to have the benefit of many other people making these troff templates, but I still had to do more typography than should have been necessary - or than is necessary today.

So we need to move the web forward with the lessons learned from the Word processor wars... get a format that does more than text layout - get page layout and make it a lot easier to use.

Silly Parent Tricks

Tuesday, May 29th, 2007

As a parent, I have done some pretty crazy things... I've been sprayed with pee... I've caught poop in my hand to keep it off the floor... I've been kicked, slammed, thumped, and pounded by kids - but up until this past weekend, I've never shot myself. This is one of those stories that reminds me that the most important thing to keep as a parent is your sense of humor.

Anyone who's watched A Christmas Story knows that Dads often think about the (unsafe) toys they had as a kid when their kids (boys or girls) want to have a similar unsafe toy. As a parent, it's "you'll shoot your eye out", but as a kid it's just the ultimate in fun. In recent weeks my son, Joseph, has been asking for an Airsoft gun to play with in the back yard. Since it had a slower muzzle velocity than the forbidden BB gun, it seemed like it might be a non-messy paint-ball gun. So we had a look.

The guy behind the counter at the Bass Pro Shop said that it'd leave little bruises, but not puncture the skin. That if we wanted to make sure, we could go home and shoot ourselves to see what it really felt like. Seemed like a good plan at the time. Ah... silly parents...

I unpacked one of the guns when we got home, loaded it up. Looks nice... non-threatening as it's translucent, but it's got good weight and the clip is nice in that it can take 15 shots and then the child in question has to reload. Better than the old days of dropping in 500+ BBs to my air rifle, pumping it up 20 times, and punching through metal. But I digress... I then took the gun to the garage as I didn't want my neighbors seeing me shooting myself in the backyard - that'd bring a 911 call for certain.

So in the garage, I'm sitting on the steps to the garage door with my elbows on my knees. I'm looking in the recyclables thinking it'd probably be a good idea to shoot something hard and just feel the impact energy before actually shooting myself. So I find this nice, sturdy cardboard and think "the guy at the store said it would not break the skin, this is pretty tough, should do nicely." Ah... silly parent...

So I put the muzzle of the gun to the cardboard and calmly pulled the trigger.

YIKES!

The little yellow plastic ball had gone clean through the cardboard and tagged my left leg! It stung, but not so bad that I jumped. Then again, it had already slowed down while plowing through all the good, strong, cardboard. For a second, I looked at the cardboard with disgust, as if it had let me down. Then I quickly realized what had really happened and looked at my leg. A bruise was already appearing - shaped like a donut, or 'O'. Interestingly, nothing in the center was bruised, only the periphery. It didn't raise a bump. It stung for about a minute, and then it was just a mark. No lingering pain.

I walked back into the house, showed my wife and the kids. They were surprised and a little bit scared of the guns. Good. Healthy respect, that's nice. It didn't hurt any longer, but it was a funny story to tell. So I tell this story so that other parents can spare themselves from shooting themselves with one of these guns. They are not safe to shoot at each other. But they are probably more safe than BB guns. Lesson learned.

Master/Slave vs. Peer-to-Peer

Friday, May 25th, 2007

I've got an interesting problem facing me now. I now have multiple servers - on different continents, and it would be nice to be able to have them act as one unit. This isn't a requirement of the project, the requirement was simply that we have a BCP (a.k.a. disaster/recovery) site for the server, and that's done. But there's a step that we can take, if we want to, and that's to make the system better because of the existence of the other server - improve response times, share the load, etc. The problem is in how?

The first obvious solution is to do nothing. Not very creative, but it's the lowest-cost solution and has to be considered given that we have limited people and time available to us.

The next might be some kind of master/slave set-up. Something where all the data still comes into the main server, and then after it's done verifying it and processing it, something is sent to the other one to keep it up to date. This has the benefit of being the only solution where we know what the second (and third) server's state is - because we set it. No inputs other than price are allowed to this read-only, slave server. This keeps things under control nicely. However, the downsides are significant:

  • Plenty of development needed - we're going to have to come up with multiple protocols that will be used to transfer the data from one to another with handshaking and acknowledgments so that we know what we sent was received successfully, etc.
  • Significantly more bandwidth needed - if we're going to be moving the complete state (as it evolves) from one machine to the other, we're going to need a lot more bandwidth between sites to keep the latency on updates low. This might be a big issue and then again, it might not, but if we do this, we certainly need to keep this in mind.
  • Still going to need verification - no matter what, we'll have to have something that will verify that the slave is indeed a copy of the master. This might take the form of a daily script, or it might be an hourly check, but something or someone has to check them against one another.
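The first and third points above can be sketched in a few lines of C++. This is only an in-memory illustration under my own assumptions, not the protocol used in the real system: the names Update, Master, Replica, and countDifferences are hypothetical, there is no networking, and the "acknowledgment" is simply the return value of apply(). It shows sequence-numbered updates that let the sender detect a gap, plus the kind of comparison a daily or hourly verification script would run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical update message: one key/value change tagged with a sequence number.
struct Update {
    std::uint64_t seq;
    std::string key;
    double value;
};

// Read-only slave: applies updates strictly in order and acknowledges each one.
class Replica {
public:
    // Returns the highest sequence number applied so far (the acknowledgment).
    // An update that is not the next one in sequence is ignored, so the
    // returned value tells the sender where to resume.
    std::uint64_t apply(const Update& u) {
        assert(u.seq != 0 && "sequence numbers start at 1");
        if (u.seq == lastSeq_ + 1) {
            state_[u.key] = u.value;
            lastSeq_ = u.seq;
        }
        return lastSeq_;
    }
    const std::map<std::string, double>& state() const { return state_; }

private:
    std::map<std::string, double> state_;
    std::uint64_t lastSeq_ = 0;
};

// Master: owns the authoritative state and numbers every change it publishes.
class Master {
public:
    Update change(const std::string& key, double value) {
        state_[key] = value;
        return Update{++lastSeq_, key, value};
    }
    const std::map<std::string, double>& state() const { return state_; }

private:
    std::map<std::string, double> state_;
    std::uint64_t lastSeq_ = 0;
};

// Verification pass: count keys whose values differ or that exist on one side only.
std::size_t countDifferences(const std::map<std::string, double>& a,
                             const std::map<std::string, double>& b) {
    std::size_t diffs = 0;
    for (const auto& kv : a) {
        auto it = b.find(kv.first);
        if (it == b.end() || it->second != kv.second) ++diffs;
    }
    for (const auto& kv : b) {
        if (a.find(kv.first) == a.end()) ++diffs;
    }
    return diffs;
}
```

If the acknowledgment coming back is lower than the sequence number just sent, the master knows the slave missed something and can resend from that point. A non-zero result from countDifferences is what the periodic check would flag for a person to look at.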

The other solution I can think of is some kind of peer-to-peer system where a change entering one gets sent to the other prior to it actually being checked and acted upon. The most obvious extension of this is to have all the changes come into the one, Chicago, server and then have it send the data to the other(s) so that we're pretty sure that we don't have to worry about cross-site changes, which could be a drag. Still, the problems with this solution are significant too:

  • Loosely coupled == More differences - no two ways about it, if the systems are loosely coupled, then it's very possible that the data in them will be different. This might also happen in the master/slave, but it's going to happen a lot more in the peer-to-peer.
  • Support Costs - because of the first point, this solution almost guarantees that we'll need someone in the right timezone to make sure the differences are kept small. It's a person at this point - not a script.

I've sent out these ideas and trade-offs to a few of the folks here that would be most affected by the decision. I want to hear what they have to say. My initial impression is that I should have the second server running every day, taking ticks, ready to cut over at a moment's notice, but that linking these guys is a lot of work that isn't really needed, and won't necessarily pay off in the long-run.

Separating the People from the Work

Friday, May 25th, 2007

I'm sure there must be hundreds of Dilbert strips about this one fact - the people you work with need to be treated as totally independent from the work that you do. There's no two ways about it. If you confuse the two, more often than not, you're going to pay for it. And unfortunately, it goes both ways... think your boss is your friend? He probably is... until the budget cuts come and he has to make the tough decisions. Think your boss is your mortal enemy? He might be... but it's your work that makes him look good. So either way, it's important to remember that the work is what you do, and the people are just those that you share space with.

This came up in the recent days for me when a friend talked about the grief he was getting from the folks he worked around. No two ways about it, these guys are not nice. I can remember dozens of times when I'd be in a work environment and find out that people found it funny that I was as passionate as I was about the work. So much so, that in one place, I found out they were taking bets as to who could get me upset about something the quickest. Nicely done, guys... Nicely done.

You can't control these people. If they are going to be insecure enough to see your passion and commitment to the work as a threat, and therefore have fun poking you in the side, well... there's really nothing you can do, and doing anything just makes them feel more important as they were able to draw your focus away from the task at hand. So you have to let them be exactly who they are. Maybe they'll grow up, but most likely not. Not your concern. Life goes on, and there are enough things to worry about that this doesn't need to be one of them.

Housekeeping Day

Thursday, May 24th, 2007

Tonight I have releases of several components of a large app that I deliver, and since multiple pieces are changing, it's a good idea to make today a slow day so that I don't move the codebase in case something comes up regarding the releases.

I spent time helping track down a few issues... configuring a few boxes... doing my part to increase the Corporate Red Tape... all things that needed to be done, but have been put off as they aren't that important. But today was their day.

The Commute vs. The Commuters

Wednesday, May 23rd, 2007

Each day I have an 80 min commute from my house to my cube, and it's not bad. Don't get me wrong, I can remember walking to my office in Grad School in 5 mins - but given that I work in Downtown Chicago and would not want to live within 5 mins of where I work, it's a decent compromise.

Each day I have an 80 min commute from my cube to my house, and it's horrible. It's the same streets, the same train, the same car, the same roads - but different people, and, as Robert Frost might have said, that has made all the difference.

It's the commuters in the evening that make the trip nasty. In the morning, it's early - and I mean really early, yet most everyone takes their turn, waits in line, doesn't push, and we all get on the train quickly and get to work. In the evening it's pushing and shoving... angry faces... loud talkers on the train - and then we get to the traffic.

In the morning, I almost have the road to myself. Nice and quick, no problems. But in the evening it's a traffic jam from the word go. First, it's all the people that think a Stop Sign is really a way to get the jump on honest saps. They fly right through 4-way stops like they know the rest of us are going to stop. Then there are the busses... at one specific intersection near the train station one bus will play blocker for as many busses that don't have the right-of-way as necessary. I've repeatedly watched one Pace bus block the road for eight others of its ilk - and then not allow one car to go after the last bus. It's like a little club - "Here you go, my buddies... all of you can go - but not you, car!" It's something that I know they aren't going to change... they don't care. But if the drivers of the busses were in their cars, they would not do the same thing - "Oh, I like these Jeeps, they can go - but not you Fords!"

The problem is that people seem to think that they are special. They deserve to get home 30 sec. faster than the rest of us. We must not have any place to go, which is why they don't feel bad about cutting you off, sitting in the middle of an intersection, and in general doing such poor driving they'd never get a license if it depended on their daily driving habits.

I know this is just ranting... aggressive drivers in Naperville are nothing in comparison to Detroit or Downtown Chicago, but it's funny that these same people don't see themselves as being aggressive on the road. They probably think of themselves as nice, kind, caring - even sharing, people. But don't you believe it. They're not that far removed from the lower species.

Locking Multiple Objects

Tuesday, May 22nd, 2007

There are times when I wish pthreads had a two-pass read/write mutex. I guess I should be happy that it's got the read/write mutex and I can make something like a two-pass lock with retries. Basically, if I want to lock n objects in C++ using pthread mutexes, I have to try_lock the first one; if it fails, optionally wait a bit and try again. If it succeeds, then I can try the next one, and if that fails, back out the ones I've already acquired, otherwise try the next one, etc. This scheme is nice in that it allows me to make the functionality I need, but it'd be great if pthreads had the ability to prepare_to_lock and then lock, so you could build a two-pass system that makes sure you can get everything you need before locking anything up. My solution locks and unlocks, which can be a waste if one of the objects is write-locked, for instance.

But I suppose the fact that I can get most of what I want with very little code is nice. Still, it'd be great if they added that. Or maybe Boost could make that out of other primitives.

The reason for this is that you get deadlocks when you try to lock multiple objects in a single, atomic unit, and that's what's been happening in my code. I have one section where a family of objects are locked for read, and another where a similar set are locked for writing. The problem I've hit is that the timing of the multiple threads in the system is very hard to pin down, and I don't want to use exclusive locks (mutexes) which would make things a lot easier, but would lock more than I want, and slow down performance in the server. So, I am looking to use the 'tryLock...' in the read lock part of the code and by putting in the 'try' get rid of the deadlocks. The only problem is that if the 'try' fails, the clients will have to retry, but that's already in the client code, so that might not be too bad. If I can get rid of these deadlocks, I'm making a big step forward for stability. We'll see how the tests go.
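The try-and-back-out scheme described above might look something like the following. This is a sketch under my own assumptions rather than the code from the server: the function names tryLockAll, lockAllWithRetry, and unlockAll are hypothetical, and instead of calling pthreads directly the template accepts any lock type that offers try_lock() and unlock(). A thin wrapper (also hypothetical) whose try_lock() calls pthread_rwlock_tryrdlock and whose unlock() calls pthread_rwlock_unlock would fit the same shape.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Try to lock every object in 'locks', in order. On the first failure,
// release everything acquired so far (in reverse order) and report failure,
// so the caller never sits on a partial set of locks.
template <typename Lock>
bool tryLockAll(const std::vector<Lock*>& locks) {
    for (std::size_t i = 0; i < locks.size(); ++i) {
        assert(locks[i] != nullptr);
        if (!locks[i]->try_lock()) {
            while (i > 0) {
                locks[--i]->unlock();  // back out the ones already held
            }
            return false;
        }
    }
    return true;
}

// Release a full set previously acquired by tryLockAll, in reverse order.
template <typename Lock>
void unlockAll(const std::vector<Lock*>& locks) {
    for (std::size_t i = locks.size(); i > 0; --i) {
        locks[i - 1]->unlock();
    }
}

// Retry wrapper: attempt the all-or-nothing lock up to maxAttempts times,
// pausing briefly between attempts. Returns false if every attempt failed,
// leaving it to the caller (or the client) to retry later.
template <typename Lock>
bool lockAllWithRetry(const std::vector<Lock*>& locks, int maxAttempts,
                      std::chrono::milliseconds pause) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (tryLockAll(locks)) {
            return true;
        }
        if (attempt + 1 < maxAttempts) {
            std::this_thread::sleep_for(pause);
        }
    }
    return false;
}
```

Because no thread ever blocks while holding part of the set, two threads can't end up waiting on each other. The cost is the wasted lock/unlock work on a failed attempt, which is exactly the inefficiency a true two-pass lock would avoid.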

Never a Dull Moment

Monday, May 21st, 2007

This morning one of my servers started sending out prices of nan, which is clearly not the right thing to do. This caused my downstream systems to stop ticking, and that caused a lot of production issues. First off, I had to find the problem, and patch the downstream systems so that they would not be affected by nan values anymore. For the most part, they were protected, but the handling of this one server was far, far too optimistic. So I had to batten that hatch down and get those changes into production as soon as possible. Then I could turn my attention to why?
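A downstream guard of the kind described here can be very small. The sketch below is my own illustration, not the actual patch: isUsablePrice and applyTick are hypothetical names, and holding on to the last good price is just one reasonable policy for a bad tick.

```cpp
#include <cassert>
#include <cmath>

// Accept a price only if it is a finite number. NaN compares false against
// everything, so a check such as (price < 0) does not catch it;
// std::isfinite rejects NaN and +/-infinity explicitly.
inline bool isUsablePrice(double price) {
    return std::isfinite(price);
}

// Keep the last good price when a bad tick arrives.
// Returns true if lastGood was updated, false if the tick was dropped.
inline bool applyTick(double incoming, double& lastGood) {
    if (!isUsablePrice(incoming)) {
        return false;
    }
    lastGood = incoming;
    return true;
}
```

With a check like this at the point where ticks enter the system, a single bad value from upstream is dropped instead of propagating through every calculation that touches it.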

The why? turned out to be a volatility curve that was primarily short-term, and rising significantly at the end - coupled with a far-term warrant that ended up with a very large volatility at its expiration due to the interpolation of the curve. The fix was to note what the problem was and then to put into the code a configurable parameter that would 'cap' the volatility at expiration and not allow the options to be valued if the volatility exceeded a given cap. This works great, but I tell you... there's nothing like coding a fix for a system with 50 users breathing down your neck. It makes you make sure you're right - and fast.
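To make the failure mode concrete, here is a rough sketch of how a short, steeply rising curve can blow up at a far-dated expiration, and how a cap stops the valuation. Everything in it is an assumption for illustration: the VolCurve layout, the names volAt and volForValuation, and in particular the linear extension of the last segment, which stands in for whatever interpolation the real curve code uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// (time to expiration in years, volatility) points, sorted by time.
using VolCurve = std::vector<std::pair<double, double>>;

// Linear interpolation inside the curve, and linear extension of the nearest
// end segment outside it (an assumed scheme). A curve that rises steeply at
// its end therefore yields very large values for far-dated expirations.
double volAt(const VolCurve& curve, double t) {
    assert(curve.size() >= 2);
    std::size_t hi = 1;
    while (hi + 1 < curve.size() && curve[hi].first < t) {
        ++hi;
    }
    const auto& a = curve[hi - 1];
    const auto& b = curve[hi];
    const double slope = (b.second - a.second) / (b.first - a.first);
    return a.second + slope * (t - a.first);
}

// Configurable cap: hand back a volatility only when it is finite and within
// volCap. A false return means "do not value this option".
bool volForValuation(const VolCurve& curve, double expiry, double volCap,
                     double& volOut) {
    const double v = volAt(curve, expiry);
    if (!std::isfinite(v) || v > volCap) {
        return false;
    }
    volOut = v;
    return true;
}
```

For example, with points (0.1, 0.20), (0.25, 0.22), (0.5, 0.40), the last segment has a slope of 0.72 per year, so a five-year expiration comes out at 0.22 + 0.72 × 4.75 = 3.64, or 364% volatility. With a cap of 2.0 the option is simply not valued rather than being priced off a nonsense number.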

BCP (Disaster/Recovery) Work

Friday, May 18th, 2007

Certainly one of the more interesting things I've ever worked on is the Business Continuity Planning or BCP for short. It's called a ton of things in different shops, but it's always the same thing - What happens if the building is wiped out?

Where I'm at now, we have sites around the world, so the obvious BCP plan is to make one of the other sites the BCP site for the main production facilities. The easiest way to do this is to have the secondary site active at all times; then switching over is as simple as redirecting a few clients, etc. So, that's what I'm doing. Now, I've known that this needed to be done for months and months, but the management never really gave me a lot of time to do it. With audits looming in the future, it seems that now is the perfect time.

The interesting things I'd like to put into this BCP plan are really more like hot fail-over plans so that we can tell that the plan will be easy to execute because we can see that the hardware and software are functioning every day as they are intended to. However, that means that I need to do a lot of work to get things to this point. That's been my day today. Nothing fancy - just a lot of details that need to be looked into, and done to get the systems up in the other locations. It's certainly not glamorous, but it's what needs to be done.