Archive for July, 2007

It’s Hard Supporting the Globe

Tuesday, July 31st, 2007

Today has shown me how annoyingly difficult it is to support an application around the globe when the folks on the other side of the world work regular hours. Face it: it means I have to work unusual hours, and long ones at that, and much of the time there's a day's delay because their day ends before mine starts, and starts after mine ends. Very annoying.

On the upside, it's only annoying because I want to turn things around quickly. If I were content to let things progress slowly, it'd be a nice change to the day ...OK, sent that email, now I have a day to wait for a response... but that's not what I like to do. Additionally, the problem with having people trickle things to you is that you never really get the full picture until you've gone round and round about it several times and find that nothing is forthcoming. Give it to me all at once and we don't have to suffer these horrible delays.

But then again, maybe that's exactly what they are planning on. Nah... that's too well thought out. The things bugging me now are simply because they don't think more than one step ahead of their own existence. Crud.

Major Retooling for MarketMash

Monday, July 30th, 2007

Well... today was the first day of the real re-tooling of the server for 24hr operation. Up till now, it's been restarted each evening, but with the needs of the Hong Kong group coming so close on the heels of the end of the Chicago trading day, it was time to look at a more continuous server. The first thing I needed to do was to extend the protocol between the server and its clients to allow for something other than the 'instrument update' notification. I had to add the capability for the server to send a 'remove instrument' notification for the case where a new underlying is reloaded and the list of derivatives doesn't match those that used to be on the underlying - meaning an instrument has been removed from the server and needs to be removed from the clients as well.

This will happen when we have an expiration, or split processing, where the strikes or expirations have changed and therefore the options will have changed. Right now, the server's clients don't have any mechanism to remove instruments - the users have to zero out positions in the server and then place new positions in the server. It's a hack, but one born of necessity. Today we got the instrument removal on a reload working, so the splits processing will be much easier, and it sets the stage for the nightly regional roll-overs where instruments will be dropped as a natural course of the week.

I also spent a little time updating the project web page to put in the overview of all the tasks I have planned out for the remainder of the project. It's a lot of little things, but thankfully, nothing seems so difficult that it can't be solved in a day or two. Oh, it'll still take a month or so to get all this done and tested, but that's plenty of time before the DST switch this fall.

Getting Things Planned for 24hr MarketMash

Friday, July 27th, 2007

Today I got news that the plans I'd had for having MarketMash cover the Hong Kong users' open reliably were dashed because of the pokey nature of the Chicago folks in getting their trades into the systems. So I had to go back to the plans for a true 24hr server and finish them. The first idea I had for covering Hong Kong was to have the server reload itself without shutting down. This would be similar to the reloading of all positions I'd worked out a little bit ago - something that used to require a recovery is now done in less than a minute. But the problem turned out to be the timing.

The traders and operations don't get done putting in trades until nearly 5:30 pm, Chicago time. This meant that even if I had a file ready by then, the earliest I could have had it going was 5:31 pm, which is 7:31 am Hong Kong time - depending on the DST shift. So the hopes of doing it at 4:00 pm were out the window.
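The time-zone arithmetic here is easy to get wrong, so a quick sanity check with GNU date (the date and cut-off time are just the ones from above; BSD/macOS date uses a different syntax):

```shell
# Convert 5:30 pm Chicago time on a trading day to Hong Kong time.
# GNU date accepts a TZ="..." prefix inside the date string.
epoch=$(date -d 'TZ="America/Chicago" 2007-07-31 17:30' +%s)
TZ=Asia/Hong_Kong date -d "@$epoch" +'%Y-%m-%d %H:%M'
```

During CDT (UTC-5) this lands at 06:30 the next morning in Hong Kong; under CST (UTC-6) the same wall-clock time lands at 07:30, which is where the "depending on the DST shift" comes in.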

The next idea was to have the instruments classified into 'Regions' like ASIA and OTHER/LAST, where an instrument's region is based on its geography and the time we get marks and end-of-day positions set for it. So, late in the afternoon, we'd roll over the ASIA region to tomorrow's business day and reload the marks and SOD positions. Then, a few hours later, when the rest of the marks and positions were set, we'd roll over the LAST region and everything would be ready for the next day. In the interim, part of the instruments would be on tomorrow's date, and the others would still be on today's.
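A sketch of what that classification might look like - the exchange codes and comments here are my own placeholders, not the real configuration:

```shell
# Map an instrument's exchange to a roll-over region (hypothetical codes).
region_for() {
    case "$1" in
        HKEX|SGX|TSE) echo "ASIA" ;;   # rolls over late afternoon, Chicago time
        *)            echo "LAST" ;;   # rolls over a few hours later
    esac
}

# The roll-over job would then process one region at a time, e.g.:
#   roll_over "ASIA"   # reload marks and SOD positions for the ASIA names
#   roll_over "LAST"   # later, once the remaining marks are set
```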

This means that we'd have to make the server's clients (the middle-tiers) smart enough to know what region they are in and how to deal with the different regions and times of day. It's a bit of a mess, but it's the best solution I can think of, because the only other solution I can see is to have multiple servers, and they'd face the same problem: nothing is going to be completely ready 24 hrs a day. That's just not the way this place works.

So today was spent doing a lot of thinking about the edge conditions and what would need to happen to each system for each case. Once it's all laid out, I'll update the project web site and the others can see what needs to be done and by whom.

Environmental Evolution

Thursday, July 26th, 2007

I guess that's really redundant. It's because of the environment that we evolve, but it seems especially true when the environment itself is evolving due to the changes that are happening. Very cyclical. Interesting.

This afternoon the Rock Star programmer was asking some totally left-field questions, and I answered them because I knew the answers and didn't want to appear insensitive by not answering him. Then other conversations sprang up, and in the end, it came back to the idea that Rock Star thinks there are some broken things that need fixing. Now I'm not one to stand in the way of solid improvement, but if things are working - even just mostly working - then there just might be a reason they are being handled the way they are. For instance - and it may seem silly - the way things are being done may be a little broken due to audit or compliance rules. Nothing we can do about that, but if you don't know there's a reason for a certain action or process, then it could appear to be at least inefficient, and in that case, old Rock Star sees inefficiency as a reason to act.

Again, I have to admire his heart. It's in a decent place, he's trying to fix things that, for all he knows, people have been complaining about for months. Problem is, Rock Star doesn't bother to actually find out if there's a reason, or if people are really bothered. He sees the inefficiency and treats it as if it were horribly broken, and wants to attack the problem.

This, understandably, puts people off. For instance, if you have a way to move a gallon of water, and once a week you need to move two gallons, it's annoying that on that day you have to make two trips, but it's not as if you can't move water at all - you just can't always do it in one trip. There's inefficient and then there's broken. It's important to understand the difference.

Unfortunately, when we try to explain this to Rock Star he counters with the classic argument: ...why are you all trying to stop me from doing my job? I just want to improve things. Indeed. How can one possibly know what needs improving until one understands the reasons for the current process or problem? It may very well be as efficient as possible - given real-world limitations of money and time. Again, I'm not against improvement, but trying to fight for a 1% improvement in something that's basically working and leaving some other things totally undone is just not a good way to strike a balance.

And like it or not, business is about trade-offs and balance. You can't spend a man-year automating something like a letter opener. It's about as efficient as it gets. You just have to accept that you have to manually open the letters you get each day. Now if we received millions of pieces of mail, that'd be a reason to improve it, but for what we get? It's just not worth it.

Understanding that there are priorities, and always will need to be priorities, is important to being a good developer. You have to understand how much to do, when, and what the payoff is.

Getting Backups Going at Home

Wednesday, July 25th, 2007

With the Mac Mini at home fixed, it seems like a really good time to get the backups going there for the kids' laptops and for Liza's as well. The latter is going to be a pain as it's a Windows laptop, but I'll see if I can't find something to do that without the configuration process driving me insane. Anyway, the first thing I wanted to do was to check up on the backups that were happening on frosty, an iMac G3 server in my office.

I looked at the drive used for backups and noticed that it was 53% full! That didn't seem right, so I checked on things and, sure enough, my script wasn't deleting the old backups - it was leaving them around. I spent a few minutes clearing out the old backups in the directory, keeping the last week as that's the rotation plan. Then I spent just a few more minutes figuring out why the script had failed (I'd copied it from my CVS backup script, which didn't have different users to deal with) and fixed it up so that it deletes the week-old backups right before it makes today's. Looking a lot better.
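The fixed rotation logic amounts to something like this - a minimal sketch, since the real script's paths, archive names, and per-user handling are its own:

```shell
# Keep one week of backups: delete anything older than 7 days first,
# then write today's archive. Deleting first means a full disk can't
# block tonight's backup.
rotate_and_backup() {
    src="$1"    # directory to back up
    dest="$2"   # directory holding the rotating archives
    stamp=$(date +%Y%m%d)
    find "$dest" -name 'backup-*.tar.gz' -mtime +7 -exec rm -f {} +
    tar czf "$dest/backup-$stamp.tar.gz" -C "$src" .
}
```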

The next thing is to get the kids' machines to mirror their accounts to frosty so that when the backups are done, they back up the mirror and we have duplicate redundancy in the system - the user accounts on frosty and the backups of those accounts on frosty. I like this plan a lot. At least until Leopard comes out; then, with Time Machine, I'll have to re-think all this and get those backups going in whatever way Time Machine allows.

I've always been partial to the unix tools, and I've used rsync and rdist in the past, so it seemed like a nice way to get the kids' machines mirrored to frosty. I found a nice article that talks about setting up rsync on Mac OS X and walks through the configuration of the SSH key files, etc. This is nice because I haven't generated key files in years, and it's always something I have to go back to the man pages to get right. This makes it much more convenient.

In the article, there's a reference to using launchd to automate backups when attaching an external drive. I'm not so sure I need that, but it's an excellent use of launchd and I just might find a use for it someday. No, I think it'll be easy enough to just cron the backup script on the kids' laptops so that they kick off in the early evening and mirror themselves to frosty. With rsync, only the changes get sent, and that'll save a lot of work - most of the files on the kids' machines are not going to change that much, so the load on the wireless network will be minimal. I still like the full backups of the accounts on frosty, but that's a drive hanging off the machine - no big deal.
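The mirroring itself boils down to a one-liner; here's a sketch - 'frosty' is from above, but the account names, paths, and schedule are placeholders:

```shell
# Mirror a source tree onto its backup copy. -a preserves permissions,
# times, and ownership; -z compresses over the wire; --delete removes
# files from the mirror that no longer exist on the source.
mirror() {
    rsync -az --delete "$1/" "$2/"
}

# On each laptop, the destination would be the account on frosty, e.g.
#   mirror /Users/kid kid@frosty:/Users/kid
# kicked off from cron in the early evening, something like:
#   30 18 * * * /Users/kid/bin/mirror-to-frosty.sh
```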

So, this weekend I'll spend some time getting the rsync scripts on the kids' laptops going and make sure they work. I'll also put the same kind of backup script in place for all user accounts on squirt, the Mac Mini that everyone seems to use from time to time. That'll make sure we don't lose anything from that guy, either. It's a good plan. I'm excited about it, and it'll be really nice to get it all going.

Unsettling Times

Tuesday, July 24th, 2007

With the resignation of the CTO and, for the time being at least, a little uncertainty as to what will happen with his replacement, I feel there are unsettling times ahead. It really all depends on how they handle his replacement and how close a fit the replacement is to his management style. No one is perfect, and so there's always room for improvement, but this guy was so instrumental in setting the tone of this place that I'm wondering whether a more button-down-collar type might so change the environment that the result would be something completely different - almost unrecognizable.

I can certainly see that some folks would like a more traditional CTO - one who kept lists of deadlines and kept the feet of all the IT folks to the fire. There's certainly a place for accountability, for making sure that a promised deadline is met or a really good reason is given as to why not. But going to the other extreme would so adversely affect this place that I'm wondering if they can really afford to do it. Asking a group of people who have worked a certain way for years to change significantly for a new manager is certainly management's right, but I hope they're a little more realistic than that.

No doubt, there will be folks asking for a more technically savvy CTO. Again, not a bad idea, but the technology is such a small part of really getting these big projects done. It's more about motivation and people skills, and really only a very little about knowing the underlying bits in the systems. A CTO that doesn't understand the problems is a problem, but one that can't motivate the troops is as big a problem, and causes retention issues, etc.

I guess I'm just nervous. After all, they didn't ask me about it, and I seriously doubt that they care how it will affect me. I, on the other hand, care a great deal about how it is going to affect me.

Back at it… Again…

Monday, July 23rd, 2007

Well... I'm back from vacation and it's a bit of a challenge getting caught up on things, but handling email from home made things a lot easier - at least on that front.

The exciting work I was doing on the coding serendipity right before I left for vacation took most of the day, but boy, was it worth it. The brute force method of clearing out the positions and reloading them from the files was taking a lot longer than I had planned - like minutes versus seconds. But with a little work and some careful thinking about the problem, I was able to get the entire process down to about 20 sec. It was very nice to see it working just as I had hoped. This is going to really save time compared to the 20 min. recycle time if I had had to restart the server.

On a sad note, the CTO resigned while I was away. He'll stay on for a while, but he has always been a real asset to this place. I have to admit that I'm more than a little worried that things won't be as nice around here without his steadying influence, but there's nothing I can do about it, and I know that in the end, things work out, and life goes on.

Coding Serendipity

Friday, July 13th, 2007

This afternoon has been one of those really special moments where you know you're on the right track. I was talking to the Data Team earlier today, asking them if there's anything they can think of that'll make the server's suite of tools easier to use, and they came up with the idea of automating the few tasks now done manually to execute a split. One of the first things they have to do is to zero out all the positions on all the instruments of the family. This really is a bit of a limitation on how the server's clients act - based on the fact that the server never "loses" a position.

So I started working on it, got it in the server, got it in the Java protocol, got it in the perl module calling the Java code, and was putting it in the web site when I realized that this was almost exactly something I've been meaning to add to the server for months. When something bad happens and the start-of-day (SOD) position file is bad, or the trades are all messed up, we have to correct the files and then recover the server. This is a 20 minute process that the users don't like to see.

The better solution is to enable the server to be zeroed out, have all the positions in the new SOD file reloaded, and then have the trades replayed. This would allow us to fix the files and then say "do the BIG reload" from the client, and the server would reload all positions without requiring a recovery and a 20 minute outage - it'd take 3 mins tops and the system would never be down.

What was amazing to me is that something almost totally unrelated comes back to be the key component in something that has been back-burnered for months. It's really something. And for a while, at least, I have a great feeling that I'm on the right track.

SCP and HostMonster

Thursday, July 12th, 2007

I've been working on this far more than I intended, but I'm trying to find a way to make it work for me while at work. The problem is that the one factor that is key to this problem is the configuration of the HostMonster server - and I have no control over that at all. I have a few more details, but they aren't surprising: if I set:

    Compression no

in my ~/.ssh/config file, then I don't get any bytes transferred, but if I set:

    Compression yes

then I get at least some data transferred. This makes some sense - if I'm sending in batches, I send the first batch, it gets unpacked, and then I wait for some handshaking. I'm not getting that handshaking. If I don't compress, then I'm trying to send the first byte, and that's getting there, but with no handshake it's probably not enough for the OS to write out to the filesystem.
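One way to check which setting is actually in effect for a host, without editing the config back and forth (note: ssh -G is a modern OpenSSH feature and may not exist in the versions involved here; the -o override, though, is standard):

```shell
# Print the resolved client configuration for a host without connecting;
# the output includes the effective Compression setting.
ssh -G example.com | grep -i '^compression'

# The toggle can also be done per transfer, without touching ~/.ssh/config:
#   scp -o Compression=yes bigfile.tgz user@host:
```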

There's nothing specific about this from my Googling, but I have a feeling that it's one of two things:

  • some path from Ameritech to HostMonster has changed and is messing things up with bad transfers, or something like that
  • an update the HostMonster staff did to OpenSSH

My reasoning for the first is that the path from Comcast to HostMonster does not have to be the same as the path from Ameritech to HostMonster. At some point, one of the pipes could have changed, been reconfigured, etc. and this would be the source of the problem. My reasoning for the second is that the HostMonster box has to be in the mix for a failure, and the box has been rebooted recently so it's not a stalled process, etc.

It's possible that there's something with the routing in the building, but then I'd have problems no matter where I went, so I don't think it's that. Also I know they'd have to go through a lot of paperwork to get any changes approved/done. So it's unlikely that someone 'forgot' about it.

I still haven't heard from the HostMonster technical support guys, and they said 24 hours, but I'm guessing that they are a little behind schedule on this - or are trying to track it down. In either case, I'm really at their mercy on this one.

I'm running a few traceroutes from work and home and I'll compare them to see where the paths converge, as they must, since both eventually get to the same box. We'll see what happens.


UPDATE: I did this traceroute from work to the HostMonster machine (deleting the times and removing any repeated lines with no information):

     1  192.168.100.1 (192.168.100.1)
     2  192.168.1.1 (192.168.1.1)
     3  adsl-68-22-220-1.dsl.chcgil.ameritech.net (68.22.220.1)
     4  dist2-vlan60.chcgil.ameritech.net (67.38.101.35)
     5  bb2-g7-0.chcgil.ameritech.net (151.164.242.210)
     6  151.164.93.49 (151.164.93.49)
     7  asn2828-xo.eqchil.sbcglobal.net (151.164.249.98)
     8  p5-0-0.rar1.chicago-il.us.xo.net (65.106.6.133)
     9  p6-0-0.rar2.denver-co.us.xo.net (65.106.0.25)
    10  p0-0-0d0.rar1.denver-co.us.xo.net (65.106.1.73)
    11  65.106.6.82.ptr.us.xo.net (65.106.6.82)
    12  207.88.83.102.ptr.us.xo.net (207.88.83.102)
    13  ip65-46-48-66.z48-46-65.customer.algx.net (65.46.48.66)
    14  * * *
     ...
    64  * * *

and I did this one from home to the HostMonster machine (again, deleting the times and removing the repeated lines at the end):

     1  www.routerlogin.com (192.168.1.1)
     2  10.1.1.1 (10.1.1.1)
     3  * * *
     4  ge-1-37-ur01.romeoville.il.chicago.comcast.net (68.86.118.213)
     5  te-3-1-ur02.wchicago.il.chicago.comcast.net (68.87.230.113)
     6  te-8-4-ur01.wchicago.il.chicago.comcast.net (68.87.231.213)
     7  * te-2-1-ar01.elmhurst.il.chicago.comcast.net (68.87.230.105)
     8  68.87.230.254 (68.87.230.254)
     9  68.86.90.178 (68.86.90.178)
    10  68.86.90.181 (68.86.90.181)
    11  te-4-2.car2.chicago1.level3.net (4.71.182.33)
    12  ae-14-55.car4.chicago1.level3.net (4.68.101.136) \
        ae-24-56.car4.chicago1.level3.net (4.68.101.168) \
        ae-14-51.car4.chicago1.level3.net (4.68.101.8)
    13  wbs-connect.car4.chicago1.level3.net (4.71.102.26)
    14  tg9-2.cr01.chcgildt.integra.net (209.63.114.37)
    15  * * *
    16  tg13-4.cr01.sttlwatw.integra.net (209.63.114.69)
    17  tg13-1.cr01.ptleorte.integra.net (209.63.114.98)
    18  tg13-4.cr02.ptleorte.integra.net (209.63.114.142)
    19  tg13-1.cr02.boisidpz.integra.net (209.63.114.18)
    20  tg13-4.cr01.boisidpz.integra.net (209.63.114.13)
    21  tg13-1.cr01.slkcutxd.integra.net (209.63.114.30)
    22  tg9-1.ar10.slkcutxd.integra.net (209.63.114.254)
    23  gw0-cust-bluehost-com.slkc.eli.net (70.97.59.22)
    24  * * *
     ...
    64  * * *

So I look at these and don't see them meet up anywhere, so this isn't going to help me see where the paths converge. I also looked at the details of the ping packets from the two locations to HostMonster - one is 46 hops, the other is 49, and both are in the sub-80 msec range. No help there.

I can't look at the reverse paths because they are behind routers - oh, I could trace from HostMonster to the local POPs, but when I try that I get Operation not permitted - no doubt for security reasons.

I've tried copying the file from HostMonster to my machine at work and that works great! Once again, it gets more confusing. I can copy from there, but I can't copy to there. Also, I looked and they don't support WebDAV or WebDAV HTTPS - which are both supported by Transmit/Coda.

I've tried running:

    scp -l 10 file dest:

which is meant to limit the bandwidth used to 10 Kbit/s, and that sent the first packet and then stalled out. Same behavior as always.

As a final test, I tried using the FTP Manager on the ControlPanel for the HostMonster domain, and even that doesn't allow me to upload the file. There's something very wrong going on. And depending on the data you look at, it appears as though every component is working. But clearly, I'm not looking at the tests correctly.

Reports of a Touchscreen iPod

Thursday, July 12th, 2007

Touchscreen iPod sourced, dated for August?: "Apple has chosen Wintek to supply touchscreen panels for an upcoming video-capable iPod, say sources in the Taiwan supply chain. The local electronics maker, which produces small LCDs for cameras and other handhelds, is reportedly set to ship capacitive touchscreens without specific software controls or integrated circuits to accompany them, allow..."

(Via MacNN.)

All I can say is Wow! This is the iPod I've waited for. If they let loose on this one, and it's got at least the storage of the current models, then I'm going to be getting one the first weekend they come out. Far, far too cool to pass up.