Archive for February, 2010

The Pain of Being the Engineer on the Train Wreck

Monday, February 22nd, 2010

Today has been a particularly sad one for me. It started out wonderfully, but soon after noon I was talking to my manager about the upcoming work I needed to be doing, and while it's normally an interesting feedback loop, this time it was just plain depressing.

You see... I'm the engineer on the train that's now purposefully headed for disaster.

It may not be certain doom, but these are very bad ideas, and rather than listening to someone with nearly as many years of experience as he has years of age, he's decided that this is what he wants. I've tried to dissuade him. He's not interested. He wants what he wants, and I can appreciate that "gung-ho" kind of attitude, but I've been down this road before, and I'm not really convinced that this is the right thing for us to be doing.

But it's not my decision. It's his.

So I'm sitting on a project that I have no sense of confidence in. None. It'll be a mess no matter what I try to do because it's just too much data. But he's convinced that there exists a way for me to pull another rabbit out of my hat.

I've spoiled him. Spoiled rotten. I've pulled off too many things like this for him to think of the "reasonable". He goes straight for the "unreasonable", and will settle for the "impossible" if I fall short.

It gets a little old.

OK, a lot old.

How Not to Start a Nice Weekend

Friday, February 19th, 2010

I should never start working on the Hemlock project at noon on a Friday. I should have known that. But silly me... I wanted to get these changes in the code and checked in so that I could put it all behind me for the weekend.

I should have just taken off and dealt with it on Monday.

I was stupid, and arrogant, to think I could get it done in half a day.

Hemlock is this horrible project that was dumped on me by a very bad developer when I arrived. It's a mess, a complete mess, and yet it works, and I can make it do more things than it was originally designed to do. That's my fault.

I should have faked incompetence.

So I got the real code fixed up in about 2 hours, but then I couldn't finish up the JUnit test modifications for those changes in the following 2.5 hrs. It was (and is) a nightmare. These aren't unit tests - they are user validation tests - without the user. It's exactly the kind of thing JUnit can be abused into doing, and it strangles your development effort.

But they are there, and that means I have to update them.

And ruin the start to my weekend.

Lovely.

But it's all my fault.

Designing for Continuous Operation

Thursday, February 18th, 2010

Today I had some problems with the messaging software we use at The Shop - 29West. To be fair, the problem wasn't completely 29West's fault, but it was due to 29West's architecture. Even in pub/sub mode, 29West (non-multicast) is still a point-to-point protocol. This is a feature of their design. I can see it - take out the middle-man and centralized bottle-neck, and you can increase performance. Makes sense... I've used point-to-point socket protocols a lot and there's a ton of good in them.

But when it comes to a service, that's not a really great model to use. The reason: if the publisher goes down for any reason, then you have clients that are unable to connect to this "pub/sub" system at all, and that doesn't make a lot of logical sense.

I mean, a subscriber should be able to subscribe unless the complete infrastructure is down - not just the publisher. It's a matter of opinion, I'll concede, but I think it's an opinion that should be shared by a vast majority of the systems developers in the world. The entire point of pub/sub is to separate the 'pub' from the 'sub'. Having it be point-to-point breaks all that to pieces.

Which brings me to the point that seems to be clearly lacking in The Shop - the thought of continuous operation seems completely absent. Everyone plans on nightly shutdowns, and as such, never puts even the slightest thought into making their system work all week long. "There's a nightly restart, so I will too." - that's just lazy thinking and lazy building.

I'm not saying there aren't reasons for some systems to have nightly outages. What I'm saying is that if there's a reason for your system to have one, then go ahead and have one. But if you don't need it, then don't take it. If you do this, and the other systems that do need it figure out a way around their limitations, then you have a more robust environment as nothing needs to have a nightly restart.
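To make that concrete, here's a minimal sketch of the kind of client I have in mind - purely illustrative JavaScript, where connectToPublisher() is a made-up stand-in for whatever transport the real system uses (29West or otherwise) - that keeps retrying instead of giving up when the far side is away:

// Illustrative only - connectToPublisher() is a placeholder, not a real API.
function subscribeForever(topic, onMessage) {
  var delay = 1000;                              // start with a 1-second back-off

  function attempt() {
    connectToPublisher(topic, onMessage, {
      onDisconnect: function () {
        // The publisher went away - don't exit, just try again later.
        setTimeout(attempt, delay);
        delay = Math.min(delay * 2, 30000);      // cap the back-off at 30 seconds
      },
      onConnect: function () {
        delay = 1000;                            // healthy again - reset the back-off
      }
    });
  }

  attempt();
}

A subscriber built like that can be started before the publisher is up, survive the publisher's restart, and never needs a nightly bounce of its own.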

So stop accepting others' limitations as yours - build better systems than that.

Twitterrific 3.2.2 is Out

Thursday, February 18th, 2010


In an unexpected, but nice, update, Twitterrific tweeted that 3.2.2 was out with a few fixes and some commercial additions. It's nice to see the Mac desktop version getting a little love, especially given that there have been numerous updates to the iPhone version recently.

So I had to get it and give it a whirl. Works great.

An Incubator of a Different Sort

Thursday, February 18th, 2010

I've had a truly eye-opening experience this morning, and unfortunately, it's not been a good one. This morning an application I had been given to support/extend when I arrived had some problems connecting to a date server - something that tells me what the business days are for a given trading exchange. Pretty standard stuff in the industry.

My next step (after restarting the process and getting it working) was to contact the group(s) responsible for the service and ask them if there was an obvious reason that the service was - well, unavailable - during normal business hours for the location. What I got back was pretty much what I've come to expect in The Shop - "Steve is handling support for that today - he'll be contacting you soon."

Point #1 - Take Ownership, Don't Pass the Buck

While I say I expected this response, it's only because I've become accustomed to hearing it from so many people in this actually small-ish Shop. Face it - there are fewer than a dozen people in the group, so why have a specific support person? Why can't anyone, and everyone, in the group handle this? If it's a production problem someone has to handle it, but why make it a specific person?

This bothers me because it makes me feel no better than a purchaser of this product - which, by the way, I wouldn't pay anywhere near their salaries and support costs for. It's not worth it in dollars-and-cents terms, but because we're a single Team, it makes sense to work together. However, that's totally thrown out the window when I'm treated like a phone caller with a problem on their washing machine.

Have more sense of ownership and respect for your customers. Enough said.

Who's Sherlock Holmes today?

When I got an email from Steve, he asked me for a few things I expected: my 29West configuration, and the location of the service I was trying to connect to. Well... I could give him the one, but the other made no sense at all. The point of 29West discovery is that I don't have to know where the service is running! So after a few more emails, I respectfully said that maybe he could check with the other members of his team to find out where their services are run.

What is this? Who's Sherlock Holmes? I'm already being treated like an annoying paying customer, but not given any respect, and now I'm being asked to help them find their production boxes? I can't believe this.

To this point, I asked my manager if he could imagine me not knowing what boxes my production services are running on. He was, understandably, silent.

Point #2 - Have Support People Know How to Fix Problems

As silly as this sounds, it was so clearly absent in my exchange today. Sad.

Resolution?

Late in the day Steve got back to me with a few changes in the 29West config and the name of a new 'environment' to use for tomorrow. I'm hoping it works, but I have no reason to believe this information is any more reliable than the rest of the exchange - he didn't know which box his own service was running on, and he asked me a half-dozen questions he shouldn't have had to ask during the course of the day.

I'm just going to have to hope that he's got this one right. I hope the production users understand if it doesn't work tomorrow.

An Incubator of Laziness

I came to work in this Shop because during the interviews I believed what I saw was a company that was looking at technology and development the same way I was - lively development, quick releases of good code, possibly even a blistering pace of innovation. But I realize now that this is what I was led to believe. Not what the reality was.

More than anything else, this place, and the success it's seen, has allowed an enormous sense of laziness to prevail. There are dozens of little groups - each used to be a person or two but as the business grew, more hands were needed and then a "do-er" became a "manager", and sat back getting fat-n-happy, protecting their new Empire. And all for what?

As it happens, very few child actors successfully make the transition to adult actors. So it goes with companies. Some of the most successful small, growing companies can't survive their own success without shedding a good bit of the old guard that got them there. It's sad - those people were exactly what the company needed at the time, but they become just as toxic to the organization as it grows. The devil-may-care attitude... the willingness to get their hands dirty... to take charge... doesn't work when you go from 50 people to 500. You need different people with different skill sets.

There are some that make the transition successfully, but they are few and far between. Most find they need help, and get it. Others refuse to see the obviousness of the situation and their company dies, or shrinks to a manageable equilibrium.

I'm not sure what's going to happen in The Shop. I really don't know. But I do know that what is happening all around me now can't continue. It's just not possible.

Testing Out a New Google Visualization

Wednesday, February 17th, 2010


I've been using the Google Visualization AnnotatedTimeLine for nearly a year now. It's very nice, but I've been having a lot more memory issues lately, and using the Google Chrome memory profiling tools, it's pretty clear that the vast majority of the memory usage in the affected pages is in JavaScript String objects - and most of those are in the AnnotatedTimeLine. Additionally, it's a Flash component, so with the recent advancements in HTML5's Canvas and the JavaScript-based graphing tools built on it, maybe there's one I could use in place of the AnnotatedTimeLine.

Turns out there's one that's very similar. It's called "dygraphs" and is located off the main Visualization Gallery page. It's got a nice feature set, and while it doesn't do the "time sliding" that the AnnotatedTimeLine does, it's got a few nicer things like error bars, a built-in moving average, and the ability to put the legend in a separate div. So I decided to give it a try and see if I would get a better memory profile.

Thankfully, dygraphs takes a Google DataTable, so all I really had to do was remove a few event registrations specific to the AnnotatedTimeLine, swap the constructor, and comment out a few blocks of code, and I had a functioning replacement page. It wasn't optimized for all the features and fonts, etc., but it was working.
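The swap was roughly this - a sketch rather than my production code, with a made-up two-column DataTable standing in for the real data feed, and assuming the dygraphs script is already loaded on the page (Dygraph.GVizChart is dygraphs' Google-Visualization-compatible wrapper):

google.load('visualization', '1', {packages: ['annotatedtimeline']});
google.setOnLoadCallback(draw);

function draw() {
  // Stand-in data - the real page builds this DataTable from the servlet's JSON.
  var data = new google.visualization.DataTable();
  data.addColumn('date', 'Date');
  data.addColumn('number', 'Value');
  data.addRows([
    [new Date(2010, 1, 16), 100.0],
    [new Date(2010, 1, 17), 101.5],
    [new Date(2010, 1, 18),  99.8]
  ]);

  // Before: the Flash-based AnnotatedTimeLine
  // new google.visualization.AnnotatedTimeLine(
  //     document.getElementById('chart_div')).draw(data, {});

  // After: dygraphs via its GViz-compatible wrapper, fed the very same DataTable.
  new Dygraph.GVizChart(document.getElementById('chart_div'))
      .draw(data, {rollPeriod: 5, showRoller: true, labelsDiv: 'legend_div'});
}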

Sweet.

I then looked at the memory usage of the two pages in the Google Chrome Task Manager, and while the AnnotatedTimeLine version of the page was in the 58 MB range, the dygraph version of the page was on the high side of 200 MB - at least a factor of four larger.

OK... nice test, but I'll stick with the AnnotatedTimeLine for now.

Yikes!

Acorn 2.2.1 is Out

Wednesday, February 17th, 2010

This afternoon I got a tweet about the release of Acorn 2.2.1, which improves a lot of the JPEG image handling when flattening an image for web export. No longer is the background messed up - it's now white (JPEG doesn't allow for an invisible background) - and there are a few other fixes. In all, a nice upgrade.

Trying to Pause Background Web Pages

Wednesday, February 17th, 2010


My web app is getting a lot more use in The Shop, and one of the problems associated with that is that folks will open up half a dozen tabs with different pages and then 'flip' between them so they don't have to navigate the menu system to get where they want to be. It makes sense - from an MDI-client point of view, it's what they're used to. OK. I can deal, but what happens to their boxes isn't so nice.

They slow to a crawl.

It's because the 'hidden' pages are just as active at getting data as the 'visible' ones. This isn't a great use of the machine - and just as importantly, maybe the 'hidden' tabs could run the JavaScript garbage collector while paused so as to clean up their memory footprint.

At least that was my hope this morning.

I wanted to come up with a way for the JavaScript in my pages to know when they are 'visible' or 'hidden' - and on the latter, pause the data updating. When a 'hidden' one becomes 'visible', it would automatically start updating again. What I came up with is close, but, sadly, it's not going to work.

If I have the page's onBlur event pause the updating and the onFocus event start it again, it works great as you move from tab to tab in the browser window. But the minute you move off the browser application - to, say, Outlook - the onBlur is fired and the updating stops. The same would happen if they had multiple browser windows with multiple tabs in each.
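The whole thing boils down to something like this (a sketch - updateData() and the 5-second interval are placeholders for the page's real refresh code):

// Pause/resume polling on window focus - a sketch of the approach above.
var updateTimer = null;

function startUpdating() {
  if (updateTimer === null) {
    updateTimer = setInterval(updateData, 5000);   // resume polling
  }
}

function stopUpdating() {
  if (updateTimer !== null) {
    clearInterval(updateTimer);                    // pause polling
    updateTimer = null;
  }
}

// Works tab-to-tab, but also fires when the whole browser loses focus.
window.onfocus = startUpdating;
window.onblur = stopUpdating;

function updateData() {
  // ... fetch fresh data from the server and redraw the charts ...
}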

The 'focus' is really too specific to do what I want. Indeed, the onBlur is a little too broad as well.

What I thought about trying next was using site cookies as semaphores between pages - and between windows of pages - to say who has the 'update focus' for a specific window. The problems started popping up when I had to identify which window a page was in. I could use the size and location of the window, but if the user resizes or moves it, all that state data is out of date and has to be maintained at considerable cost (every move event and every resize event has to go through and update it).

It might work, in theory, but in practice, I saw it as just too difficult to implement well. Too many holes and problems, with associated overhead.

Then I decided to test the theory that a paused page will do a garbage-collection pass and reduce its memory footprint. I pulled up the Google Chrome Task Manager and paused one of my pages:

Time Memory
2:32 pm 88,844 kB
2:57 pm 88,892 kB

So after nearly 30 minutes of no activity, the memory grew? Crud. This is something I should have checked before I did all the work on the event processing. It was a simple assumption I made, and it could not have been more wrong.

Live and learn... gotta do that, at least.

The Evolution of Advanced Web Graphing

Tuesday, February 16th, 2010

I've been working with the Google Visualization AnnotatedTimeLine widget, and it's nice - no doubt about it, but it's still a Flash-based widget. It's nice to know that the Table widget is now all CSS - it was originally a Flash component, but the AnnotatedTimeLine is still Flash, and it's not as snappy as it could be.

Sure, the AnnotatedTimeLine might not be meant to do the kind of graphing that I'm doing, but it's close. I've been missing the level of completeness that VantagePoint had in Java, but that's not really an answer, either - I can't go back to Java plugins and all the problems we had with that back at my old place. But still... there needs to be something better.

Normally, I'd say "Google will get there - eventually", but this is really something that needs to be done. The HTML5 Canvas object has allowed quite a few implementations of nice graphing packages, but they aren't focused on the scientific plotting that I'm looking for. They are really focused on business presentation graphing. Nice backgrounds, fewer data points.

What I need is something completely new and original. I need to have something that doesn't force me to reload all the data from the source - that's my first real killer problem. If I could send incremental updates, then I could stop sending these enormous JSON strings to the client for parsing. I'm not positive, but I'll bet that the memory problems I'm having now are a direct result of that.
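Just to sketch what I mean: if the widget let me keep the DataTable around, the client could ask the servlet only for points newer than the last one it already has. The '/chart-data' URL, the 'since' parameter, and the row format here are all made up for illustration; only the DataTable and draw() calls are the real Google Visualization API:

// Hypothetical incremental-update loop - the endpoint and payload are invented.
var lastTime = 0;   // millis of the newest point we already have

function fetchNewPoints(table, chart) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/chart-data?since=' + lastTime, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      var points = JSON.parse(xhr.responseText);   // only the *new* points
      for (var i = 0; i < points.length; i++) {
        table.addRow([new Date(points[i].t), points[i].v]);
        lastTime = Math.max(lastTime, points[i].t);
      }
      chart.draw(table, {});   // redraw with the appended rows
    }
  };
  xhr.send();
}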

Still, there's nothing I can really do - save start to write one on my own. If I want to leverage the work Google has started, then I just need to sit tight and hope that I'm not the only one that has this same set of issues. I'm probably not, but then again, if this is a Google "20% Project", it's unlikely that the developers will think what I need is really what they need to write. And there's no way I'm getting the source code, that's for certain.

VantagePoint is orphaned now, and that's a tragedy, as they might have been the ones to move into this space, but alas, that's not to be.

It's just frustrating to know what needs to be done to make this a really stellar package, and not be able to get in there and fix it. But we all need to learn to live with disappointment.

I Wish HTML div Elements Didn’t Force Line Breaks

Tuesday, February 16th, 2010

Once again, I was bitten by the annoying fact that div elements in HTML force line breaks. I was looking at the JSON output of a servlet I had created and was trying to figure out why there were CRLF line breaks in the data stream for one column of data but not another. I tried calling trim() on the data, and on the results, and still I got the line breaks. I was really stumped, and then I looked at how the CSS class was being added to the strings and noticed that I had used a div for the one column and a label for the other!

That was it. Duplicate the CSS on a label tag, switch the div to a label, and all of a sudden I was getting the data I expected. (Of course - a div is a block-level element, so the browser breaks the line around it, while a label is inline and doesn't.)

Duh!

But Holy Cow! Who decided that div elements needed line breaks when the others didn't? Clearly someone with an evil sense of humor.