Archive for June, 2007

Java Allocation Speed

Tuesday, June 5th, 2007


One of the developers using some objects in a Java library I wrote and maintain came to me to ask why one operation was taking as long as it was. The library is basically a table object and views that can be placed on this table (and on other views) to "stack up" a "deck", so that the end result is a table that has just what you want, in the order you want, etc. After talking with this guy, I realized that there wasn't a good reason that an aggregation on top of an aggregation was taking longer than the first aggregation. I mean, the data set was smaller, so it should, in theory, take less time. But it didn't. And not by a little. So I decided to dig into it.

The first step was to build a test frame for this kind of environment. You see, it wasn't affecting small data sets the way it was affecting the larger ones, so I built up a 100,000 x 100 table, and then aggregated it to 10,000 x 100 and then to 100 x 100. What I saw was that the first aggregation took about 13 sec. and the second one took about 8 minutes. OK, this was a good test case, and so I went into profiling mode to find where the time was really getting spent.

The first thing I thought was that the rows and columns were being inefficiently accessed by linear searches of their labels. But after putting in code to avoid those searches (it was in the base table, which is why I thought the first aggregation was faster), it turned out that it didn't really improve the speed a lot. The times went down to 5.5 sec and 3+ min. Better, but not nearly good enough.
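To make the linear-search concern concrete, here is a minimal sketch of the two lookup styles. It assumes the fix was a hash-based index from label to position, which the post does not actually state; the class and method names (LabelIndex, linearIndexOf, hashedIndexOf) are made up for illustration and are not from the library. It also uses generics for brevity, which a 1.4.2 code base would not have.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the library's actual classes.
class LabelIndex {
    private final List<String> labels = new ArrayList<String>();
    private final Map<String, Integer> positions = new HashMap<String, Integer>();

    // Assumes labels are unique; a duplicate would overwrite the map entry.
    void add(String label) {
        positions.put(label, labels.size());
        labels.add(label);
    }

    // Linear scan: cost grows with the number of labels.
    int linearIndexOf(String label) {
        for (int i = 0; i < labels.size(); i++) {
            if (labels.get(i).equals(label)) {
                return i;
            }
        }
        return -1;
    }

    // Hash lookup: roughly constant cost per call.
    int hashedIndexOf(String label) {
        Integer pos = positions.get(label);
        return pos == null ? -1 : pos.intValue();
    }

    public static void main(String[] args) {
        LabelIndex index = new LabelIndex();
        index.add("price");
        index.add("volume");
        index.add("delta");
        System.out.println(index.linearIndexOf("delta"));
        System.out.println(index.hashedIndexOf("delta"));
    }
}
```

Both methods return the same position; only the cost per call differs, which matters when the lookup sits inside a loop over 100,000 rows.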

Then I looked at some of the individual operations and what was there blew me away. In one method, I was returning the column headers as a Vector of Strings. The implementation for the base table was to return the ivar that was the Vector of Strings, and to have in the comments on the method the warning that this is the reference to the storage of the column headers, so mess with it at your own peril. In the aggregate view I had the code making a copy of the Vector and returning that. This was a hold-over from several of the views where the underlying table's columns can change and the system needs to augment the column headers on the fly.

That was the killer.

Creating a new Vector of Strings each time a row was accessed took so much longer than simply returning the ivar that when I changed the aggregate view to use an ivar, the times went to 5.5 sec and 0.5 sec - they're going in the right direction now! I was amazed at this, but then I started to think about it. Java's allocator is probably doing a lot more than a typical C/C++ copy constructor, and as such its load on the system is greater. Even so, it was not the best idea to have a construction in the tight loop of the aggregator. All is fixed, and I'm looking at the last two views that might need changing, but I'm not sure that even they do, as they aren't doing the same kind of work that the aggregator was doing. But I'll give them a look and see if I can speed them up as well.
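The difference between the two accessors can be sketched as follows. This is an illustration under assumptions, not the library's code: HeaderAccess, getColumnHeaders, and getColumnHeadersCopy are hypothetical names, and the loop sizes simply echo the 10,000 x 100 test table. The timings it prints will vary by JVM and machine, so treat it as a way to see the shape of the problem rather than a benchmark.

```java
import java.util.Vector;

// Hypothetical illustration only: not the library's actual classes.
class HeaderAccess {
    private final Vector<String> columnHeaders = new Vector<String>();

    HeaderAccess(int columns) {
        for (int c = 0; c < columns; c++) {
            columnHeaders.add("col" + c);
        }
    }

    // Returns the live reference to the stored headers: no allocation,
    // but callers must not modify it.
    Vector<String> getColumnHeaders() {
        return columnHeaders;
    }

    // Returns a defensive copy: safe to modify, but allocates and
    // copies every element on every call.
    Vector<String> getColumnHeadersCopy() {
        return new Vector<String>(columnHeaders);
    }

    public static void main(String[] args) {
        HeaderAccess view = new HeaderAccess(100);
        int rows = 10000;
        long checksum = 0;

        long start = System.nanoTime();
        for (int r = 0; r < rows; r++) {
            checksum += view.getColumnHeadersCopy().size();
        }
        long copyNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int r = 0; r < rows; r++) {
            checksum += view.getColumnHeaders().size();
        }
        long refNanos = System.nanoTime() - start;

        System.out.println("copy per access:      " + copyNanos + " ns");
        System.out.println("reference per access: " + refNanos + " ns");
        System.out.println("checksum: " + checksum);
    }
}
```

The trade-off is the one described above: the copy protects the stored headers from callers, while the shared reference avoids the per-call allocation at the cost of trusting callers to leave it alone.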

New NetNewsWire Update

Tuesday, June 5th, 2007


It's been a while since I started using NetNewsWire, primarily as an efficient way to look at a handful of sites, like Slashdot and Freshmeat, that I had been following for a while. I just got tired of all the ads I had to watch, and while I know where the money comes from, it didn't make it any quicker to go to all the web sites, read what I wanted and then move on. Also, I had to make a mental note of the last article I read on each site, and that made it most convenient to have the sites up all the time. This was a mess. Then for some reason I read an article about NetNewsWire and decided to give it a try.

I started with the 'lite' version and after about a month decided that this was the kind of software that I should be supporting with money. It did exactly what it said it did, smoothly, cleanly, efficiently, and didn't try to be the end-all-be-all application. It did one thing, and it did it extremely well. I was sold.

Ever since then, and this may be back in the 1.x days, I can't remember exactly, I have followed the development of NetNewsWire and really enjoyed its progress. Now it's in version 3.0, and it's as good as it ever was. One of the things I like most about it is that even though many of its users like to have a nice, big window up for all the lists, etc., I prefer a nice, little window with just the specifics. And NetNewsWire looks and works as well in the small window format as in the large window format. Very nice app.

Coding Standards and Minimal Design

Monday, June 4th, 2007

Today I saw a chunk of code that was checked in and had a very hard time understanding exactly what it was doing. The comments didn't really help, and the variable names weren't a lot of help either. I actually had to walk through the code a line at a time to understand what it was doing. Now I'm no paragon of design and coding virtue, but there's a point where you really need to hold yourself, and the folks you work with, to some bare minimum standards in this area. A 20-line method in Java should not require someone to walk through it line by line to understand what it's doing. I read Kernighan's quote, "Debugging is twice as hard as writing the program, so if you write the program as cleverly as you can, by definition, you won't be clever enough to debug it," and agree with it 100%. The extension of that might be: If you have trouble understanding what you just wrote, the other guy, six months from now, will have to re-write it.

So I figured out the code and started cleaning it up. It didn't need the functionality changed, that was fine. It was the variable names and the flow, and more than anything, the lack of comments saying what was going on and why. I know these are things that aren't popular with a lot of developers, but they make the job of maintenance and extension a lot easier. It also won't hurt to spend just 15 or 20 minutes looking at what you need and trying to see if there's a better way than your first cut. Don't spend the entire day, but spend a little bit of time to make sure that you're putting down something that the next guy will be able to pick up easily.

Java Zealots

Monday, June 4th, 2007

I'm as excitable as the next person. I get whipped up about a lot of things. Development languages are not among them, and I have to wonder at professionals that call themselves developers that do. I was talking to someone the other day and they wanted to add a Java 5-ism to the existing library that was 1.4.2. I said that there are lots of projects that use this, and many are 1.4.2 still and may not change for a long time - if at all. If they work, there's no reason to update them. They might be updated at some point in the future, but this one feature that was being discussed was certainly not a business-justifiable reason.

But that wasn't the end of it. I had to ask this person "Why?" The answer was exactly what I expected: they wanted to type in three lines instead of the existing way of using indexes, which might take six lines. So we're talking about saving three lines each time we run through a certain object's elements, and this is the reason for updating working projects? I think not, and I'd be stunned if the users thought so, either. I have met many Java Evangelists, and it's not just that they know, and promote, the newest features of the language; they condemn those who might say they follow the faith but don't push as hard as they personally do. For instance, if you're developing in Java, you almost have to consistently push to the latest version or you're "outdated", and therefore "don't get it". Still using Enumerators? You just don't get it. Like we're all still coding in COBOL, for instance. I mean it's silly.
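For readers who haven't seen both styles, the kind of difference being argued over looks roughly like this. The post doesn't name the feature, so this sketch assumes it was the Java 5 for-each loop with generics; LoopStyles and its methods are hypothetical, and the exact line counts in the real code may well differ.

```java
import java.util.Vector;

// Hypothetical illustration only: the post does not show the code in question.
class LoopStyles {
    // Java 1.4.2 style: raw Vector, index loop, explicit cast.
    static int totalLengthIndexed(Vector names) {
        int total = 0;
        for (int i = 0; i < names.size(); i++) {
            String name = (String) names.get(i);
            total += name.length();
        }
        return total;
    }

    // Java 5 style: generics plus the for-each loop, no index or cast.
    static int totalLengthForEach(Vector<String> names) {
        int total = 0;
        for (String name : names) {
            total += name.length();
        }
        return total;
    }

    public static void main(String[] args) {
        Vector<String> names = new Vector<String>();
        names.add("alpha");
        names.add("beta");
        System.out.println(totalLengthIndexed(names));
        System.out.println(totalLengthForEach(names));
    }
}
```

Both methods compute the same result; the second form only compiles on Java 5 or later, which is the compatibility cost being weighed against the saved lines.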

If you ask someone coding in C++ they aren't going to ask you the version of the compiler you're using. They may ask you about the features you're using, but not the compiler tricks. Yet that's just what these Java zealots are doing. If you're not on the latest and greatest - and using the latest and greatest, then you might as well be coding in Visual Basic.

What's funny is that many of these people probably haven't spent a lot of time with different languages. They might have learned Java in school - if they took a class in programming there, and they think that's all there is. But I know differently. I watched the industry move from big iron to PCs. And then with PCs to networked apps. Then to Windows. Then to Web. I know Java is not the final language - it's the current language for a lot of people, but it's not the end-all-be-all, any more than COBOL or FORTRAN were. Don't get me wrong, it's a nice language, and it's got some nice features, but not everything is a nail, so you need more than a single hammer.

Yet you'll never be able to tell these people that they are missing the point. That a language is just that - a language. It's a tool for expressing what you want a machine to do for you. Maybe the most expressive way is with Java - maybe not. Maybe the difference in the expressiveness of Java 5 to 1.4.2 is big enough to warrant a move. Maybe not. The fact is there are no universal truths in development. There's a lot of professional development still being done in FORTRAN. That doesn't make it any less useful or usable. It's a set of tools. But I got tired of tilting at windmills and walked away.

Speeding up User Tools

Friday, June 1st, 2007

I spent time today speeding up one of the tools I deliver, a support web system for one of my major applications. It's a Perl CGI app, and a lot of what it was doing was inefficient. For a single page I was grepping the same file four times for four different sets of data. When I realized that I could grep once for any of the four conditions and then separate the results into the four cases in the Perl, it used more memory but was much faster. Typical trade-off: memory for speed, and I wanted to move it as much to the "memory" side as possible to get the speed up.
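The single-pass idea can be sketched like this. The original tool is Perl and its code isn't shown, so this is a Java illustration of the technique only; SinglePassFilter, bucketLines, and the sample keys are all hypothetical, and simple substring matching stands in for whatever patterns the real greps used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the actual CGI tool.
class SinglePassFilter {
    // Reads the lines once and files each one under every key it contains,
    // instead of scanning the whole input once per key.
    static Map<String, List<String>> bucketLines(List<String> lines, List<String> keys) {
        Map<String, List<String>> buckets = new LinkedHashMap<String, List<String>>();
        for (String key : keys) {
            buckets.put(key, new ArrayList<String>());
        }
        for (String line : lines) {
            for (String key : keys) {
                if (line.contains(key)) {
                    buckets.get(key).add(line);
                }
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "ERROR feed down",
                "WARN slow quote",
                "INFO started",
                "ERROR bad tick");
        List<String> keys = Arrays.asList("ERROR", "WARN", "INFO", "DEBUG");
        Map<String, List<String>> buckets = bucketLines(lines, keys);
        for (Map.Entry<String, List<String>> entry : buckets.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue().size());
        }
    }
}
```

All four result sets are held in memory at once, which is the memory-for-speed trade described above: one read of the file instead of four.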

Then I thought that it was possible that there were sections of the web page for certain instruments that some folks don't really look at. If I made those optional, with a checkbox to maintain the state of whether or not to do them, then I could get even more speed out of the system. Nice.

It's nothing that's earth-shaking... but it's nice to get the support guys the best tools I can as they make the app look good.