Archive for the ‘Coding’ Category

Optimizing jTDS packetSize for MS SQL Server

Thursday, January 14th, 2010

While doing some network testing/optimization recently, one of the network guys suggested I look at the jTDS parameter packetSize. He thought it might be something to look at if all else failed.

Since I had pretty much gotten to that point, I decided this morning to do those tests, and at the same time take a look at what H2 might say about performance tuning - as that was the destination of the data, after all.

The first step was to change the datatype in the database. According to the H2 docs:

Each data type has different storage and performance characteristics:

  • The DECIMAL/NUMERIC type is slower and requires more storage than the REAL and DOUBLE types.
  • Text types are slower to read, write, and compare than numeric types and generally require more storage.
  • See Large Objects for information on BINARY vs. BLOB and VARCHAR vs. CLOB performance.
  • Parsing and formatting takes longer for the TIME, DATE, and TIMESTAMP types than the numeric types.
  • SMALLINT/TINYINT/BOOLEAN are not significantly smaller or faster to work with than INTEGER in most modes.

The DBA I'd worked with to set up the back-end database that I read from didn't like using the double datatype primarily due to rounding. I said it was OK, but relented when he pressed. I then used the same DECIMAL(19,6) in the H2 in-memory database as existed in the MS SQL Server database. Seems reasonable, but it flies in the face of the suggestion from the H2 docs.

Since it's all Java, and a Java Double is OK with me, I decided to change all the DECIMAL(19,6) columns in the in-memory database to double. The results were amazing. I was able to achieve more than a 50% increase in the rows/sec processed by this simple change. Additionally, I was able to see a significant reduction in the memory used for the web app after making this change.

All told, a wonderful suggestion.

Then I took to running tests with different values of packetSize. I got:

packetSize   Portfolio   Product
       512      11,496    11,904
      1024      11,636    13,650
      2048      11,902    13,941
      4096      11,571    12,703
      8192      12,560    14,774
     16384      12,447    14,744
     32768      12,753    14,017
     65536      12,680    15,038

where the values are the rows/sec processed from the back-end database into the in-memory database. Higher is clearly better.

What I found was that a size of 8192 was the smallest value that got good performance. So that's what I went with. With these changes, my 7 minute restart is down to about 2:20 - an impressive improvement.

I’m Horrible at Accepting Others’ Sloppy Work

Tuesday, January 12th, 2010

OK... here's something that I know I'm horrible at: Accepting others' sloppy work.

Yup, I know I'm no good at it.

When I've spent several months creating a new system based on solid design goals and refactoring where and when it's necessary, it really bugs me when someone with far less attention to detail starts slapping through the codebase like a drunk with a steel-bladed weed whacker.

I do my very best to make sure that when I go into someone else's code I stick to their coding conventions - whether I like them or not. This goes past the coding style to the design, flow, control, and even architecture of the app. It's not easy, but it's respectful of the work the person did to get the codebase to this point.

It's certainly possible that all this effort is misplaced. It could be that the codebase is a pile of junk, and the original author didn't put this much effort into its original creation - but that's not the point. The point is that I am a visitor in this codebase, and as such, I should at least ask before I start moving functionality around, and when I do, I'd better make it look like the original author did it.

Well... as you can imagine, I've been the recipient of some help in one of my projects, and the author didn't talk to me about it (I would have done it entirely differently), and executed it with the same grace and skill as a drugged-out bull knitting a sweater.

I've done my best to not tell them to never do this again, but my displeasure is clearly evident. When I have to learn about major shifts in the code from svn update, and then to see the changes implemented so horribly, well... it's a hot button for me.

So take a few tips from me. If you're a visitor in someone else's code try to follow these simple rules:

  • Talk about the changes you're planning before you do them. Seems simple, but you'd be surprised how many people sit less than ten feet from one another and this doesn't happen. Take the time. It may not be a big deal, but it's going to make what you do seem a lot more like help, and a lot less like a savage beating.
  • When you write new code, stick to the established coding style. Again, seems simple enough, but for those that use IDEs, the vertical spacing that's essential for someone that doesn't use the same IDE can be really screwed up. Take the few seconds to make sure that the file you're about to check in looks like the original author wrote it.
  • If you move functionality, discuss alternatives first. This is an extension of the first point, but the importance is even greater. It may not be apparent by looking at the code what the author's choice might have been for the code you're moving, so it's really important that you discuss what they would have done, if they were doing it. For all you know, there's a reason or pattern you're missing and it should stay right where it is.
  • Rise to the level of their code. This might be impossible for some, but to as great an extent as possible, try to be that coder. You may not have the skills, but make the attempt. Look at the existing code... study it... learn from it. If this coder is better than you are, you'll know it. It'll show in their code and yours. Don't do a sloppy job. Rise to the occasion. Be better than you have to be. Be Excellent. In the end, you'll be a better coder.

Sorting in Hierarchical Google Table Visualization

Monday, January 11th, 2010

Step two in my creation of a modified version of the Google Visualization Table widget is to add in the indentation, sorting and collapsing of the data so that on each recalculation of the data table it looks like it's supposed to look to the user. Some of these are a lot easier than others, but they all share a common theme that was started with the creation of the aggregate groups in the previous post - that is, there's a list (array) of groups and that dictates how to do each of these steps.

Adding Indentation

The first thing I wanted to tackle is the indentation required when all the groups are expanded. Since I'd calculated the aggregate rows, and sorted them, what I needed to do was to make it look nice when all these groups are expanded. It's going to be a simple matter of putting non-breaking spaces in front of the label on the row, but the question is how to do this with the existing structure I have for the groups? Answer: pretty simply.

In keeping with the portfolio-to-row index mapping, and a simple loop over all the groups in the array, we can use the recursive function:

  function indentGroupMembers(tbl, grp, map) {
    for (var e = 0; e < grp.members.length; ++e) {
      if (map[grp.members[e]] != undefined) {
        var row = map[grp.members[e]];
        // ...add in the necessary space
        tbl.setValue(row, 0, '&nbsp;&nbsp;&nbsp;' + tbl.getValue(row, 0));
        // see if this member is, in fact, a group itself
        for (var g = 0; g < groups.length; ++g) {
          if (groups[g].name == grp.members[e]) {
            indentGroupMembers(tbl, groups[g], map);
            break;
          }
        }
      }
    }
  }

The beauty of this is that we have already defined all the structure we need to do the complete indenting - no matter how deep it goes. All we needed to do is to identify if the member is a group, and if so, call it again. Works perfectly.

Adding Group-Level Sorting

Probably the most difficult part of this was getting the sorting done correctly. Each group had to be sorted properly - with the other groups at the same level. Then within each group, the members had to be sorted - but stay within the limits of the group. All this while still appearing to work with the Google Table widget. I was concerned.

The idea I settled on was a hybrid of the sorter and the indenter - what if I went through the groups - sorted their members and assigned numbers 1, 2, 3, ... for their position in the group. Then, we'd scale up the values based on their "depth" in the scheme.

OK, an example. If we had in the table the following data:

Portfolio     Delta
Tech         101.00
   AAPL       41.00
   MSFT      -10.00
   GOOG       70.00
Retail         0.00
   HD         55.00
   LOW       -55.00

then we'd add in the 'sorting column', and then scan each group - picking out the values, placing them in a JavaScript array, sorting that, and then assigning values. The trick is to once again use JavaScript objects and take not only the value of the row, but the row index so it's easy to place them in the right order.

Portfolio     Delta   Sort
Tech         101.00      2
   AAPL       41.00      2
   MSFT      -10.00      1
   GOOG       70.00      3
Retail         0.00      1
   HD         55.00      2
   LOW       -55.00      1

So that now you can see that each group has its order assigned. The 'indenting' trick is to then scale up the non-leaf nodes by a factor of 10 for each level and add it to the members so that the members are sorted with the groups. This only works if you have fewer than 10 members in a group. If you have more, then simply increase the factor so that the largest group's membership is covered.

When you're done applying the scale factor, you'll have something like this:

Portfolio     Delta   Sort
Tech         101.00     20
   AAPL       41.00     22
   MSFT      -10.00     21
   GOOG       70.00     23
Retail         0.00     10
   HD         55.00     12
   LOW       -55.00     11

At this point, it's pretty clear that the simple table sort on this column will get us what we're looking for. Pretty neat. It took me a while to figure out what I needed to do here, and the code for the factor application and sorting isn't trivial, but it's not hard, and it's all driven by the group definitions so it's very flexible.
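The factor application isn't shown in this post, so here is a minimal sketch of one way it could work, run against plain JavaScript objects rather than the Google DataTable. The names sampleGroups, sampleValues, and buildSortKeys are made up for illustration, and it assumes every group and member has a numeric value to rank by. It ranks each set of siblings, picks a factor larger than the biggest sibling count, and adds each member's rank to its parent's scaled key:

```javascript
// Hypothetical sample data mirroring the tables above (not the real page data).
var sampleGroups = [
  { name: 'Tech',   members: ['AAPL', 'MSFT', 'GOOG'] },
  { name: 'Retail', members: ['HD', 'LOW'] }
];
var sampleValues = { Tech: 101, AAPL: 41, MSFT: -10, GOOG: 70,
                     Retail: 0, HD: 55, LOW: -55 };

// find a group definition by name, or null if the name is a plain member
function findGroup(groups, name) {
  for (var g = 0; g < groups.length; ++g) {
    if (groups[g].name === name) {
      return groups[g];
    }
  }
  return null;
}

// how many levels of members sit below this name (0 for a plain member)
function levelsBelow(groups, name) {
  var grp = findGroup(groups, name);
  if (grp === null) {
    return 0;
  }
  var deepest = 0;
  for (var e = 0; e < grp.members.length; ++e) {
    deepest = Math.max(deepest, levelsBelow(groups, grp.members[e]));
  }
  return deepest + 1;
}

// rank the siblings by value (1, 2, 3, ...), scale the rank by 'unit',
// add the parent's key, then recurse into any sibling that is a group
function assignKeys(groups, names, values, base, unit, factor, keys) {
  var sorted = names.slice().sort(function (a, b) {
    return values[a] - values[b];
  });
  for (var i = 0; i < sorted.length; ++i) {
    var grp = findGroup(groups, sorted[i]);
    keys[sorted[i]] = base + (i + 1) * unit;
    if (grp !== null) {
      assignKeys(groups, grp.members, values, keys[sorted[i]],
                 unit / factor, factor, keys);
    }
  }
}

function buildSortKeys(groups, values) {
  // top-level groups are those that are not a member of any other group
  var nested = {};
  var largest = 0;
  for (var g = 0; g < groups.length; ++g) {
    largest = Math.max(largest, groups[g].members.length);
    for (var e = 0; e < groups[g].members.length; ++e) {
      nested[groups[g].members[e]] = true;
    }
  }
  var top = [];
  var levels = 0;
  for (var t = 0; t < groups.length; ++t) {
    if (nested[groups[t].name] !== true) {
      top.push(groups[t].name);
      levels = Math.max(levels, levelsBelow(groups, groups[t].name));
    }
  }
  largest = Math.max(largest, top.length);
  // the factor has to exceed the largest sibling count
  var factor = 10;
  while (factor <= largest) {
    factor *= 10;
  }
  var keys = {};
  assignKeys(groups, top, values, 0, Math.pow(factor, levels), factor, keys);
  return keys;
}

var sortKeys = buildSortKeys(sampleGroups, sampleValues);
```

For the sample data this gives the same Sort column as the last table (Retail 10, LOW 11, HD 12, Tech 20, MSFT 21, AAPL 22, GOOG 23), which could then be written into the extra column before calling the table's sort.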

Adding the Collapsing

Once the aggregation and sorting are done, the final step is the collapsing of the non-expanded groups. It's easy to see that we can have a JavaScript array with the names of the expanded groups in it - removed when it's collapsed, added when it's expanded, so that it really comes down to removing the members of the groups that aren't expanded.

Based on the same ideas as the indenting this function does the trick:

  function removeGroupMembers(tbl, grp) {
    for (var e = 0; e < grp.members.length; ++e) {
      // remove this guy (row) from the table
      removeRow(grp.members[e], tbl);
      // see if this member is, in fact, a group itself
      for (var g = 0; g < groups.length; ++g) {
        if (groups[g].name == grp.members[e]) {
          removeGroupMembers(tbl, groups[g]);
          break;
        }
      }
    }
  }

the difference here being that because the table is dynamic at this stage, we can't use the portfolio-to-row index map, and have to, instead, have the function:

  function removeRow(name, tbl) {
    var rowCnt = tbl.getNumberOfRows();
    for (var i = 0; i < rowCnt; ++i) {
      if (name == tbl.getValue(i, 0).replace(/&nbsp;/g,'')) {
        tbl.removeRow(i);
        break;
      }
    }
  }

where this function looks in the table for the portfolio name and then removes it from the table. This is going to be an expensive step, but it's the only way I know to do what's needed without doing more than what's needed. After all, the groups may be expanded, and for those, there's nothing to delete.

Put together with a little JavaScript and tags, clicking on the group in the table toggles its inclusion in the expanded list, updates the data and re-generates the table. It's pretty slick. Pretty fast, and exactly what I needed.
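The expanded-list bookkeeping might be sketched like this. It's only an illustration against plain arrays: toggleGroup, hiddenMembers, and demoGroups are hypothetical names, not the code from the page, and the real version removes rows from the table with removeGroupMembers() rather than building a lookup.

```javascript
// Hypothetical group definitions, in the same shape as the groups array.
var demoGroups = [
  { name: 'Tech',   members: ['AAPL', 'MSFT', 'GOOG'] },
  { name: 'Retail', members: ['HD', 'LOW'] }
];

// add the name if it's missing (expand), remove it if present (collapse)
function toggleGroup(expanded, name) {
  var at = expanded.indexOf(name);
  if (at >= 0) {
    expanded.splice(at, 1);
  } else {
    expanded.push(name);
  }
  return expanded;
}

// collect every member (and nested member) of the groups that are NOT expanded
function hiddenMembers(groups, expanded) {
  var hidden = {};
  function hide(grp) {
    for (var e = 0; e < grp.members.length; ++e) {
      hidden[grp.members[e]] = true;
      // a member that is itself a group hides its own members too
      for (var g = 0; g < groups.length; ++g) {
        if (groups[g].name === grp.members[e]) {
          hide(groups[g]);
          break;
        }
      }
    }
  }
  for (var g = 0; g < groups.length; ++g) {
    if (expanded.indexOf(groups[g].name) < 0) {
      hide(groups[g]);
    }
  }
  return hidden;
}

var expandedGroups = [];
toggleGroup(expandedGroups, 'Tech');                        // user clicks Tech
var hiddenNow = hiddenMembers(demoGroups, expandedGroups);  // HD and LOW only
```

Clicking the same group a second time takes it back out of the list, and its members go back to being hidden on the next pass.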

Trouble with Java Default Values

Monday, January 11th, 2010

I got a note from the London users of my web app - specifically the email alerts system I've built into the web app so that people don't have to watch any one display all day long - they can get a chat or email alert telling them a certain condition has occurred. In any case, this morning I got word that the specific alert wasn't working quite right. So I started digging into the code. And hit one of my biggest annoyances with Java... the timing of the setting of the default values for class instance variables.

Let's look at a class and its subclass. The problem will be in the subclass, but it's not obvious from looking at the code:

  public class BaseAlert extends Object {
    /**
     * These are the constructors - make the default protected so it's
     * not called accidentally, and make the general form of the
     * constructor take the important params and set things up right.
     */
    protected BaseAlert() {
      // always do the super's constructor
      super();
      // now do the base initialization
    }
 
    public BaseAlert(String aName, Properties aProp) {
      // do the base constructor
      this();
      // now do the initialization of my args
      setName(aName);
      // update the configuration from the Properties
      updateConfiguration(aProp);
    }
 
    /**
     * This method picks out all the values from the Properties to
     * configure this instance.
     */
    public synchronized void updateConfiguration(Properties aProp) {
      /**
       * Read the values I need from the map and use them. Simple.
       */
    }
  }

and now the subclass that uses the updateConfiguration() method call for the same purpose, but has independent instance variables:

  public class MyAlert extends BaseAlert {
    /**
     * This is meant to be an attribute of the specific class. The
     * default value should work, but it doesn't.
     */
    private String      _code = null;
 
    /**
     * These are the constructors - make the default protected so it's
     * not called accidentally, and make the general form of the
     * constructor take the important params and set things up right.
     */
    protected MyAlert() {
      // always do the super's constructor
      super();
    }
 
    public MyAlert(String aName, Properties aProp) {
      // always do the super's constructor
      super(aName, aProp);
    }
 
    /**
     * This method picks out all the values from the Properties to
     * configure this instance.
     */
    public synchronized void updateConfiguration(Properties aProp) {
      // do all the super's value picking first
      super.updateConfiguration(aProp);
 
      /**
       * Read the values I need from the map and use them. Simple.
       */
      setCode(aProp.getProperty("Code"));
    }
  }

Here's what should happen:

  • Creating an instance of MyAlert causes the subclass to be created.
  • The call to updateConfiguration() is done and the values are read in.

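Here's the catch: the field initializers in the subclass run after the superclass constructor returns, and the superclass constructor is where updateConfiguration() gets called. So the call to setCode() happens first, and then the = null initializer on _code wipes it out. No joke.

The upshot: the value of _code is always going to be null, and all the debugging in the world isn't going to help, as I found out. It's in the way Java orders the creation of an instance: superclass constructor first, then the subclass's field initializers, then the rest of the subclass constructor. Very annoying.

Fix? Have no default values on instance variables that get set this way. Declare it as a plain private String _code; and it works. That's about it.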

Grrr...

Cleaning Up Data for Public Relations

Friday, January 8th, 2010

Every app I've worked on has data problems. Period. In my recent web app, I have the ability to edit the in-memory database contents that form the basis of the displays the user sees. I typically clean up the data in the dev, test and production apps a few times a day, but there was concern that someone might hit test or prod and see bad data and have a bad impression of the system. So for public relations reasons, I needed to put in some way for the Test and Prod versions to automatically clean themselves.

I looked at cleaning the data. It's possible. I can add a servlet context parameter that contains the name of the machine so I can have different instances behaving differently, but I was still concerned about the state of the data. It might be "bad", but is that a reason to delete it from the back-end persistent store? I'd like to think not.

I worried about only cleaning up the in-memory database, but that had a lot of problems because the same INSERT statements that are used for the back-end database are used for the in-memory database. So what to do...

Then it hit me - a far simpler solution: Use the value in the servlet context as the limit in the SQL statement on the selected machines!

Silly me. The views would always look "clean", but the underlying data is still in the databases in its raw state. Nice.

Sure, it's silly, but when you spend 20 mins trying to come up with a clever way to auto-clean the data - always thinking of it in the "clean" sense, you get stuck in a rut and something as simple as re-casting the idea to a filter seems amazing.

No one said I was a genius.

Starting to Add Hierarchy to Google Table Visualization

Thursday, January 7th, 2010

The primary user of a new web page I created came back to me with a nightmare request: take the table and make it a tiered drill-down table. The table is the Google Visualization Table, and it's not got a single provision for handling roll-ups of the data, or hiding/expanding groups - or even of groups at all. This was going to have to be something I implemented from scratch.

Ick.

I had all the data, and I could get the organization, but how to store the organization? How to expand/collapse it? There were a lot of things I needed.

The thing I wanted to get started today was the aggregation into groups. It's a pretty simple idea: there's a new row in the table that is a very simple aggregation of the data on the rows of its members. So I needed to be able to identify the groups in some order so that they would be calculated in a consistent manner. If there were going to be several levels in this new system, we needed to be able to correctly aggregate the data for all groups.

What hit me was a simple use of the JavaScript objects and arrays to organize the groups. Each group would have a name and list of members (for now), and they would be placed into an array in the order of processing, which would guarantee that the lowest-level groups were calculated first, and then those depending on these next, and so on.

Breakthrough.

The group definitions started out looking like:

  var g = 0;
  var groups = new Array();
  groups[g++] = { name: 'Housing',
                  members: ['ABC', 'DEF', 'GHI'] };
  groups[g++] = { name: 'Tech',
                  members: ['AAPL', 'IBM', 'GOOG'] };
  groups[g++] = { name: 'Retail',
                  members: ['BBUY', 'HD', 'LOW'] };

which means that if you want to re-order the calculation, just move the definitions in the JavaScript file - the g++ takes care of putting them in the right order. Also, with this we can then look at writing something like this to add the aggregated value rows to the end of the table:

  // create all the groups - in the right order
  for (var g = 0; g < groups.length; ++g) {
    createNewGroup(answer, groups[g], portMap);
  }
 
  /**
   * This function creates the aggregate row in the table
   */
  function createNewGroup(tbl, grp, map) {
    // create a new row at the end of the table - sort later
    var row = tbl.addRow();
    // add it to the map
    map[grp.name] = row;
    // ...set the name of the group in the right spot
    tbl.setValue(row, 0, grp.name);
    // now get the values for the group from the members present
    var val = 0.0;
    var colCnt = tbl.getNumberOfColumns();
    for (var c = 1; c < colCnt; ++c) {
      // reset the value for each column
      val = 0.0;
      // aggregate the numeric values - pick first non-numeric
      if (tbl.getColumnType(c) == 'number') {
        // sum up all the available numeric values
        for (var e = 0; e < grp.members.length; ++e) {
          if (map[grp.members[e]] != undefined) {
            val += tbl.getValue(map[grp.members[e]], c);
          }
        }
      } else {
        // get just the first available value
        for (var e = 0; e < grp.members.length; ++e) {
          if (map[grp.members[e]] != undefined) {
            val = tbl.getValue(map[grp.members[e]], c);
            break;
          }
        }
      }
      // save the value we have for the column
      tbl.setValue(row, c, val);
      // ...and set the formatted value as well
      if (tbl.getColumnType(c) == 'number') {
        if (val != null) {
          tbl.setFormattedValue(row, c, val.numberFormat('#,##0;(#,##0)'));
        }
      }
    }
  }

The trick here is the mapping of the portfolio (or member) to row in the table. I created a function that did this for me:

  function mapNamesToIndex(tbl) {
    var map = new Array();
    var port = null;
    for (var i = 0; i < tbl.getNumberOfRows(); ++i) {
      // get the portfolio for this row
      port = tbl.getValue(i, 0);
      // map it into the array
      map[port] = i;
    }
    return map;
  }

With this, I'm able to make one scan through the table and map the row indexes for all values - making it much faster to do the aggregations.

With this, I'm able to get the aggregations created in the table. With the previous posting about how to sort on an arbitrary key, I can then pass this into that function to have the groups above their contents in the table.
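Since the aggregation depends on the lowest-level groups coming first in the array, a quick sanity check on the ordering can save some head-scratching once groups start containing other groups. This is just a sketch; findOrderingProblems and the nested 'Sectors' group are hypothetical and not part of the page:

```javascript
// report any group that lists another group as a member before that
// other group has appeared in the array (and so before it's aggregated)
function findOrderingProblems(groups) {
  var isGroup = {};
  for (var g = 0; g < groups.length; ++g) {
    isGroup[groups[g].name] = true;
  }
  var seen = {};
  var problems = [];
  for (var i = 0; i < groups.length; ++i) {
    var grp = groups[i];
    for (var e = 0; e < grp.members.length; ++e) {
      var member = grp.members[e];
      if (isGroup[member] === true && seen[member] !== true) {
        problems.push(grp.name + ' uses ' + member + ' before it is defined');
      }
    }
    seen[grp.name] = true;
  }
  return problems;
}

// hypothetical nested definitions: 'Sectors' rolls up two other groups
var goodOrder = [
  { name: 'Tech',    members: ['AAPL', 'IBM', 'GOOG'] },
  { name: 'Retail',  members: ['BBUY', 'HD', 'LOW'] },
  { name: 'Sectors', members: ['Tech', 'Retail'] }
];
var badOrder = [goodOrder[2], goodOrder[0], goodOrder[1]];

var goodProblems = findOrderingProblems(goodOrder);  // nothing to report
var badProblems = findOrderingProblems(badOrder);    // Tech and Retail both flagged
```

Running something like this once at page load, before the aggregation loop, would catch a definition that got moved to the wrong spot in the file.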

Big day... not done, but a good start to what I need to get done.

Setting the Read Timeout on a URL Request in Java

Wednesday, January 6th, 2010

I got a request from a co-worker late yesterday to add in a timeout to the AJAX gathering code in my web app. It wasn't immediately obvious, but I spent just a few minutes and it turned out to be pretty simple after all. It's used when you need to pull in an XML file from a URL for parsing into a DOM. Not hard, but very important to get right, and in a nicely flexible way.

If you start with the code I had originally:

  InputSource    retval = null;
  URL   source = ...;
 
  try {
    retval = new InputSource(new InputStreamReader(source.openStream()));
  } catch (IOException ioe) {
    if (!ioe.getMessage().equals("Connection refused") &&
        !ioe.getMessage().startsWith("Server returned HTTP response code: 500 ")) {
      log.error("While trying to get the data from the URL, an IOException occurred: "
          + ioe.getMessage());
    }
  }

and then finally read in the JavaDocs that the call:

  source.openStream()

is really just:

  source.openConnection().getInputStream()

So the code can quickly become:

  InputSource    retval = null;
  URL   source = ...;
 
  try {
    URLConnection  conn = source.openConnection();
    if (conn != null) {
      if (aTimeoutMSec > 0) {
        conn.setReadTimeout(aTimeoutMSec);
      }
      retval = new InputSource(new InputStreamReader(conn.getInputStream()));
    }
  } catch (IOException ioe) {
    if (!ioe.getMessage().equals("Connection refused") &&
        !ioe.getMessage().startsWith("Server returned HTTP response code: 500 ")) {
      log.error("While trying to get the data from the URL, an IOException occurred: "
          + ioe.getMessage());
    }
  }

In the end, we added in more logging - a co-worker even added a stack trace - to help find out what the problem was with a server-side error. I have to admit, Java isn't as clear as it could be on things like this, but the JavaDocs did have what I needed, and that was enough.

Neat Little Sorting Idea for Google DataTables

Tuesday, January 5th, 2010

I've been working on a new page in my web app - something that will display the latest data (or any point in time, actually) and the users wanted to have it sorted by a very unusual key - asset class, then region, then desk. It's nothing simple, and I was worrying about how I'd get this done in a clean way. I didn't want to implement another sorting scheme... and I didn't want to mess with the reorganization of the table. But I just wasn't seeing how I would be able to make this happen easily.

Then it came to me: the DataTable from Google (in JavaScript), and its Java counterpart written by me, has the sort() method, and that could be used to organize the rows in any way I want if I provided it a suitable key. The trick was to give it that key.

If I classified the row labels into their asset class, region, and desk - by name, then I could use this to create the index I needed.

Asset Class   Code
Equities      A
Rates         B
Commodities   C

and then:

Region   Code
USA      A
Europe   B

so that I can then (in JavaScript) map the portfolios to these regions and asset classes (same will go for the desks, but it's an unnecessary complication here):

  var assetClass = new Array();
  assetClass['One'] = 'Equities';
  assetClass['Two'] = 'Equities';
  assetClass['Three'] = 'Rates';
  assetClass['OneMore'] = 'Rates';
  assetClass['Huey'] = 'Commodities';
  assetClass['Louie'] = 'Commodities';
 
  var region = new Array();
  region['One'] = 'USA';
  region['Two'] = 'Europe';
  region['Three'] = 'USA';
  region['OneMore'] = 'USA';
  region['Huey'] = 'USA';
  region['Louie'] = 'Europe';

With all this static data established for each portfolio, I can then create the JavaScript function to sort a passed in table on a known column. Let's say for my page, the portfolio name is in the first column. I could look for it with the getColumnLabel() method on the DataTable, but I know it's there, and that saves a little time.

  function sortByAssetClassAndRegion(tbl) {
    // make a new column that we'll use to sort
    var col = tbl.addColumn('string');
    // populate it with the mapping of the names based on the attributes
    for (var row = 0; row < tbl.getNumberOfRows(); ++row) {
      var port = tbl.getValue(row, 0);
      tbl.setValue(row, col, assetClass[port] + '-' + region[port] + '-' + port);
    }
    // now just sort by the new column
    tbl.sort([{ column: col }]);
    // delete it now as it's served its purpose
    tbl.removeColumn(col);
  }

This is pretty nice in that it leaves the table sorted by this other key, but it's very quickly done. The key's composition can be changed in any number of ways to make this even more flexible. For example, I expanded on this to make a sorting order that was specifically set by the user - and had nothing to do with asset class or region.

First, make an array with the sorting order you want to have:

  var sortingOrder = new Array();
  var i = 0;
  sortingOrder['One'] = i++;
  sortingOrder['Two'] = i++;
  sortingOrder['Three'] = i++;
  sortingOrder['OneMore'] = i++;
  sortingOrder['Huey'] = i++;
  sortingOrder['Louie'] = i++;

then you can create the sorting function:

  function sortByDefinedOrder(tbl, order) {
    // make a new column that we'll use to sort
    var col = tbl.addColumn('number');
    // populate it with the mapping of the names based on the attributes
    for (var row = 0; row < tbl.getNumberOfRows(); ++row) {
      tbl.setValue(row, col, order[tbl.getValue(row, 0)]);
    }
    // now just sort by the new column
    tbl.sort([{ column: col }]);
    // delete it now as it's served its purpose
    tbl.removeColumn(col);
  }

and then it can be called:

  sortByDefinedOrder(answerData, sortingOrder);

It's not rocket science, but it is a nice little tool to have in hand to make sorting easy.

Google Chrome 4.0.249.49 is Out

Tuesday, January 5th, 2010

This morning I was coming off vacation and noticed that Google Chrome for Mac OS X had been updated to 4.0.249.49. Not a major update, but it's nice to see them making progress. Typically, I'm on the 3.x series of Chrome, but that's on Windows, and thankfully I don't have to use that unless it's work-related.

It's great to see Safari get a little competition.

It’s Tough to Stay Upbeat Working on Horrible Code

Tuesday, December 22nd, 2009

Today I got tossed into having to work with my nemesis project. This is the code that I can never seem to get rid of, and will most likely follow me as long as I'm here. It's painful. Well, today I thought that I wouldn't have to implement a certain change because another project (written by a co-worker) was going to take over this part of the code, which has been the goal for a while, but I just wasn't aware that now was the time.

So I talked to the guy doing the work on the new project, and realized that he wasn't anywhere close to getting it done. This meant that I needed to do the work if it was going to get done in a timely manner.

So back into the muck and crud I went... I spent about half a day on it, and in the end, it's working, and should work just fine, but it's left me feeling very un-Christmas-like. Very.

It's been a very tough couple of months - November and December. Liza's been sick with migraines, and work has been end-of-the-year stressful, as it can be. I'm not feeling like I'd like to feel, and yet there's really no time to just say "Timeout!" and get into the mood. I've got a few days, and I'll be working on Christmas Eve because I'll be taking time off for a family vacation the week between Christmas and my birthday.

I know there's no way to outrun this codebase. There just isn't.

I also know there's no way to get the other members of my team to get their stuff done faster. They're doing the best they can, it's just not as fast as I can work. I get loaded down at the end of the month because I hit all my objectives, which is another reason it's stressful. I just wish I knew that I'd never have to work on this codebase again.

But I know better.

Sigh.