Goin’ Old School on Some Code

Professor.jpg

Today I was trying to figure out why an app I inherited was returning wild values and I came across code that I'd skimmed several times before, but hadn't really dug into. Since the running program wasn't generating any exceptions, I assumed that the problem had to be in the data we were receiving. I spent several hours digging into the data, logging it, checking for problems, all to come to the conclusion that it couldn't be the data, and had to be something in how it was being interpreted in the code.

So back to the processing code that was the first step in the program.

My goals were simple - put in logging and try/catch blocks to make sure that we caught the exceptions and group the processing into distinct groups: gathering data, converting it into values, and then organizing the data into objects. Seems simple. And then I looked, really looked at the code.

What I saw made me shake my head... and bring out the "Old School" in me. The original code looked something like this:

  public uploadData(StringTokenizer mainTokenizer) {
    // get the number of rows as a rough size for the maps
    int size = mainTokenizer.countTokens();
	// ...make some maps to store data
 
    while (mainTokenizer.hasMoreTokens()) {
      String row = mainTokenizer.nextToken();
      StringTokenizer columns = new StringTokenizer(row, "\t");
      while (columns.hasMoreTokens()) {
        String nameString = columns.nextToken().trim();
        String dateString = columns.nextToken().trim();
        String costString = columns.nextToken().trim();
        String qtyString = columns.nextToken().trim();
        // ...more of the same
 
        String name = nameString;
        Date transDate = dateFormat.parse(dateString);
        Double cost = Double.valueOf(costString);
        Double qty = Double.valueOf(qtyString);
        // ...more of the same
 
        Order ord = orderFactory.createOrder(name, transDate, cost, qty);
        orderMap.put(name, ord);
        // ...more of the same
      }
    }
  }

The goal was admirable - divide the processing into three sections: acquisition, conversion, and organization. It's an admirable goal. But doing it this way is not only a little redundant, it's downright dangerous.

Look at the most obvious things: reused variables. Get rid of them. Make it cleaner by using the variables where they are needed. No more, no less. it's an admirable goal, and a good design plan, but when you're processing 30 or more variables in a table of data, it gets to be a little much when you have to repeat these three times. Trim it a little and keep it readable.

But he subtle things really are the danger signs. Why is he using the StringTokenizer for the rows in the table and not checking the number of elements? There's the possibility for a major confusion here. What if you have a row of data that's malformed? There's nothing that's going to tell me what the problem was and where to look in the data stream to correct it. Also, if there's an exception thrown all the remaining data is lost. Not a good thing.

Let's try something a little different.

  public uploadData(StringTokenizer mainTokenizer) {
    // get the number of rows as a rough size for the maps
    int size = mainTokenizer.countTokens();
	// ...make some maps to store data
 
     // processing variables used in the loop
    String name = null;
    Date transDate = null;
    Double cost = null;
    Double qty = null;
 
    // keep track of what I've done for logging at the end
    int    uploadCount = 0;
    int    uploadErrors = 0;
    while (mainTokenizer.hasMoreTokens()) {
      String row = mainTokenizer.nextToken();
      String[] columns = row.split("\t");
      if ((columns == null) || (columns.length != SIZE)) {
        log.error("No columns or wrong number.");
        ++uploadCount;
        ++uploadErrors;
        continue;
      }
 
      String nameString = columns[0].trim();
      String dateString = columns[1].trim();
      String costString = columns[2].trim();
      String qtyString = columns[3].trim();
      // ...more of the same
 
      try {
        name = nameString;
        transDate = dateFormat.parse(dateString);
        cost = Double.valueOf(costString);
        qty = Double.valueOf(qtyString);
        // ...more of the same
      } catch (NumberFormatException nfe) {
        log.error("While parsing " + name + " got a "
            + nfe.getMessage());
        ++uploadCount;
        ++uploadErrors;
        continue;
      } catch (ParseException pe) {
        log.error("While parsing " + name + " got a "
            + nfe.getMessage());
        ++uploadCount;
        ++uploadErrors;
        continue;
      }
 
      try {
        Order ord = orderFactory.createOrder(name, transDate, cost, qty);
        orderMap.put(name, ord);
        // ...more of the same
      } catch (Exception e) {
        log.error("While organizing " + name + " got a "
            + e.getMessage());
        ++uploadCount;
        ++uploadErrors;
        continue;
      }
 
      // update the count of rows processed
      ++uploadCount;
    }
 
    // log what we've done
    log.info("Processed " + uploadCount + " with "
        + uploadErrors + " errors");
  }

Now the logging messages are far too simplistic here, but they're where you need to log what's happening. First, forget that StringTokenizer and it's loop. What if you're wrong about the count? Use a simple String[] and then check the count. To maintain the style, you can leave the intermediate values, but another scheme is to have static final int values for each column header and then use them as opposed to the temporary variables:

      String nameString = columns[0].trim();
      String dateString = columns[1].trim();
      String costString = columns[2].trim();
      String qtyString = columns[3].trim();
      // ...more of the same
 
      try {
        name = nameString;
        transDate = dateFormat.parse(dateString);
        cost = Double.valueOf(costString);
        qty = Double.valueOf(qtyString);
        // ...more of the same
      } catch (NumberFormatException nfe) {
        log.error("While parsing " + name + " got a "
            + nfe.getMessage());
        ++uploadCount;
        ++uploadErrors;
        continue;
      } catch (ParseException pe) {
        log.error("While parsing " + name + " got a "
            + nfe.getMessage());
        ++uploadCount;
        ++uploadErrors;
        continue;
      }

becomes:

      try {
        name = columns[NAME_COL].trim();
        transDate = dateFormat.parse(columns[DATE_COL].trim());
        cost = Double.valueOf(columns[COST_COL].trim());
        qty = Double.valueOf(columns[QTY_COL].trim());
        // ...more of the same
      } catch (NumberFormatException nfe) {
        log.error("While parsing " + name + " got a "
            + nfe.getMessage());
        ++uploadCount;
        ++uploadErrors;
        continue;
      } catch (ParseException pe) {
        log.error("While parsing " + name + " got a "
            + nfe.getMessage());
        ++uploadCount;
        ++uploadErrors;
        continue;
      }

and what's left is as readable as the original, but uses half the variables. This is important if we're in a tight loop and getting hit with a ton of data. Garbage Collection is a killer for Java apps, and the fewer variables you can use the better.

When I put this code into the system I had an immediate hit: the data coming from the other app was leaving a column field empty when it meant to send a zero. I saw this with the NumberFormatException and it clearly labeled the row in the table where the error occurred. I looked in the multi-thousand line upload and sure enough - a blank. Given that it was meant to be a zero, I just put that into the code on the uploader so I wouldn't have to worry about other such occurrences. Easy.

So what have we learned?

Well... I've learned that it's better to have less code that does the same thing so long as it's readable and well commented. Both of these attributes were missing in the original version. Secondly, check everything - there's no reason not to, it'll save your bacon more times than you can imagine. Third, care about the code you're writing. It's a craft, after all. Be proud of it.