Archive for December, 2012

Tired of Waiting for People – Finishing Teradata Pull

Wednesday, December 5th, 2012


After waiting on a few folks in another group, I decided there was no reason to hold off any longer. A co-worker in Palo Alto has been waiting on some data for weeks now, and there's no reason for it. I had the ruby code to pull data from Teradata and put it into JSON structures for use in the main code base, and I had some time today, so I just got on with it.

I got the code out of storage, refreshed the SQL query with my co-worker, and then started summarizing the data per his requests. Thankfully, it was all pretty straightforward - I needed to collect all deals for a merchant, take the median of a few values, and count up the occurrences of a few others. Nothing horrible, and a few helper methods made pretty quick work of it.
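
The summarization itself is generic enough that a couple of small helpers cover it. Here's a minimal sketch of the idea - the method names and deal fields are mine, not the real code:

```ruby
# Hypothetical helpers for summarizing a merchant's deals: the median of a
# numeric field, and counts of occurrences of a categorical field.
def median(values)
  return nil if values.empty?
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

def count_by(deals, field)
  deals.inject(Hash.new(0)) { |counts, d| counts[d[field]] += 1; counts }
end
```

With helpers like these, each summary line in the pull becomes a one-liner over the merchant's deal list.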

After I got it all generated, it was time to work the data into the Merchant model in the existing code. The final destination for this data is the sales value calculation: it adjusts the Merchant's quality score based on previous deals. I needed to put it in the ETL for the raw merchant data, merge the new data with the existing data, and then it's ready to be used in the calculator.

Not bad. And it didn't take more than an hour or two. No need to wait for the other group any longer. Now they can write their code and then we can make a simple REST client to it and fold in the data in the same way. Easy to update and simple to retrofit. Nice.

Default Encodings Trashing Cronjobs

Wednesday, December 5th, 2012


This morning, once again, I had 500+ error messages from the production run last night. It all pointed to the JSON decoding again, but this time I was ready: the fail-fast nature of the script meant it hadn't tried to do anything else, and I could simply retry the failed jobs this morning. So I did.

Interestingly, just as with the tests yesterday, when I ran it from the login shell, it all worked just fine. So I fired off the complete nightly run and then set about trying to see what about the crontab setup on these new boxes was messed up and didn't allow the code to run properly. Thankfully, based on yesterday's runs, I knew I could get them all done before the start of the day.

So when I started digging, I noticed this in the logs:

  Input length = 1 (Encoding::UndefinedConversionError)
    org.jruby.RubyString:7508:in 'encode'
    json/ext/Parser.java:175:in 'initialize'
    json/ext/Parser.java:151:in 'new'
    ...

So I did a little googling, and it brought me back to encodings - what I expected. That reminded me of the issue I had with reading the seasonality data in the first place. Then I looked at our code: we were using a standard reader method to get data for both CSV and JSON:

  def self.read_file(filename)
    contents = ''
    what = project_root + '/' + filename
    File.open(what) do |file|
      contents = file.read
    end
    contents
  end

which is all very standard stuff.

The hits on Google said I needed to think about the encodings, so I changed the code to read in iso-8859-1 and then transcode it to utf-8:

  def self.read_file(filename)
    contents = ''
    what = project_root + '/' + filename
    File.open(what, 'r:iso-8859-1') do |file|
      contents = file.read
    end
    contents.encode('utf-8', 'iso-8859-1')
  end

Then I saw in another post about encodings in ruby that I could collapse this into one step:

  def self.read_file(filename)
    contents = ''
    what = project_root + '/' + filename
    File.open(what, 'r:iso-8859-1:utf-8') do |file|
      contents = file.read
    end
    contents
  end

which simplifies the code as well as the understanding: the file is iso-8859-1, but I want utf-8. Perfect! I put this in and I should be good to go.
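
To convince myself the two-part mode string really does the transcoding, here's a quick self-contained check against a temp file (not the real data - the bytes are just 'café' in iso-8859-1):

```ruby
require 'tempfile'

# Write a file containing ISO-8859-1 bytes (0xE9 is 'é' in that encoding),
# then read it back with the external:internal encoding pair.
file = Tempfile.new('enc-demo')
file.binmode
file.write("caf\xE9".force_encoding('binary'))
file.close

# External encoding iso-8859-1, internal utf-8: read + transcode in one step.
contents = File.open(file.path, 'r:iso-8859-1:utf-8') { |f| f.read }
# contents is now a UTF-8 string: "café"
```

The same bytes read without the encoding pair would carry the default external encoding - which is exactly where the trouble starts under cron.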

But the real question then is: Why does the login shell work? If both environments failed, that would make sense - but only the cron runs fail. That got me looking at what's defined in the login shell that's not in the crontab pseudo-shell. As soon as I scanned the environment output, it was clear:

  LANG=en_US.UTF-8

and that explained everything.

The crontab 'shell' doesn't define this, and you can't put it in the crontab file like you can the SHELL and MAILTO variables. So the solution was simple: put it in my main script right after the PATH specification:

  export LANG="en_US.UTF-8"

and all the problems should just go away! That would be nice. I'll have to check when the runs are finished this morning.

Updating Metrics App for Couch Changes

Tuesday, December 4th, 2012


Most of my day was spent struggling with the 'metrics' app - a simple web app we use to present the metrics for all the runs we do. Now that we're running all of North America, the next most important task was adding a few columns to some CSV exports from this web app. But as I soon found out, this was far more involved than adding a column or two.

The reason they needed to be added was simply to give the users investigating the data more information for spotting problems. But what I soon found was that the changes we had made to how we wrote data to Couch - four separate documents as opposed to one document and three (server-side) updates to that document - had a far greater impact than we knew. Most clearly, a lot of the reports simply didn't work.

So I needed to go back and check every function on the page. Thankfully, most of the ties were in the JavaScript or the backing ruby service code, but it was still a lot of work, as there wasn't a ton of documentation, and I had to bop back and forth to the Couch web viewer to see what I had available to build with.

But the real kicker was relating the documents: the output of one process doesn't have any way to relate itself to the output of another. The best we've got is the loose relationship of time: one process starts pretty soon after the other.

So I had to add quite a few views and complicate the logic in order to get what we needed from what we were given, plus the timing relationship between the phases. It's not ideal, but for all the crud I had to go through, it seems to work.
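
The timing relationship boils down to "pair each phase-A run with the first phase-B run that starts shortly after it." A sketch of that pairing logic - the document shape and the window size are hypothetical:

```ruby
require 'time'

# Tolerance for how long after phase A starts we expect phase B to start.
# The real value would be tuned to the actual run cadence.
WINDOW = 15 * 60  # seconds

# Pair each phase-A document with the first phase-B document whose start
# time falls within WINDOW seconds after A's start. Purely illustrative.
def pair_runs(a_docs, b_docs)
  a_docs.map do |a|
    a_time = Time.parse(a['started_at'])
    match = b_docs.find do |b|
      delta = Time.parse(b['started_at']) - a_time
      delta >= 0 && delta <= WINDOW
    end
    [a, match]
  end
end
```

It's loose - two runs landing inside the same window would confuse it - which is why this kind of pinning is a last resort compared to a real shared identifier.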

I'm glad it's over.

Lots of Little Tasks Add Up to Lots of Progress

Monday, December 3rd, 2012


Today I've spent a lot of time doing a lot of little things that have added up to some really significant changes for the application. We're already running all of North America except the account reassignment, so that's a major goal already reached, but there are still a lot of little things that need to be done to get us to the next level.

From this morning's runs, it was clear I needed to put in a little time making the code a lot more robust to bad data. We were getting some NilClass exceptions, and that's just being careless with the code. You have to check whether something is nil before you assume it's not.
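
The guards are nothing fancy: supply a default, or bail out early, before touching the value. A hypothetical example of the pattern - these field names are made up for illustration:

```ruby
# Tolerate missing or nil fields in raw merchant data rather than raising
# NoMethodError on NilClass partway through a nightly run.
def merchant_quality(merchant)
  return 0.0 if merchant.nil?
  deals = merchant['deals'] || []
  scores = deals.map { |d| d['score'] }.compact
  return 0.0 if scores.empty?
  scores.inject(:+) / scores.size.to_f
end
```

The `|| []` and `compact` do most of the work: a merchant with no deals, or deals with missing scores, degrades to a harmless default instead of killing the run.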

I also fixed the encoding on the CSV with:

  CSV.foreach(manual, :headers => true, :encoding => 'iso-8859-1') do |rec|
    # ...process the record
  end

In a very similar manner, we got a new file from the users for the seasonality data, and this guy had plenty of non-UTF-8 characters. Rather than edit them out, I chose to use the different encoding to properly handle them.

Finally, I updated the logging on the reassignment phase so that we could really see what's happening on the unassignment and assignment phases - including a very easily extractable 'undo' text for those times that we may need to undo the changes we've made. This has been a problem for a while, and it really just needed to get punched out.
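
The 'undo' text is really just a log line in a rigid format, so a grep can pull out exactly the commands needed to reverse a run. A sketch of the idea - the format and names here are hypothetical, not the real logging code:

```ruby
# Log each reassignment along with a machine-extractable UNDO line;
# grepping the log for '^UNDO:' reconstructs the reversal steps in order.
def log_reassignment(log, account_id, old_rep, new_rep)
  log.puts "reassigned account=#{account_id} from=#{old_rep} to=#{new_rep}"
  log.puts "UNDO: assign #{account_id} #{old_rep}"
end
```

Recovering the undo script is then a one-liner along the lines of `grep '^UNDO:' nightly.log`, which is about as easily extractable as it gets.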

I had a few more, but they were even less exciting than these. All told, however, I cleared a lot of issues in the system, and that's what really counts.

Fixed for Canadian Postal Codes – Again

Monday, December 3rd, 2012


Once again, I had a report of a bug in the system and I started tracking it down. This particular bug was reporting that the number of closed deals to adjust the demand for was being seriously under-reported. Like major under-reporting. So I started looking at the fetching code, and then how the closed deals were being matched up against the demand, and it literally popped off the screen at me.

Canadian postal codes.

I've seen this before.

Thankfully, I knew just what to do. The problem was that in Canada, the postal codes are six characters with a middle space, and only the first three are significant to the spatial location data we use. That means we needed to look at the country and then correctly deal with the postal code.

The code I came up with was very similar to what I'd used in the past:

  all_zips = recent_close['locations'].map do |loc|
    loc['country'] == "CA" ? loc['zip_code'][0, 3] : loc['zip_code']
  end

and then we can use them just like we do with the Merchant-to-Demand pinning. It makes perfect sense why we weren't seeing a lot of matches with the previous code - the postal codes were far too specific.
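
The rule is worth isolating in a tiny helper so it's testable on its own. A hypothetical sketch:

```ruby
# Canadian postal codes ('K1A 0B1') only match on their first three
# characters - the forward sortation area - while other codes are used whole.
def match_code(country, postal_code)
  country == 'CA' ? postal_code[0, 3] : postal_code
end
```

That way the country-specific truncation lives in one place instead of being repeated at every matching site.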

That was a nice one to get out.

Hanging out at Caribou!

Saturday, December 1st, 2012

So this is the first time I've had to hang out at Caribou! while Liza is working. It's a zoo here, with a book signing and the place is packed! Still… I'm trying the Hot Apple Cider, and it's really quite tasty.

Hanging out at Caribou!

I was hoping it'd be a nice quiet place to get a little reading or coding done, but that's just not in the cards. Not unless I go deaf in the next 60 sec, or they close this book signing by a mass exodus.

Oh well… it's still great to see Liza work. It puts a smile on my face.

Like the picture.