Default Encodings Trashing Cronjobs

This morning, once again, I had 500+ error messages from last night's production run. Everything pointed to the JSON decoding again, but this time I was ready: the script now fails fast, so it didn't try to do anything else after the errors, and I could retry the failures this morning. So I did.

Interestingly, just as with yesterday's tests, when I ran it from a login shell, it all worked just fine. So I fired off the complete nightly run and then set about figuring out what in the crontab setup on these new boxes was keeping the code from running properly. Thankfully, based on yesterday's runs, I knew I could get them all done before the start of the day.

So when I started digging, I noticed this in the logs:

  Input length = 1 (Encoding::UndefinedConversionError)
    org.jruby.RubyString:7508:in 'encode'
    json/ext/Parser.java:175:in 'initialize'
    json/ext/Parser.java:151:in 'new'
    ...

So I did a little googling, and it brought me back to encodings, which is what I expected. That reminded me of the issue I had with reading the seasonality data in the first place. Then I looked at our code, and we use one standard reader method to get the data for both CSV and JSON:

  def self.read_file(filename)
    contents = ''
    what = project_root + '/' + filename
    File.open(what) do |file|
      contents = file.read
    end
    contents
  end

which is all very standard stuff.

The hits on Google said I needed to think about the encodings, so I changed the code to read the file as iso-8859-1 and then transcode it to utf-8:

  def self.read_file(filename)
    contents = ''
    what = project_root + '/' + filename
    File.open(what, 'r:iso-8859-1') do |file|
      contents = file.read
    end
    contents.encode('utf-8', 'iso-8859-1')
  end

Then I saw in another post about encodings in Ruby that I could collapse this into one step:

  def self.read_file(filename)
    contents = ''
    what = project_root + '/' + filename
    File.open(what, 'r:iso-8859-1:utf-8') do |file|
      contents = file.read
    end
    contents
  end

which simplifies both the code and the reasoning: the file is iso-8859-1, but I want utf-8. Perfect! I put this in, and I should be good to go.
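To double-check what the two-encoding mode string does, here is a minimal sketch. The temp file, the sample text, and the read_latin1_as_utf8 name are made up for illustration and are not part of our code. It writes "café" as ISO-8859-1 bytes and reads it back with the same mode string as read_file above:

```ruby
require 'tempfile'

# Hypothetical helper mirroring the mode string used in read_file above:
# external encoding iso-8859-1, internal encoding utf-8.
def read_latin1_as_utf8(path)
  File.open(path, 'r:iso-8859-1:utf-8') { |file| file.read }
end

# "caf\xE9" is "café" in ISO-8859-1, where é is the single byte 0xE9.
sample = Tempfile.new('latin1')
sample.binmode
sample.write("caf\xE9".b)
sample.close

text = read_latin1_as_utf8(sample.path)
puts text.encoding         # the string comes back tagged as UTF-8
puts text.valid_encoding?  # and its bytes really are valid UTF-8
puts text.bytes.length     # one byte longer than on disk, since é is now two bytes
```

The point of the sketch is that the transcoding happens during the read, so nothing downstream has to know what the file looked like on disk.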

But the real question is: why does the login shell work? If both failed, that would make sense, but only the cron run does. That got me looking at what's defined in the login shell that's not in the crontab pseudo-shell. As soon as I scanned the output, it was clear:

  LANG=en_US.UTF-8

and that explained everything.
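My understanding of the mechanism, hedged: when File.open is given no encoding, Ruby tags what it reads with Encoding.default_external, which on Unix-like systems is normally derived from the locale (LANG and friends). With no locale set, that default is not UTF-8, so the same bytes arrive labeled differently. The sketch below fakes the two environments by setting Encoding.default_external directly. The read_with_default helper and the sample JSON are invented for the demonstration, and it assumes the data on disk is UTF-8:

```ruby
require 'tempfile'

# Hypothetical helper: read a file with no explicit encoding, as read_file
# originally did, while temporarily pretending the process default is `default`.
def read_with_default(path, default)
  saved = Encoding.default_external
  Encoding.default_external = default
  File.open(path) { |file| file.read }
ensure
  Encoding.default_external = saved
end

# A tiny JSON document with one non-ASCII character, stored as UTF-8 bytes.
json_file = Tempfile.new('sample')
json_file.binmode
json_file.write(%({"city":"Z\u00FCrich"}).b)
json_file.close

# Stand-in for the login shell, where LANG=en_US.UTF-8 is set.
login = read_with_default(json_file.path, Encoding::UTF_8)
# Stand-in for a run with no locale set (an assumption about the cron case).
cron = read_with_default(json_file.path, Encoding::US_ASCII)

puts "login: #{login.encoding}, valid: #{login.valid_encoding?}"
puts "cron:  #{cron.encoding}, valid: #{cron.valid_encoding?}"
```

The bytes are identical in both reads. Only the label differs, and a mislabeled string is what makes a later conversion to UTF-8 blow up.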

The crontab 'shell' doesn't define this, and as far as I could tell I can't put it in the crontab file the way I can the SHELL and MAILTO variables (though some cron implementations do accept arbitrary NAME=value lines). So the solution was simple: put it in my main script right after the PATH specification:

  export LANG="en_US.UTF-8"

and all the problems should just go away! That would be nice. I'll have to check when the runs are finished this morning.
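A possible belt-and-braces addition on the Ruby side, sketched here and not something that is in the script today: check the default encoding at startup and pin it to UTF-8 if the environment did not provide it. The ensure_utf8_default! name and the log message are hypothetical.

```ruby
# Hypothetical guard to call at the top of the Ruby entry point.
# Returns true if it had to override the default, false if UTF-8 was already set.
def ensure_utf8_default!(log = $stderr)
  return false if Encoding.default_external == Encoding::UTF_8

  # Leave a trace so a missing LANG shows up in the cron logs.
  log.puts "default_external was #{Encoding.default_external}; forcing UTF-8"
  Encoding.default_external = Encoding::UTF_8
  true
end

ensure_utf8_default!
```

This would only cover files read without an explicit encoding. Anything opened with a mode string like 'r:iso-8859-1:utf-8' is unaffected by the external default.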