Great File Encoding Tip for Ruby

This morning I ran into a problem with the Ruby rewrite of the summary script that I've been working on since late yesterday. The error was occurring in this relatively simple code:

  # each yields the file one line at a time
  File.open(src).each do |line|
    if line =~ / BEGIN /
      # …
    end
  end

The trace pointed into the block that walks the file, and the error was cryptic:

  summary:48 in 'block in process_pipeline' invalid byte sequence in UTF-8 (ArgumentError)
      from summary:47 in 'each'

I had to hit Google. It was clear to me that there were odd characters in the file, and while I might like to fix that, the key to the previous version was passing the '-a' option to grep so that it processed the files as text even when they looked binary. But what would do the trick here?

Turns out there's a Stack Overflow answer for that:

  # the 'r:iso-8859-1' mode string sets the encoding used for reading
  File.open(src, 'r:iso-8859-1').each do |line|
    if line =~ / BEGIN /
      # …
    end
  end

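which instructs the IO object to read the file with the ISO-8859-1 encoding, and that did the trick. ISO-8859-1 is a single-byte encoding, so Ruby accepts any byte in the file as a valid character and the regex match no longer raises. No other changes were necessary!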
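
For anyone who wants to see the difference without a troublesome file on hand, here is a minimal sketch that reproduces it. The helper name count_matches and the sample file contents are made up for illustration, and it assumes the odd characters are stray high bytes such as 0xE9; the real summary script and its input are not shown here.

```ruby
require 'tempfile'

# Hypothetical helper (not from the summary script): count the lines
# that match a pattern, reading the file with the given encoding.
def count_matches(path, pattern, encoding)
  count = 0
  File.foreach(path, encoding: encoding) do |line|
    count += 1 if line =~ pattern
  end
  count
end

# Build a sample file containing the byte 0xE9, which is "é" in
# ISO-8859-1 but is not a valid sequence on its own in UTF-8.
sample = Tempfile.new('summary')
sample.binmode
sample.write(" BEGIN caf\xE9\n".b)
sample.write("nothing here\n".b)
sample.close

# Reading as UTF-8: the regex match raises on the invalid bytes.
utf8_error = begin
  count_matches(sample.path, / BEGIN /, 'utf-8')
  nil
rescue ArgumentError => e
  e.message
end

# Reading as ISO-8859-1: every byte is a valid character, so the
# match runs normally.
latin1_count = count_matches(sample.path, / BEGIN /, 'iso-8859-1')

puts "UTF-8 read:      #{utf8_error.inspect}"
puts "ISO-8859-1 read: #{latin1_count} matching line(s)"

sample.unlink
```

The encoding: keyword used here should behave the same as the 'r:iso-8859-1' mode string above; it is just the form that File.foreach accepts.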

Sweet trick to know.