Great File Encoding Tip for Ruby
This morning I ran into a problem with the Ruby rewrite of the summary script that I've been working on since late yesterday. The error was occurring in this relatively simple code:
    File.open(src) do |file|
      file.each do |line|
        if line =~ / BEGIN /
          # …
        end
      end
    end
right inside the open() block, where each line is matched against the pattern. The error was cryptic:
    summary:48 in 'block in process_pipeline' invalid byte sequence in UTF-8 (ArgumentError)
            from summary:47 in 'each'
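For anyone who wants to see this failure without the original file, here is a minimal sketch. It assumes the data is being treated as UTF-8, as the error message suggests. The sample text and the match_outcome helper are made up for illustration and are not part of the summary script. The match should fail with the same "invalid byte sequence in UTF-8" message.

```ruby
# A line containing byte 0xE9 ("é" in ISO-8859-1), which is not a valid
# UTF-8 sequence on its own. The sample text is invented for this sketch.
raw_line = "caf\xE9 BEGIN \n".b

# Tag the same bytes as UTF-8, mimicking a file read with a UTF-8 default.
utf8_line = raw_line.dup.force_encoding(Encoding::UTF_8)

# Hypothetical helper: attempt the match and report what happened.
def match_outcome(line)
  (line =~ / BEGIN /) ? :matched : :no_match
rescue ArgumentError => e
  e.message
end

puts utf8_line.valid_encoding?   # false: the bytes are not valid UTF-8
puts match_outcome(utf8_line)    # the ArgumentError message, not a match result
```

Note that reading the line is not what fails here. The string only causes trouble once the regular expression tries to interpret its bytes.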
I had to hit Google. It was clear to me that there were odd characters in the file, and while I might like to fix that, the previous version had simply sidestepped the problem by passing the '-a' option to grep, which makes it process binary-looking files as if they were text. But what would do the trick here?
Turns out there's a StackOverflow answer for that:
    File.open(src, 'r:iso-8859-1') do |file|
      file.each do |line|
        if line =~ / BEGIN /
          # …
        end
      end
    end
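which instructs the IO object to read the file as ISO-8859-1, and that did the trick. Every byte value is a valid character in that encoding, so the regular-expression match never sees an invalid sequence. No other changes were necessary!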
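Here is a small self-contained sketch of the same idea, in case it helps to try it outside the script. The count_begin_lines helper and the sample data are hypothetical, invented for this example. It writes a throwaway file containing a stray 0xE9 byte and then scans it using the 'r:iso-8859-1' mode string.

```ruby
require 'tempfile'

# Hypothetical helper: count the lines containing " BEGIN " in a file that
# may hold bytes that are not valid UTF-8.
def count_begin_lines(path)
  count = 0
  # 'r:iso-8859-1' means read mode, with the data tagged as ISO-8859-1.
  File.open(path, 'r:iso-8859-1') do |file|
    file.each do |line|
      count += 1 if line =~ / BEGIN /
    end
  end
  count
end

# Made-up sample data: one matching line with a stray byte, one plain line.
Tempfile.create('summary-sample') do |tmp|
  tmp.binmode
  tmp.write("caf\xE9 BEGIN \nnothing here\n".b)
  tmp.flush
  puts count_begin_lines(tmp.path)
end
```

One thing to keep in mind: the strings that come back are tagged ISO-8859-1, not UTF-8. That is fine for matching ASCII-only patterns like this one, but any non-ASCII characters in the file will only display correctly if the file really was Latin-1.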
Sweet trick to know.