Parsing a CSV File in Ruby within a JAR

JRuby

Today I ran into a nasty problem with a JRuby app that we deploy to the server as a single jar file. That's a really nice idea: one atomic unit to move around, easy to roll back, all the things you'd expect… but it has at least a few very nasty downsides. They have nothing to do with Ruby itself; they come from JRuby and from how Java handles resources located inside the jar as opposed to on the filesystem outside it.

In short, it's not a seamless transition, and it'd be great if JRuby would handle this in all the File.open code so that we wouldn't have to. But that's probably asking a little much.

Still… to the problem at hand.

The code for reading a CSV file into a map in Ruby is pretty simple:

  require 'csv'
 
  def self.read_csv(filename)
    res = {}
    CSV.read(filename, :headers => true).each do |rec|
      k = [rec['Size'], rec['Weight'], rec['Height']]
      res[k] = rec
    end
    res
  end

but it assumes that the file is on the filesystem and, specifically, that the path is relative to the current directory of the running Ruby VM. This isn't new; it's pretty standard, and very convenient.
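
To make the shape of the result concrete, here's a small sketch that writes a throwaway CSV into a temp directory and indexes it the same way. The sample rows, the Price column, the file name, and the `index_csv` name are all made up for illustration; only the Size, Weight, and Height columns come from the code above.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical stand-in for read_csv above: index rows by [Size, Weight, Height].
def index_csv(filename)
  res = {}
  CSV.read(filename, :headers => true).each do |rec|
    res[[rec['Size'], rec['Weight'], rec['Height']]] = rec
  end
  res
end

Dir.mktmpdir do |dir|
  # Made-up sample data, just enough to exercise the lookup.
  File.write(File.join(dir, 'boxes.csv'),
             "Size,Weight,Height,Price\nS,1,10,4.50\nL,5,30,9.75\n")

  Dir.chdir(dir) do
    # A relative name is resolved against the process's current directory.
    table = index_csv('boxes.csv')
    # CSV returns strings unless converters are given, so the key is all strings.
    puts table[['L', '5', '30']]['Price']
  end
end
```

Note that the lookup key has to be an array of strings in the same column order, since nothing here converts the fields to numbers.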

But files inside a jar aren't on the filesystem. They have to be located as resources and read in as a byte stream:

  require 'java'
  require 'csv'
 
  def self.read_csv(filename)
    res = {}
 
    # get the contents of the file - no matter where it is
    contents = ''
    if File.exist?(filename)
      File.open(filename) do |file|
        contents = file.read
      end
    else
      # We appear not to have this file - but it's quite possible that
      # the file exists in the deployed jar, and if that's the case,
      # we need to access it in a more java-esque manner. This will be
      # a line at a time, but the results should be the same.
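      f = java.lang.Object.new
      stream = f.java_class.class_resource_as_stream('/jar_root/' + filename)
      # a nil stream means the file isn't in the jar either
      raise Errno::ENOENT, filename if stream.nil?
      br = java.io.BufferedReader.new(java.io.InputStreamReader.new(stream))
      begin
        while (line = br.read_line())
          contents << "#{line}\n"
        end
      ensure
        # close the reader even if reading blows up part-way through
        br.close()
      end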
    end
 
    # now we can take the contents of the file and process it...
    CSV.parse(contents, :headers => true).each do |rec|
      k = [rec['Size'], rec['Weight'], rec['Height']]
      res[k] = rec
    end
    res
  end

Here, the bulk of the code is about getting the file into a string that we can then parse. It first checks the filesystem, and if the file isn't there, it looks in the jar. Unfortunately, the jar lookup needs the full path to the file within the jar, so if you're using a packager that tacks something on the front (like the /jar_root/ in the code above), you need to account for it.
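
One way to keep that prefix from leaking into every call is to build the resource path in one place and keep the parsing separate from the loading. This is only a sketch under my own assumptions: `JAR_PREFIX`, `jar_resource_path`, and `index_rows` are hypothetical names, and '/jar_root/' is just the prefix from the code above, so substitute whatever your packager actually uses. It only builds a path and parses a string, so it runs on plain Ruby as well as JRuby.

```ruby
require 'csv'

# Hypothetical: whatever your packager puts in front of app files in the jar.
JAR_PREFIX = '/jar_root/'

# Build the absolute resource path for a file inside the jar.
def jar_resource_path(filename, prefix = JAR_PREFIX)
  # Strip any leading "./" or "/" so the result never has a doubled slash.
  relative = filename.sub(%r{\A(\./|/)+}, '')
  File.join(prefix, relative)
end

# Parse CSV text that has already been loaded, from disk or from the jar.
def index_rows(contents)
  res = {}
  CSV.parse(contents, :headers => true).each do |rec|
    res[[rec['Size'], rec['Weight'], rec['Height']]] = rec
  end
  res
end

puts jar_resource_path('data/boxes.csv')
puts index_rows("Size,Weight,Height,Price\nS,1,10,4.50\n").keys.inspect
```

With the split done this way, the filesystem branch and the jar branch both just produce a string and hand it to the same parser.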

Not horrible, but it took an hour to figure this all out and get it nicely coded up without too much redundant code.