Archive for August, 2012

Parsing a CSV File in Ruby within a JAR

Thursday, August 9th, 2012

JRuby

Today I ran into a nasty problem with a JRuby app where we're deploying the complete app as a single jar file to the server. That's a really nice idea: one atomic unit to move around, easy roll-backs, all the things you'd expect. But it's got at least a few very nasty downsides, and they have nothing to do with Ruby itself. It's JRuby, and how Java handles resources located within the jar as opposed to on the filesystem outside the jar.

In short, it's not a seamless transition, and it'd be great if JRuby handled this in all the File.open code so that we didn't have to. But that's probably asking a little much.

Still… to the problem at hand.

The code for reading a CSV file into a map in Ruby is pretty simple:

  require 'csv'

  # key each row by its (Size, Weight, Height) columns
  def self.read_csv(filename)
    res = {}
    CSV.read(filename, :headers => true).each do |rec|
      k = [rec['Size'], rec['Weight'], rec['Height']]
      res[k] = rec
    end
    res
  end

but it assumes that the file is located on the filesystem, and specifically, relative to the current directory of the running Ruby VM. This isn't new, it's pretty standard, and very convenient.
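To make the keyed map concrete, here's a minimal sketch that applies the same keying to inline CSV text instead of a file. The `index_csv` name, the sample rows, and the extra `Price` column are all made up for illustration. Note that every key component comes back from CSV as a String, so lookups need string keys too.

```ruby
require 'csv'

# Hypothetical sample data: the three key columns plus a made-up Price column.
SAMPLE_CSV = <<~TEXT
  Size,Weight,Height,Price
  S,10,5,1.99
  M,20,8,2.99
TEXT

# Same keying scheme as read_csv, but parsing a String instead of reading a file.
def index_csv(text)
  res = {}
  CSV.parse(text, headers: true).each do |rec|
    res[[rec['Size'], rec['Weight'], rec['Height']]] = rec
  end
  res
end

table = index_csv(SAMPLE_CSV)
puts table[['M', '20', '8']]['Price']   # prints 2.99
puts table.key?(['M', 20, 8])           # prints false (CSV values are Strings)
```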

But files in jar files aren't in the filesystem. They have to be located and read in as a byte stream:

  require 'java'
  require 'csv'

  def self.read_csv(filename)
    res = {}

    # get the contents of the file - no matter where it is
    contents = ''
    if File.exist?(filename)
      contents = File.read(filename)
    else
      # We appear not to have this file - but it's quite possible that
      # the file exists in the deployed jar, and if that's the case,
      # we need to access it in a more java-esque manner. This will be
      # a line at a time, but the results should be the same.
      # Look it up via the system class loader, which searches the
      # application class path (the jar itself when run with java -jar).
      # ClassLoader paths take no leading slash.
      stream = java.lang.ClassLoader.get_system_resource_as_stream('jar_root/' + filename)
      raise IOError, "#{filename} not found on disk or in the jar" if stream.nil?
      br = java.io.BufferedReader.new(java.io.InputStreamReader.new(stream, 'UTF-8'))
      begin
        while (line = br.read_line)
          contents << "#{line}\n"
        end
      ensure
        # close the reader (and the underlying stream) even if reading fails
        br.close
      end
    end

    # now we can take the contents of the file and process it...
    CSV.parse(contents, :headers => true).each do |rec|
      k = [rec['Size'], rec['Weight'], rec['Height']]
      res[k] = rec
    end
    res
  end

Here, the bulk of the code is about getting the file into a string that we can then parse. It first tries to see if it's on the filesystem, and if that fails, it tries the jar to see if it happens to be there. Unfortunately, it's got to be the full path to the file in the jar, and if you're using a packager that tacks something on the front, you need to be aware of this.
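The shape of that method (try the filesystem, fall back to the jar, then parse) can be sketched in plain Ruby with the jar lookup passed in as a callable. This is only a sketch: `read_contents`, `FAKE_JAR`, and `sizes.csv` are hypothetical, and the in-memory hash stands in for the JRuby classpath lookup so the logic can be exercised outside a jar. It also fails loudly when neither source has the file, rather than handing `nil` to the parser.

```ruby
require 'csv'

# Return the file's contents from disk if it exists; otherwise ask the
# fallback (hypothetically, a jar/classpath lookup under JRuby).
# The fallback is any callable returning a String, or nil if not found.
def read_contents(filename, fallback)
  return File.read(filename) if File.exist?(filename)

  contents = fallback.call(filename)
  raise IOError, "#{filename} not found on disk or via fallback" if contents.nil?
  contents
end

# Stand-in for the jar: an in-memory hash of path => contents.
FAKE_JAR = { 'sizes.csv' => "Size,Weight,Height\nS,10,5\n" }.freeze
jar_lookup = ->(name) { FAKE_JAR[name] }

# Assumes there's no sizes.csv in the current directory, so the fallback is used.
rows = CSV.parse(read_contents('sizes.csv', jar_lookup), headers: true)
puts rows.first['Weight']   # prints 10
```

Keeping the "where do the bytes come from" question separate from the parsing is what lets the CSV code stay identical in both cases.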

Not horrible, but it took an hour to figure all this out and get it coded up neatly, without too much redundant code.

Google Chrome dev 22.0.1229.0 is Out

Wednesday, August 8th, 2012

Google Chrome

This morning I noticed that Google Chrome dev 22.0.1229.0 was out, and the release notes are becoming something of a disappointment. To say they are sparse is an understatement. Take the UI change from the 'wrench' to the 'pancakes':

Chrome Releases: Dev Channel Update

That's a change, and there's got to be a reason for it, but the release notes are totally silent about it. "Read the SVN logs" is hardly an answer; at least, it had better not be. The SVN logs will have far more detail than this, but these are higher-level issues that the release notes need to address.

Yeah, the quality of the release notes is really slipping on this project.

Building Systems the Service-Oriented Architecture Way

Tuesday, August 7th, 2012


I can really appreciate that building things in a large-scale web system means decentralization. You need to have services, and those services have to be well-defined and walled off so that changes in one service are easily adopted by another, but don't break existing apps. In short - I get it. But that doesn't mean it's easy.

The problem I'm running into today is that in order to accomplish my goals, I need other people to add things to their services so that I can get the data from them and then process it. I haven't liked this in the past either, but typically I've simply figured out how to update that code and made the changes myself. Other times, I just didn't have the external dependencies, because management wanted me to build something from the ground up.

It's not that I think these people are bad, or can't do the work. It's that I need them in order to make progress. It's the difference between working in a large, disparate team and working in a small, focused team. I like the latter because it means very little sitting around, and a whole lot of getting stuff done.

But the flip side is that you're the key man, and very few people understand the project. Like I said, I can see the flip side.

Today has just been one of those days where I'm doing a lot of waiting for people, and I'd like to be doing a lot of coding. After all, I'm a coder, not a professional waiter.

The PickAxe Book’s Font Choice

Tuesday, August 7th, 2012

Books and Stuff

This morning I finally got tired of the font choice for the Pragmatic Programmers' Programming Ruby 1.9 book. It's a lot smaller and not nearly as readable as every other Pragmatic Programmers book I've got. I mean really bad. First, I read all my tech books on my MacBook Pro. I have a size I like to give them on the screen. I know the font is fixed in a PDF, but still… I find this very readable:

[Screenshot: pp_cocoaProg.pdf, page 104 of 454]

Yet here's the PickAxe book, taking up even more screen real estate:

[Screenshot: the PickAxe book]

To me, there's no contest.

So I wrote to the PP guys - on the off-chance that they might be interested in my feedback:

Guys,

I love your books. They are by far the "go to" books for learning anything I need for a new project or job. I have several. Wonderful stuff.

However, I have a slight bone to pick with you on the font choice for the Ruby 1.9 book.

What's up with the font? It's not the same font that I see on every single one of my other PragProg books. I like the "other" font. It's easy to read on computer screens as a PDF, and it's very nice and legible at a somewhat reduced size.

But the font for the Ruby 1.9 book is almost the exact opposite. Very hard to read. Nowhere near as clear.

It may be stylish, but as for me - give me back the old font and layout with the blue borders and the style that made PragProg the best tech books around.

I know it's just one voice, but hey... if you don't hear it from your friends, then others will just talk about you behind your back, right?

Dave Thomas wrote back:

As the person who picked the font, let me say I really appreciate the feedback.

The choice of font was difficult. The PickAxe had grown from a hefty 450 pages to a massive 950, and I really wanted to stop the trend. So I looked for ways to thin it down. One choice I made was to go for a font which was a little skinnier and a little more open, which would let me sometimes squeeze just a few more lines on the page. This one difference let me save about 60 pages overall. I also changed the layout of the standard library section and saved another 40 or so.

It's always a tricky compromise, but I hope that the marginal decrease in legibility is offset by the many thousands of pages that won't be printed 🙂

I can see his point: it would be a monster of a printed book, and a lot of people buy his books that way. I wrote back saying so, and he suggested reading it as ePub or mobi, where I could set the font as I pleased. The problem with that is that there are no good ePub or mobi readers for the Mac that preserve the formatting of the code samples. None. So I can read the PDF, which at least looks right, or nothing.

Not happy with the alternatives, but I don't have a lot of say about it either. It is what it is, and it's up to me to just suck it up and deal with it.

Shucks.

GitHub Continues to Amaze

Monday, August 6th, 2012

GitHub Source Hosting

Today I got the latest news from GitHub about the new Notifications and Stars features, and how they work together to make it much easier to flag your interest in a repo without getting every notification from it. Very nice.

These guys are really quite impressive. I'm glad to have code there, both open source and private. It's an amazing place that brings all the source control, issue tracking, and documentation functions of a project under one hood.

And to see that they are continuing to improve it with new features and better style is just amazing. I can't remember the last time I was this impressed with Jira. OK… I have never been that impressed with Jira.

Great work for an amazing service.

It’s Nice to be Busy

Monday, August 6th, 2012


It really is nice to be busy. Maybe this is feeding my finance problem, and I'd be better off detoxing all the way, but it really is nice to be responding to requests and turning around answers to folks quickly and efficiently. It gives me a sense of accomplishment, and that's really nice when you're still The New Guy at a place. Feeling like you make a difference, like you matter: that's nice stuff.

It's all about how we feel about ourselves, isn't it? No matter what you're doing, where you're doing it… if you feel good about yourself it's meaningful work. Good enough.

Restarting the Setup Process for a new Mac

Saturday, August 4th, 2012

Apple Computers

Today I had a nasty scare - I was trying to migrate my son's old MacBook to a new MacBook Pro we just got him at the Apple store. The migration was going over WiFi and was going to take 14 hours. I hooked up ethernet cables, but it wasn't smart enough to detect the new and faster transport. With each passing minute, the time to finish was going up! This was bad and getting worse.

So I decided to kill it by shutting off the old machine and then the new one. I then restarted the process, but decided moving my account wasn't necessary. Huge mistake! Mine was the Administrator account. So when I got done transferring his, mine was in an incomplete state, all messed up, and his wasn't an admin! We were sunk.

No install disk. No way around it. Well and truly hosed.

Then I saw this: How to restart the setup process. If I could get that going again, and transfer my account this time, we'd be set. So here's what I had to do: restart the Mac holding down Cmd-S to get into single-user mode, and type:

  $ /sbin/mount -uw /
  $ rm /var/db/.AppleSetupDone
  $ exit

At this point, I could restart the setup process and re-transfer my account. Once that was done, I could make his account an admin, and we were safe.

That was a close one. No fun at all.

Adjustment isn’t Always an Easy Process

Friday, August 3rd, 2012


I've been at The Shop for a couple of weeks now. I barely know how to find the bathroom (joking), and I'm finding that some days the process of acclimation is easier than others. Today has been one of those days that was harder than average. I was talking to a good friend who's still in finance, and he said something that was so incredibly true:

The one thing I like about finance is that it teaches urgency. Now, the average developer interprets that to mean "do the bare minimum and move on", which sucks. But for those of us that care, we learn to write great code in a short period of time. We truly deliver. My fear, in a culture like [The Shop], is that after a while, people would lose their "edge", and revert to their lackadaisical way of life.

and then went on to say:

Or, you will emerge so much stronger than the rest of them, you'll own them.

This is exactly what's happened in the jobs I've had in the past, even in finance. It's so easy for me to work 12+ hour days, write 2000-4000 lines of code a day, and, within about two projects, totally change management's impression of me from the "new guy" to the "shining star". Period. I've done it so many times, it's almost a formula to me.

But I didn't want to do it here because I didn't like how the story tends to unfold. I start to isolate myself, and management agrees: keep him locked away and producing code! It's in their best interests, and I didn't mind the peace and quiet. Plus, it meant that I didn't have to deal with a lot of people monkeying around with my code base.

Now I don't mind real help, but if you're going to do something that's not in the design of the app - don't do it. Ask. I've written about this more times than I can remember. So I decided that this time I'm going to work hard at fitting in.

I'm not so sure I'm going to be successful.

Case in point this week: let's call it The Case of Singletons and Thread Safety.

The Case of Singletons and Thread Safety

We have a process in the current application that needs to gather some statistics on the data running through it. Like a logger, it makes good design sense to have a singleton that does this. In the same way that you need one thing controlling the output on a logger to make the output look reasonable and make sense, it makes a lot of sense to have one aggregator, and then have it be responsible for serving up those aggregated values at any time.

I suppose that it's possible to have an aggregator that is simply applied to a list of things, and gathers data from each as it visits them, but that's not the same. That visitor pattern requires that all members be in memory at once, or that there's some reference to the ones that have been processed, and those that haven't - so you don't process the same one twice.

In either case, it's a much more difficult design to understand. While it may have fewer lines, I've found that simplicity hits a significant point of diminishing returns, and it's possible to try to simplify something too much and end up with something that's small, but far from simple.

So we have this singleton. I put a simple mutex in at the proper points (three, I'm pretty sure), and that would provide all the thread safety we needed. They were even scoped locks, released automatically even if an exception is raised, and with only the one mutex there's no lock ordering to get wrong, so there was no chance of deadlocks or any other issues arising from the use of the mutexes. It's simple.
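Here's a minimal sketch of that kind of singleton: one aggregator, one `Mutex`, and every read or write wrapped in `Mutex#synchronize`, the scoped-lock form that always releases the lock, even if the block raises. The class name and the statistics it tracks are hypothetical; the actual class isn't shown here.

```ruby
require 'singleton'

# Hypothetical process-wide statistics aggregator.
class StatsAggregator
  include Singleton

  def initialize
    @lock = Mutex.new
    @counts = Hash.new(0)
  end

  # Record one observation for the given key.
  def record(key)
    @lock.synchronize { @counts[key] += 1 }
  end

  # Return a copy, so callers never see the hash while it's being updated.
  def snapshot
    @lock.synchronize { @counts.dup }
  end

  # Clear all counts.
  def reset
    @lock.synchronize { @counts.clear }
  end
end

stats = StatsAggregator.instance
threads = 8.times.map do
  Thread.new { 1_000.times { stats.record(:rows) } }
end
threads.each(&:join)
puts stats.snapshot[:rows]   # prints 8000
```

Three `synchronize` calls are the whole of the locking, and each one is visibly paired with the state it protects.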

But not from Steve's point of view.

Steve is a guy on the team that's a nice guy. Fun-loving and quick with a joke, he's a good person to be around to keep things from becoming too serious. At the same time he's like many coding bigots I've known in that there is only one real language, and all the rest are junk, and there's only one real OS, and the rest are junk, and so on… It's sad to see someone with such great potential limit themselves so totally in life. The minute I heard him talk about technology, he marginalized himself to me as I know he's never going to be able to think outside the box that he's voluntarily placed himself in.

So Steve didn't like mutexes. He thought they made the code more difficult to read.

Now recall there are only three uses of this mutex in this one class. Three. And the entire class is less than 50 lines, including whitespace. So it's not like this is going to take a lot to understand, and they are Ruby mutexes, well documented in the specs.

But that was too much for Steve.

So Steve and Fred spent two on replacing the mutex with atomic references. When they started down this road, I explained why it was the way it was, and that it was a good solution. I advised them not to mess with atomics unless it's necessary, because they are difficult to get right. But they didn't listen to me.

I even asked them not to do it.

But now we have atomics in the code.

I take that back… we have use of the Atomic Reference gem in the code. Within that code, as I've read, they use mutexes to control access, because you can't really have atomic operations in a reference-counting language (look at ARC in ObjC 2.0); it can't be done. You can't do the compare-and-swap at the same time as handling the reference count. Period. So you can fake it, or use mutexes, but you can't do it like you can when you don't have a reference-counting VM.

So what's the upshot of all this? Well… I had the mutex in there to control adding to the singleton. If they made the container atomic, it has to be atomic with respect to its contents, and that's a lot different than having atomic references to the container. My belief is that the Atomic gem is doing just what I was doing, only inside the gem, if it's doing it right. If not, then it's controlling access to the container and isn't properly controlling inserts into and removals from the container.
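The difference between an atomic reference to a container and atomic updates to its contents is easier to see in code. In this sketch, `AtomicRef` is a hypothetical stand-in for an atomic-reference class (emulated with a `Mutex`, not any gem's real implementation) offering `get` and `compare_and_set`. Holding a hash in it makes swapping the reference atomic, but mutating that hash in place bypasses the protection entirely; a safe insert has to build a new hash and swap it in, retrying if another thread got there first.

```ruby
# Hypothetical stand-in for an atomic reference; compare-and-set is
# emulated with a Mutex here, so it only illustrates the semantics.
class AtomicRef
  def initialize(value)
    @value = value
    @lock = Mutex.new
  end

  def get
    @lock.synchronize { @value }
  end

  # Swap in new_value only if the current value is still `expected`
  # (compared by identity). Returns true if the swap happened.
  def compare_and_set(expected, new_value)
    @lock.synchronize do
      return false unless @value.equal?(expected)
      @value = new_value
      true
    end
  end

  # Retry loop: build a new value from the current one, try to swap it in.
  def update
    loop do
      current = get
      return if compare_and_set(current, yield(current))
    end
  end
end

counts = AtomicRef.new({}.freeze)

# Unsafe: counts.get[:rows] = 1 would mutate the shared hash in place, and
# the atomic reference does nothing to protect that (here it would raise,
# because the hash is frozen).

# Safe: copy-on-write, so every insert produces a brand new hash.
threads = 4.times.map do
  Thread.new do
    250.times { counts.update { |h| h.merge(rows: h.fetch(:rows, 0) + 1).freeze } }
  end
end
threads.each(&:join)
puts counts.get[:rows]   # prints 1000
```

Freezing each hash is a cheap way to make any accidental in-place mutation fail immediately instead of silently racing.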

In short, they have a worse implementation in every case. But they are happy. Why? Because they got to remove three lines of code in the 75 line class.

If I were the manager, I'd have a serious heart-to-heart with them about wasting time.

But I'm not the manager. I'm trying to fit in. But this kind of stuff is very hard for me because I see them making mistakes I've told them not to. They aren't listening. I can't save them, or their project. I can only step in when it's not working to say "I told you about this, and you didn't listen. Change it back and try it again. Next time, shut up and listen!"

But I won't. Because I want to fit in.

This adjustment is hard today. I had to deal with this. I had to deal with refactoring that wasn't done right. I mean really… if you're going to refactor, then bloody well do it RIGHT! You don't have classes that are 5 lines in total. Two lines are the definition! You've got a 5 line class?! That's got to be a method or function somewhere else - I guarantee it.

Like I said… it's hard. I don't know how long I'll last. I want to last. I really do. But I don't know how long I can last in a place where there are guys like this.

Properly Recording Interactions with VCR

Friday, August 3rd, 2012

Ruby

This morning I found a problem with the Ruby gem VCR. It turns out that if it hits a service that reports its data as ASCII encoded but actually sends UTF-8, then VCR will store the data as ASCII, and it can't be read back out of the cassette. Very nasty. The solution is to force VCR to record the actual bytes and Base64-encode them. This is easily done with the code:

  require 'vcr'

  VCR.configure do |c|
    c.cassette_library_dir = 'spec/cassettes'
    c.hook_into :webmock
    c.default_cassette_options = {
      # record new interactions only when VCR_RECORD is set
      :record => (ENV["VCR_RECORD"] ? :new_episodes : :none),
      # :uri is VCR's built-in URI matcher (there is no :url matcher)
      :match_requests_on => [:method, :uri, :path, :host, :body]
    }
    c.ignore_localhost = true
    c.allow_http_connections_when_no_cassette = true
    # store binary or mis-encoded bodies as Base64 so they can be read back
    c.preserve_exact_body_bytes do |http_message|
      http_message.body.encoding.name == 'ASCII-8BIT' ||
      !http_message.body.valid_encoding?
    end
  end

The code is simple: if the message body comes back from the service as ASCII-8BIT (Ruby's name for raw binary), then don't trust it. Likewise if the body's bytes aren't valid in the encoding it reports.

This is a nice, conservative way to ensure that the cassettes get written in a form we're guaranteed to be able to read back. While it might be nice to have a human-readable version, it's not worth the problems of badly behaved services that say they are ASCII and really return UTF-8.
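To see why that predicate catches the mislabeled case, it can be pulled out and exercised on plain Ruby strings, no VCR required. This is a sketch: `preserve_exact_bytes?` is a hypothetical name, and the string stands in for a response body. It also shows that Base64 (here via `Array#pack('m0')`) hands back the exact bytes regardless of the encoding label.

```ruby
# The predicate from the VCR config, pulled out so it can be exercised
# on plain strings (the method name is hypothetical).
def preserve_exact_bytes?(body)
  body.encoding.name == 'ASCII-8BIT' || !body.valid_encoding?
end

# A service labels its body US-ASCII but actually sends UTF-8 ("café").
utf8_bytes = "caf\u00e9".b                     # raw bytes, tagged ASCII-8BIT
mislabeled = utf8_bytes.dup.force_encoding('US-ASCII')

p mislabeled.valid_encoding?           # prints false (0xC3 0xA9 aren't US-ASCII)
p preserve_exact_bytes?(mislabeled)    # prints true
p preserve_exact_bytes?('plain text')  # prints false

# Base64 round-trips the exact bytes, whatever the label says.
encoded = [mislabeled].pack('m0')
p encoded.unpack1('m0') == utf8_bytes  # prints true
```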

The Endless Loop of Refactoring

Thursday, August 2nd, 2012


I'm a big fan of refactoring code (heck, of rewriting it). It's good because the first time you write something, you're going to make mistakes, and the more you rework it, the better it's going to be. Or so the theory goes… and I tend to agree with that theory… so long as some kind of limits are imposed.

I'm not talking about being unreasonable here, either. If you feel that a class or method needs to be changed in order to make it more readable, more maintainable, or to fix a serious performance issue, then by all means, do it. But if you're swapping 10 lines for 10 different lines… or doing metaprogramming to try and make things neater or cooler, then I want to stop you right there and ask: what's the benefit of this change?

Is it really better? More readable? More maintainable? Or is it that you thought of another way to write the same thing, and wanted to write it just because you could? I'm running into quite a bit of that today, and because I'm the new guy, and don't want to rock the boat, I sat back and learned a lot about how to write 10 lines of ruby code about six different ways. I learned a lot, but the code didn't benefit one little bit.

The developers discussing this (there were three of us on it at that point) were debating the relative merits of the indentation style, and which of the several ways of generating the methods dynamically in Ruby was warranted. All might be considered valid issues, but none of this makes the code better.
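For anyone who hasn't watched one of these debates, here's a small sketch of two equivalent ways to give a class the same reader methods: written out by hand, and generated with `define_method`. The class and attribute names are hypothetical, not the code that was being discussed; the point is only that both versions behave identically.

```ruby
# Version 1: methods written out by hand.
class ShipmentA
  def initialize(attrs)
    @attrs = attrs
  end

  def size
    @attrs[:size]
  end

  def weight
    @attrs[:weight]
  end

  def height
    @attrs[:height]
  end
end

# Version 2: the same methods generated dynamically.
class ShipmentB
  def initialize(attrs)
    @attrs = attrs
  end

  %i[size weight height].each do |name|
    define_method(name) { @attrs[name] }
  end
end

attrs = { size: 'M', weight: 20, height: 8 }
a = ShipmentA.new(attrs)
b = ShipmentB.new(attrs)
p %i[size weight height].map { |m| a.public_send(m) == b.public_send(m) }   # prints [true, true, true]
```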

None.

So after we got all this checked in and done, we started talking a little, and I just brought up the point that in the past, this would never have been allowed in the places I've been. There's just too much to do to allow three high-powered developers to be tied up for more than an hour on 10 lines of code that were fine before anyone got involved. It would have been seen as a massive waste of time.

But in truth, there was some benefit. The methods are a bit smaller. I learned a lot. And the calling arguments are now more general. This is all nice, but it certainly doesn't compare favorably to the cost. But maybe that's the difference outside of finance. Maybe the pace here is more sustainable. Maybe it's OK to do this and "relax", and allow everyone to recharge from time to time.

It's certainly different.