Archive for August, 2012

Creating Really Dense Code – The Ruby Way

Friday, August 31st, 2012

Ruby

This afternoon I've written some of the most compact, potentially confusing code I've written in many, many years -- and it's perfect code to a Ruby developer. This may be specific to The Shop, but given that they are such a big Ruby shop, I'm guessing that this is the Ruby Way, and like a lot of the functional code I've seen - completely undocumented. Now that's not to say my code is undocumented; in fact, it's got almost a 1:1 ratio of comments to code because of its compactness. But I've come to realize there's a bit of a blind spot in a solid group of young Ruby coders that looks a lot like what I call Homework Problems.

In any case, the code I wrote today was specified by the quantitative analyst in Palo Alto as this:

Group the merchants by the services they offer so that any one merchant in the group shares at least one service with at least one other merchant.

Logically, this means that we can have a series of seemingly unrelated services so long as the group has these pair-wise matchings with at least one service.

If we look at the group of Macy's, the Gap, and a Movie Theatre:
Really Odd Groupings

You'd think there's no way the movie theatre fits in the same "group" as the Gap, but because Macy's sells jeans, and so does the Gap, and because a movie theatre sells candy, and so does Macy's, the Gap and a Movie Theatre "belong together" in a group.

I'm not making up these rules, I'm just trying to code them up.

Once we get these groups of merchants, we'll then process them and get some data from them. That's not the interesting part. The interesting part is the grouping, and how to get it.

My first idea was to write a few little methods that I knew I was going to need: one to get the services from a merchant into an array, and another to see if there is any overlap (set intersection) between two merchants:

  # this method returns a clean, unique set of services for the provided OTC.
  def self.get_services(otc)
    (otc['taxonomy'] || []).map { |i| i['service'] }.compact.uniq
  end
 
  # this method returns true if ANY service is shared between the two OTCs. ANY.
  def self.services_overlap?(otc_a, otc_b)
    !(get_services(otc_a) & get_services(otc_b)).empty?
  end
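
To make the data shape concrete, here is a small usage sketch. The module name ServiceMatcher and the sample merchants are made up for illustration; the only thing taken from the methods above is that an OTC is a hash with an optional 'taxonomy' array whose entries carry a 'service' key.

```ruby
# Hypothetical container module; the post does not show where the methods live.
module ServiceMatcher
  # Returns a clean, unique list of services for the provided OTC.
  def self.get_services(otc)
    (otc['taxonomy'] || []).map { |i| i['service'] }.compact.uniq
  end

  # Returns true if ANY service is shared between the two OTCs.
  def self.services_overlap?(otc_a, otc_b)
    !(get_services(otc_a) & get_services(otc_b)).empty?
  end
end

# Made-up sample data in the shape the methods expect.
macys  = { 'taxonomy' => [{ 'service' => 'jeans' }, { 'service' => 'candy' }] }
gap    = { 'taxonomy' => [{ 'service' => 'jeans' }] }
movies = { 'taxonomy' => [{ 'service' => 'candy' }, { 'service' => nil }] }

ServiceMatcher.get_services(macys)             # => ["jeans", "candy"]
ServiceMatcher.get_services(movies)            # => ["candy"]  (nil is dropped)
ServiceMatcher.get_services({})                # => []         (no taxonomy at all)
ServiceMatcher.services_overlap?(macys, gap)   # => true
ServiceMatcher.services_overlap?(gap, movies)  # => false
```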

At this point, I knew that these were very "ruby-esque" methods - one line each, so it's got to be "minimal", right? At the same time, I was able to then start to deal with the idea of just finding the right pairs to feed to the second method, and then collecting them into the right groups.

But therein was a real problem. If I just looked at the merchants serially, then the order matters. Imagine the order: Gap, Movies, Macy's. In this case, the Movies would not match the Gap, so there'd be two groups, and then Macy's would match the Gap, and strand the Movies. Bad. So I had to have multiple passes, or I had to think up some other way of looking for the sets.
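
The ordering problem is easy to reproduce. The sketch below is not the code from this post; it is a deliberately naive single pass over simplified, made-up merchant records (a name plus a list of services), written only to show how the input order changes the result.

```ruby
# Simplified, made-up merchant records: just a name and a list of services.
GAP    = { name: 'Gap',    services: ['jeans'] }
MOVIES = { name: 'Movies', services: ['candy'] }
MACYS  = { name: "Macy's", services: ['jeans', 'candy'] }

# Naive single pass: drop each merchant into the FIRST existing group that
# shares a service with it, otherwise start a new group. Groups never merge.
def naive_groups(merchants)
  merchants.each_with_object([]) do |merchant, groups|
    home = groups.find do |group|
      group.any? { |other| !(merchant[:services] & other[:services]).empty? }
    end
    if home
      home << merchant
    else
      groups << [merchant]
    end
  end
end

# Helper to make the groups readable.
def names(groups)
  groups.map { |g| g.map { |m| m[:name] } }
end

names(naive_groups([GAP, MOVIES, MACYS]))  # => [["Gap", "Macy's"], ["Movies"]]
names(naive_groups([MACYS, GAP, MOVIES]))  # => [["Macy's", "Gap", "Movies"]]
```

Same three merchants, two different answers: in the first ordering the Movies end up stranded in their own group.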

What happened was that I was scanning the Ruby Array docs and noticed the product() method. Interesting, and after about another 10 mins of trying to think up a solution, the idea came to me: use product() to make pairs of merchants to check, and then add things in and remove duplicates.

Sweet idea!

  def self.group_by_service(otcs)
    # start with the array of groups that we'll be returning to the caller.
    groups = []
    # look at all non-identical pairings in the original list and for each
    # pairing, see if there are ANY common services. If there are, try to find
    # a group to place the PAIR in, if we can't, then make a new group of this
    # pair.
    otcs.product(otcs).map do |pair|
      next if pair[0] == pair[1]
      if services_overlap?(pair[0], pair[1])
        groups << pair unless (groups.map do |g|
          (g << pair).flatten!.uniq!.size unless (g & pair).empty?
        end.compact.reduce(:+) || 0) > 0
      end
    end
    # verify that each OTC is in some kind of group - even alone
    otcs.each do |d|
      groups << [d] unless
          (groups.map { |g| g.include?(d) ? 1 : 0 }.reduce(:+) || 0) > 0
    end
    # return the array of groups to the caller
    groups
  end

It's like an expanded APL to me. Compact code. Chained method calls. More work by the CPU, but less code written by the person. It's not something I'd traditionally write because it's excessively wasteful in the work it's doing, but I'm guessing that it'll be seen as evidence of me "getting it" by the other Ruby guys in the group.

I get it, and in certain instances, I don't think it's wrong. But in a production app that's going to hit speed limitations, having code like this is a killer to performance. There's too much that doesn't need to be done. Yeah, it looks nice, but it's going to put a tax on the machines that shouldn't have to be paid.
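
For comparison, here is a sketch of a less wasteful way to get the same kind of grouping. It is an alternative, not the code used at The Shop, and the method name group_by_shared_service is made up. It assumes the same OTC shape as above (a hash with an optional 'taxonomy' array of 'service' entries). Each merchant is visited once, and every existing group that shares a service with it is merged into a single group, so no n x n list of pairs is ever built. The merge step also covers the case where a later merchant links two groups that were started separately.

```ruby
require 'set'

# Hypothetical alternative: one pass, merging groups as links are discovered.
def group_by_shared_service(otcs)
  groups = [] # each entry: { services: Set of services, members: Array of OTCs }
  otcs.each do |otc|
    services = Set.new((otc['taxonomy'] || []).map { |i| i['service'] }.compact)
    # split the existing groups into those this merchant touches and the rest
    hits, rest = groups.partition { |g| g[:services].intersect?(services) }
    # fold every touched group, plus this merchant, into one group
    merged = {
      services: hits.reduce(services) { |acc, g| acc | g[:services] },
      members:  hits.flat_map { |g| g[:members] } << otc
    }
    groups = rest << merged
  end
  # return plain arrays of merchants, including singletons
  groups.map { |g| g[:members] }
end

gap    = { 'name' => 'Gap',    'taxonomy' => [{ 'service' => 'jeans' }] }
movies = { 'name' => 'Movies', 'taxonomy' => [{ 'service' => 'candy' }] }
macys  = { 'name' => "Macy's", 'taxonomy' => [{ 'service' => 'jeans' }, { 'service' => 'candy' }] }
loner  = { 'name' => 'Loner' } # no taxonomy: ends up alone

group_by_shared_service([gap, movies, macys, loner]).map { |g| g.map { |m| m['name'] } }
# => [["Gap", "Movies", "Macy's"], ["Loner"]]
```

The trade-off is a little more bookkeeping (a service set per group) in exchange for far fewer comparisons on large merchant lists.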

I get it… I'm just not sure I think it's a great thing.

Having the Best Tools Really is Nice

Friday, August 31st, 2012

Apple Computers

I'm sitting here this morning smiling quietly to myself after working a while on some problems. I realize I'm smiling because the keyboard is just such a wonderful piece of work - the Apple (small) wireless keyboard and wireless mouse, to be precise. These are simply the best input devices that I've ever used. The keyboard is low to the desktop, responsive, small, and without a cord. The trackpad is large, the gestures wonderfully thought-out. It's amazing what I can do so effortlessly on this machine.

Then there's the machine. I'll admit I'm not a fan of the glossy Apple displays. I never have been. The HP 30" is really the best I've used in a long while. I think Apple could make a nice 30" display, but they don't want to, and there's no making them do something they don't want to do.

But back to the laptops… they are, without a doubt, the best in the industry - and have been for a long while. Really, the only solid competition was the IBM ThinkPad, and IBM sold that to Lenovo, and the quality has become so very ordinary now. Nothing to write home about.

But great hardware gets out of your way. You stop thinking about how to do something, and focus on the doing. In some cases, it even enables you to see things that you might have not seen - large displays with more code in them really are very beneficial.

So I have to tip my hat to the folks at The Shop that see this as well, and outfit the developers this way. It's really quite amazing.

Living Right Must be Paying Off!

Thursday, August 30th, 2012

Well… here again today I got the hint that maybe I'm living right. Maybe. The manager that was asking for 8 million log statements came by and thanked me for looking into the problem and getting it all solved. Sure, this included a lot of work from others as well, but I was happy that he made the effort.

Really happy. This is nice. I have a nice place to work, and decent people to work with. I just need to be a little firmer about pushing back. I don't think I'm doing anyone any favors by getting upset and grumbling through something.

I get to leave with a smile on my face.

Thinking – A Lost Art

Thursday, August 30th, 2012

This might sound harsh, and I'll concede that it's certainly something that's bugged me at more than one job over the years, but it's also why I end up being highly paid and one of the more influential people at those jobs as well. It's really simple.

Thinking has become a lost art.

It's like everything else in life, you have to work at it to get good at it. You can't do it just a bit, and think you're 'good to go' for a while. Nope. Thinking is something you have to really work on, and really apply on a daily basis to keep sharp on the skill.

Why is it so important? Because no matter what you do in life, thinking will make you more efficient and better at it. Period.

You want to manage technical people? You need to think. In something that happened to me recently, a manager asked to look at all the outcomes of a matching, but didn't even stop to do the simple math: what he was asking for was in excess of 8 million matches. Even at a few minutes per match, that's way, way too long to be a useful strategy. Had I followed his advice, we'd have a giant Excel spreadsheet that no one could open, and we'd still be going through the data.

Think, people! Think!

Don't just act. Think!

I recently read a story about two very famous original Unix architects that were sitting at a workstation (because they were scarce) and when there was a bug, one started hitting the debugger, and the other stopped doing anything -- he started thinking. His approach was that if you understood the system, then debugging was in your head. Thinking was more powerful than doing.

This isn't an invitation to analysis paralysis - that's taking something to an extreme. But you can't shoot first and ask questions later, either. Certainly when you're doing a lot of technical managing.

So let's all practice this a little more, OK?

Thread-Local Variables in Ruby

Thursday, August 30th, 2012

Ruby

Now this is probably not amazing news to long-time Ruby developers, but the simplicity and ease with which it's possible to make thread-local variables in Ruby is simply shocking. Ruby developers just don't know how good they have it. This morning I was looking at a threading problem with the CouchRest CouchDB client, and realized that it's not thread-safe. This isn't really shocking, as thread-safety is something I've come to realize is not standard in Ruby libraries.

Still… I was determined to make it work.

What seemed logical was to have multiple database connections - one per thread, and then just have thread-local database connections. As the threads are born, they need a connection, create it, and use it. When they die, the connections are cleaned up automatically. Sweet. Simple.

I know that dealing with thread-local storage in pthreads is not horrible, but it's certainly not "easy". So I dug into the Ruby support for thread-local storage, and it's trivial:

  Thread.current[:foo] = {}

This creates the tagged thread-local variable foo. How simple! This is something that I never expected to see. Never! So why are these Ruby guys having so much trouble with thread-safety? I have no idea.
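
A minimal sketch of the one-connection-per-thread idea follows. FakeConnection and db_connection are hypothetical names used so the example runs without CouchRest installed; a real version would build an actual database client where FakeConnection.new appears. One caveat worth knowing: Thread.current[:key] is technically fiber-local, and Ruby 2.0 added thread_variable_get and thread_variable_set for storage that is shared by all fibers in a thread. For code that does not use fibers the two behave the same.

```ruby
# Stand-in for a real database client (hypothetical; not the CouchRest API).
class FakeConnection
  attr_reader :owner

  def initialize
    @owner = Thread.current # remember which thread created this connection
  end
end

# Lazily create one connection per thread and cache it in thread-local storage.
def db_connection
  Thread.current[:db_connection] ||= FakeConnection.new
end

# Each thread asks for its connection twice and gets the same object back,
# while different threads get different objects.
connections = 3.times.map do
  Thread.new do
    first = db_connection
    raise 'connection was not cached' unless first.equal?(db_connection)
    first
  end
end.map(&:value)

connections.uniq.size                                       # => 3
connections.map(&:owner).include?(Thread.current)           # => false
```

Nothing needs to be torn down by hand here: once a thread finishes and is no longer referenced, its thread-local storage becomes eligible for garbage collection along with it.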

With the tools I've seen in Ruby, there's really no excuse for why there aren't more thread-safe libraries. All the tools are there - they're just unused. Lazy coders.

Gimp on OS X Goes Native GUI!

Thursday, August 30th, 2012

This morning I read that the latest Gimp on OS X had a native GUI build - very exciting! They didn't abandon the X11 GUI, they simply made a version that looks the same, but has the native Cocoa GUI so that you don't have to be running X11 in order to use it.

This is really neat! I'm not a huge Gimp user any more, as I've found some really great replacements, but it's got a lot of power, and some of the provided scripts are simply amazing. So to see them port the GUI to Cocoa is a really nice feat. I'm very impressed with these guys.

Well done!

Really Nasty Data Archeological Dig

Wednesday, August 29th, 2012

I know it needed to be done, and I know someone had to do it, but that doesn't make it any more fun than it already isn't. Digging in the data to find out why we aren't matching up merchants and demand in Philadelphia is no fun at all. It's a lot of data with very little pattern to it, and a whole lot of problems. But that's what I was doing for several hours today. The pain and suffering was really compounded by the complete lack of real thought put into this as we headed into the meeting.

Overall, I was very angry at myself for not pushing back. I should have. I know that now, but it's that blasted work ethic thing that causes me to say "Yes" when I should be saying "Hold on a sec…"

The problem is that we're getting demand and merchants to fulfill that demand, and the assumed "match" here should be very high. Why? Just because. They really have never looked at this and have no idea what it should be, but "instinctually" many think that it should be "very high" - like 90%. So when the first runs came out with it being more like 50%, they wanted to know why. I totally agree.

But where we diverge is in the How?

One program manager suggested I send a 5000+ line Excel file where the hierarchical JSON data was somehow magically "flattened" to make it easy for anyone to look at the data and determine why the merchants weren't matching. Thankfully, I had the strength of character to say "No" to that.

But that wasn't until after I heard another request to log all 5000+ merchants against all 1500+ demands - yielding more than 8 million log lines. Nope. That's just plain silly.
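
The back-of-the-envelope math is quick to check. The sketch below uses the round lower bounds of 5,000 merchants and 1,500 demands (the real counts were higher, hence "more than 8 million") and an assumed reading pace of one second per log line, which is purely illustrative.

```ruby
# Number of merchant/demand pairings, one log line each.
def log_line_count(merchants, demands)
  merchants * demands
end

# Days of non-stop reading at an ASSUMED pace (seconds per line).
def reading_days(lines, seconds_per_line)
  lines * seconds_per_line / 86_400.0
end

lines = log_line_count(5_000, 1_500)  # => 7500000 (lower bound)
reading_days(lines, 1).round(1)       # => 86.8
```

Even at that optimistic pace, it is close to three months of reading around the clock.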

I wanted to get to the bottom of this to be sure, but I wanted to do it in a way that makes at least a little sense. And looking at 8 million log lines isn't it. So I started building a few CouchDB temporary views and started looking for what wasn't being matched and why.

Turns out there were two major issues: the demand wasn't supplying sufficient 'service' coverage to pin enough merchants, and the zip codes on the merchant data were really pretty horrible. Call me 'Indy' on this - it only took me about 90 mins to find these reasons and document them up for the group. Nice. Clean. Efficient.

Nothing like looking at 8 million log lines.

Interesting JVM Helper

Tuesday, August 28th, 2012

My manager at The Shop forwarded something he read today about an interesting little package called drip. It's essentially something that will pre-launch a JVM instance for a set of command options so that repeated calls of the same command will not have the overhead of starting a JVM. This would be ideal for JRuby - if it was supported. Unfortunately, this is a bash script and you need to be able to hook it into the code you're using - or at least replace the java command with drip.

Sadly, JRuby hides the java command, so we can't easily replace it. The JRuby team will have to make it possible with some kind of environment variable, etc. Given that Java on most platforms I've used starts pretty nicely, I'm guessing they are not going to spend a lot of time with this. It's really a bad problem on Mac OS X, but maybe that will be changing in the future. Who knows?

But it's certainly something to hang onto and maybe it'll be useful in the future.

Google Chrome dev 23.0.1246.0 is Out

Tuesday, August 28th, 2012

This morning I noticed that Google Chrome dev 23.0.1246.0 was out, and the release notes are back to being a little more descriptive. This one gets a new V8 JavaScript engine as well as a new cut of WebKit, and addresses a few other bugs. Nice to see the notes are back to what they used to be, and that we're still seeing progress. Nice.

Pretty Impressed with jQuery

Monday, August 27th, 2012

JQuery Framework

I've been building a little visualization based on ZingChart, and using jQuery as the general-purpose support library. I've come to realize jQuery is simply amazing, and deserving of the acceptance it's received. The code to do a simple AJAX request is amazingly simple:

  $.getJSON(url, function(data) {
    parse_series(data);
    redraw();
  });

where the second argument is an anonymous callback function that runs when the request succeeds, with the parsed JSON passed in as data. Very slick. I remember what I had to do to get this working in Chrome a few years ago… it was not pretty.

But there's so much more as well. I think I'm a convert. If I end up doing more JavaScript pages, I'm going to use jQuery, and even if I end up using Google Visualizations, I'll use jQuery to get the data and do some of the nastier stuff.