Archive for the ‘Coding’ Category

Slugging Through a Lot of Little Updates

Tuesday, September 4th, 2012

Today has been a big day of a lot of little things. I've got five post-it notes on my desk - each filled with little things that need to be updated in the code. There are new rules for how to handle new merchants, more rules about aggregations, different rules about when to ignore merchants… all needed to be in the code as soon as possible, and because no one thing was that horrible to do, it was possible to get all the changes in today.

Yet, while trying to get things done, taking more requests is a little frustrating. No… it's quite frustrating. But I was able to get through it all without getting really upset, which is a nice win for me.

In all, it has been a really nice day - there are a few more things I need to do, but it was a good day.

Google Chrome dev 23.0.1251.2 is Out

Tuesday, September 4th, 2012

This morning I noticed that Google Chrome dev 23.0.1251.2 was out with a few nice fixes for crashing bugs. I haven't noticed them, but I'm not hammering on it with JavaScript like a lot of the other folks are. Still, it's nice to see the decent release notes, and the improvements in Chrome continue.

Creating Really Dense Code – The Ruby Way

Friday, August 31st, 2012

Ruby

This afternoon I've written some of the most compact, potentially confusing code I've written in many, many years -- and it's perfect code by a Ruby developer's standards. This is something that may be specific to The Shop, but given that they are such a big Ruby shop, I'm guessing that this is the Ruby Way, and like a lot of the functional code I've seen - completely undocumented. Now that's not to say my code is undocumented, in fact, it's got almost a 1:1 ratio of comments to code because of its compactness, but I've come to realize there's a bit of a blind spot in a solid group of young Ruby coders that looks a lot like what I call Homework Problems.

In any case, the code I wrote today was specified by the quantitative analyst in Palo Alto as this:

Group the merchants by the services they offer so that any one merchant in the group shares at least one service with at least one other merchant.

Logically, this means that we can have a series of seemingly unrelated services so long as the group has these pair-wise matchings with at least one service.

If we look at the group of Macy's, the Gap, and a Movie Theatre:

[Figure: Really Odd Groupings]

You'd think there's no way the movie theatre fits in the same "group" as the Gap, but because Macy's sells jeans, and so does the Gap, and because a movie theatre sells candy, and so does Macy's, then the Gap and a movie theatre "belong together" in a group.

I'm not making up these rules, I'm just trying to code them up.

Once we get these groups of merchants, we'll then process them and get some data from them. That's not the interesting part. The interesting part is the grouping, and how to get it.

My first idea was to write a few little methods that I knew I was going to need: one to get the services from a merchant into an array, and another to see if there is any overlap (set intersection) between two merchants:

  # this method returns a clean, unique set of services for the provided OTC.
  def self.get_services(otc)
    (otc['taxonomy'] || []).map { |i| i['service'] }.compact.uniq
  end
 
  # this method returns true if ANY service is shared between the two OTCs. ANY.
  def self.services_overlap?(otc_a, otc_b)
    !(get_services(otc_a) & get_services(otc_b)).empty?
  end

At this point, I knew that these were very "ruby-esque" methods - one line each, so it's got to be "minimal", right? At the same time, I was able to then start to deal with the idea of just finding the right pairs to feed to the second method, and then collecting them into the right groups.
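
As a quick sanity check on those two helpers, here is a minimal sketch run against made-up data. The merchant hashes are hypothetical, shaped only by what the methods above read (a 'taxonomy' array of hashes with a 'service' key), and the helpers are restated as plain top-level methods so the snippet stands alone.

```ruby
# Same bodies as the helpers above, as top-level methods.
def get_services(otc)
  (otc['taxonomy'] || []).map { |i| i['service'] }.compact.uniq
end

def services_overlap?(otc_a, otc_b)
  !(get_services(otc_a) & get_services(otc_b)).empty?
end

# Hypothetical sample merchants; only the keys the helpers read are filled in.
gap    = { 'name' => 'Gap',    'taxonomy' => [{ 'service' => 'jeans' }, { 'service' => 'jeans' }] }
macys  = { 'name' => "Macy's", 'taxonomy' => [{ 'service' => 'jeans' }, { 'service' => 'candy' }, {}] }
movies = { 'name' => 'Movies', 'taxonomy' => [{ 'service' => 'candy' }] }
blank  = { 'name' => 'No taxonomy at all' }

gap_services   = get_services(gap)    # duplicate 'jeans' entries collapse to one
macys_services = get_services(macys)  # the entry with no 'service' key is dropped
blank_services = get_services(blank)  # a missing 'taxonomy' yields an empty array

gap_and_macys  = services_overlap?(gap, macys)   # share 'jeans'
gap_and_movies = services_overlap?(gap, movies)  # nothing in common
blank_and_gap  = services_overlap?(blank, gap)   # no services, so no overlap
```

The `|| []` and `compact` are what keep a merchant with a missing or partial taxonomy from raising; it simply ends up with no services and never overlaps with anyone.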

But therein was a real problem. If I just looked at the merchants serially, then the order matters. Imagine the order: Gap, Movies, Macy's. In this case, the Movies would not match the Gap, so there'd be two groups, and then Macy's would match the Gap, and strand the Movies. Bad. So I had to have multiple passes, or I had to think up some other way of looking for the sets.
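
The ordering problem is easy to reproduce. The sketch below is a deliberately naive single pass (`naive_groups` is a hypothetical name, not something from the real code base) over made-up merchants, using the same helper bodies as above: each merchant goes into the first group containing someone it overlaps with, or starts a new group.

```ruby
def get_services(otc)
  (otc['taxonomy'] || []).map { |i| i['service'] }.compact.uniq
end

def services_overlap?(otc_a, otc_b)
  !(get_services(otc_a) & get_services(otc_b)).empty?
end

# Naive single pass, shown only to illustrate the order dependence.
def naive_groups(otcs)
  groups = []
  otcs.each do |otc|
    home = groups.find { |g| g.any? { |member| services_overlap?(member, otc) } }
    if home
      home << otc
    else
      groups << [otc]
    end
  end
  groups
end

# Hypothetical sample merchants.
gap    = { 'name' => 'Gap',    'taxonomy' => [{ 'service' => 'jeans' }] }
movies = { 'name' => 'Movies', 'taxonomy' => [{ 'service' => 'candy' }] }
macys  = { 'name' => "Macy's", 'taxonomy' => [{ 'service' => 'jeans' }, { 'service' => 'candy' }] }

names = lambda { |groups| groups.map { |g| g.map { |o| o['name'] } } }

bad_order  = names.call(naive_groups([gap, movies, macys]))
# => [["Gap", "Macy's"], ["Movies"]]   the Movies are stranded
good_order = names.call(naive_groups([macys, gap, movies]))
# => [["Macy's", "Gap", "Movies"]]     same merchants, one group
```

Same three merchants, two different answers, and the only thing that changed was the order of the input.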

What happened was that I was scanning the Ruby Array docs and noticed the product() method. Interesting, and after about another 10 mins of trying to think up a solution, the idea came to me: use product() to make pairs of merchants to check, and then add things in and remove duplicates.

Sweet idea!

  def self.group_by_service(otcs)
    # start with the array of groups that we'll be returning to the caller.
    groups = []
    # look at all non-identical pairings in the original list and for each
    # pairing, see if there are ANY common services. If there are, try to find
    # a group to place the PAIR in, if we can't, then make a new group of this
    # pair.
    otcs.product(otcs).each do |pair|
      next if pair[0] == pair[1]
      if services_overlap?(pair[0], pair[1])
        groups << pair unless (groups.map do |g|
          (g << pair).flatten!.uniq!.size unless (g & pair).empty?
        end.compact.reduce(:+) || 0) > 0
      end
    end
    # verify that each OTC is in some kind of group - even alone
    otcs.each do |d|
      groups << [d] unless
          (groups.map { |g| g.include?(d) ? 1 : 0 }.reduce(:+) || 0) > 0
    end
    # return the array of groups to the caller
    groups
  end

It's like an expanded APL to me. Compact code. Chained method calls. More work by the CPU, but less code written by the person. It's not something I'd traditionally write because it's excessively wasteful in the work it's doing, but I'm guessing that it'll be seen as evidence of me "getting it" by the other Ruby guys in the group.
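
For comparison, here is a minimal sketch of a less dense alternative. It is not the code that went into the project; `group_by_service_merging` is a hypothetical name and the sample merchants are made up. It makes one pass over the merchants and keeps a running set of services per group, so a newcomer that bridges several existing groups merges them into one. That bridging case appears to be a gap in the pairwise version: when a pair links two groups that were created separately, the pair gets appended to both groups and the two are never combined.

```ruby
require 'set'

# Same body as get_services above, as a top-level method.
def get_services(otc)
  (otc['taxonomy'] || []).map { |i| i['service'] }.compact.uniq
end

# Hypothetical alternative: merge every existing group the newcomer touches.
def group_by_service_merging(otcs)
  groups = []  # each entry: { :members => [otc, ...], :services => Set }
  otcs.each do |otc|
    services = Set.new(get_services(otc))
    # split into groups sharing at least one service with this merchant, and the rest
    touching, rest = groups.partition { |g| !(g[:services] & services).empty? }
    merged = {
      :members  => touching.map { |g| g[:members] }.flatten(1) + [otc],
      :services => touching.inject(services) { |all, g| all | g[:services] }
    }
    groups = rest << merged
  end
  groups.map { |g| g[:members] }
end

# Hypothetical sample merchants.
gap    = { 'name' => 'Gap',    'taxonomy' => [{ 'service' => 'jeans' }] }
movies = { 'name' => 'Movies', 'taxonomy' => [{ 'service' => 'candy' }] }
macys  = { 'name' => "Macy's", 'taxonomy' => [{ 'service' => 'jeans' }, { 'service' => 'candy' }] }

# Gap and Movies start out apart; Macy's arrives last and joins them.
stores = group_by_service_merging([gap, movies, macys])

# Two groups form first (a with c, b alone), then d bridges them.
a = { 'name' => 'a', 'taxonomy' => [{ 'service' => 'x' }] }
b = { 'name' => 'b', 'taxonomy' => [{ 'service' => 'y' }] }
c = { 'name' => 'c', 'taxonomy' => [{ 'service' => 'x' }, { 'service' => 'z' }] }
d = { 'name' => 'd', 'taxonomy' => [{ 'service' => 'y' }, { 'service' => 'z' }] }
bridged = group_by_service_merging([a, b, c, d])
```

Both calls come back with a single group. The trade-off is more lines and an explicit data structure in exchange for one pass over the merchants instead of every ordered pair.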

I get it, and in certain instances, I don't think it's wrong. But in a production app that's going to hit speed limitations, having code like this is a killer to performance. There's too much that doesn't need to be done. Yeah, it looks nice, but it's going to put a tax on the machines that shouldn't have to be paid.

I get it… I'm just not sure I think it's a great thing.

Having the Best Tools Really is Nice

Friday, August 31st, 2012

Apple Computers

I'm sitting here this morning smiling quietly to myself after working a while on some problems. I realize I'm smiling because the keyboard is just such a wonderful piece of work - the Apple (small) wireless keyboard and wireless mouse, to be precise. These are simply the best input devices that I've ever used. The keyboard is low to the desktop, responsive, small, and without a cord. The trackpad is large, the gestures wonderfully thought-out. It's amazing what I can do so effortlessly on this machine.

Then there's the machine. I'll admit I'm not a fan of the glossy Apple displays. I never have been. The HP 30" is really the best I've used in a long while. I think Apple could make a nice 30" display, but they don't want to, and there's no making them do something they don't want to do.

But back to the laptops… they are, without a doubt, the best in the industry - and have been for a long while. Really, the only solid competition was the IBM ThinkPad, and IBM sold that to Lenovo, and the quality has become so very ordinary now. Nothing to write home about.

But great hardware gets out of your way. You stop thinking about how to do something, and focus on the doing. In some cases, it even enables you to see things that you might have not seen - large displays with more code in them really are very beneficial.

So I have to tip my hat to the folks at The Shop that see this as well, and outfit the developers this way. It's really quite amazing.

Thread-Local Variables in Ruby

Thursday, August 30th, 2012

Ruby

Now this is probably not amazing news to long-time Ruby developers, but the simplicity and ease with which it's possible to make thread-local variables in Ruby is simply shocking. Ruby developers just don't know how good they have it. This morning I was looking at a threading problem with the CouchRest CouchDB client, and realized that it's not thread-safe. This isn't really shocking as thread-safety is something I've come to realize is not standard in Ruby libraries.

Still… I was determined to make it work.

What seemed logical was to have multiple database connections - one per thread, and then just have thread-local database connections. As the threads are born, they need a connection, create it, and use it. When they die, the connections are cleaned up automatically. Sweet. Simple.

I know that dealing with thread-local storage in pthreads is not horrible, but it's certainly not "easy". So I dug into the Ruby support for thread-local storage, and it's trivial:

  Thread.current[:foo] = {}

This creates the tagged thread-local variable foo. How simple! This is something that I never expected to see. Never! So why are these Ruby guys having so much trouble with thread-safety? I have no idea.
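
To make the one-connection-per-thread idea concrete, here is a minimal sketch. `FakeConnection` and the `connection` helper are hypothetical stand-ins, not the CouchRest API; a real version would build the actual client inside the `||=`. One caveat worth knowing: in Ruby 1.9 and later, `Thread.current[:key]` is strictly fiber-local storage, which behaves like thread-local storage as long as the thread is not also juggling extra fibers.

```ruby
# Hypothetical stand-in for a real database client; not the CouchRest API.
FakeConnection = Struct.new(:owner_id)

# Lazily create one connection per thread and reuse it on later calls.
def connection
  Thread.current[:db_connection] ||= FakeConnection.new(Thread.current.object_id)
end

threads = Array.new(4) do
  Thread.new do
    first  = connection
    second = connection          # second call reuses the thread's own object
    [first, first.equal?(second)]
  end
end

results     = threads.map(&:value)   # wait for each thread and collect its result
connections = results.map(&:first)
```

Each thread builds its connection on first use and sees the same object on every later call, while no two threads ever share one. Nothing is stored on the main thread unless it calls `connection` itself.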

With the tools I've seen in Ruby, there's really no excuse for why there aren't more thread-safe libraries. All the tools are there - they're just unused. Lazy coders.

Really Nasty Data Archeological Dig

Wednesday, August 29th, 2012

I know it needed to be done, and I know someone had to do it, but that doesn't make it any more fun than it already isn't. Digging in the data to find out why we aren't matching up merchants and demand in Philadelphia is no fun at all. It's a lot of data with very little pattern to it, and a whole lot of problems. But that's what I was doing for several hours today. The pain and suffering was really compounded by the complete lack of real thought put into this as we headed into the meeting.

Overall, I was very angry at myself for not pushing back. I should have. I know that now, but it's that blasted work ethic thing that causes me to say "Yes" when I should be saying "Hold on a sec…"

The problem is that we're getting demand and merchants to fulfill that demand, and the assumed "match" here should be very high. Why? Just because. They really have never looked at this and have no idea what it should be, but "instinctually" many think that it should be "very high" - like 90%. So when the first runs came out with it being more like 50%, they wanted to know why. I totally agree.

But where we diverge is in the How?

One program manager suggested I send a 5000+ line Excel file where the hierarchical JSON data was somehow magically "flattened" to make it easy for anyone to look at the data and determine why the merchants weren't matching. Thankfully, I had the strength of character to say "No" to that.

But that wasn't until after I heard another request to log all 5000+ merchants against all 1500+ demands - yielding more than 8 million log lines. Nope. That's just plain silly.

I wanted to get to the bottom of this to be sure, but I wanted to do it in a way that makes at least a little sense. And looking at 8 million log lines isn't it. So I started building a few CouchDB temporary views and started looking for what wasn't being matched and why.

Turns out there were two major issues: the demand wasn't supplying sufficient 'service' coverage to pin enough merchants, and the zip codes on the merchant data were really pretty horrible. Call me 'Indy' on this - it only took me about 90 mins to find these reasons and document them up for the group. Nice. Clean. Efficient.

Nothing like looking at 8 million log lines.

Interesting JVM Helper

Tuesday, August 28th, 2012

My manager at The Shop forwarded something he read today about an interesting little package called drip. It's essentially something that will pre-launch a JVM instance for a set of command options so that repeated calls of the same command will not have the overhead of starting a JVM. This would be ideal for JRuby - if it was supported. Unfortunately, this is a bash script and you need to be able to hook it into the code you're using - or at least replace the java command with drip.

Sadly, JRuby hides the java command, so we can't easily replace it. The JRuby team will have to make it possible with some kind of environment variable, etc. Given that Java on most platforms I've used starts pretty nicely, I'm guessing they are not going to spend a lot of time with this. Slow JVM startup is really a bad problem on Mac OS X, but maybe that will be changing in the future. Who knows?

But it's certainly something to hang onto and maybe it'll be useful in the future.

Google Chrome dev 23.0.1246.0 is Out

Tuesday, August 28th, 2012

This morning I noticed that Google Chrome dev 23.0.1246.0 was out, and the release notes are back to being a little more descriptive. This one has a new V8 JavaScript engine as well as a new cut of WebKit and addresses a few other bugs. Nice to see the notes are back to what they used to be, and that we're still seeing progress. Nice.

Pretty Impressed with jQuery

Monday, August 27th, 2012

JQuery Framework

I've been building a little visualization based on ZingChart, and using jQuery as the general-purpose support library. I've come to realize jQuery is simply amazing, and deserving of the acceptance it's received. The code to do a simple AJAX request is amazingly simple:

  $.getJSON(url, function(data) {
    parse_series(data);
    redraw();
  });

where the second argument is an anonymous function that runs when the request succeeds, with the parsed JSON passed in as data. Very slick. I remember what I had to do to get this working in Chrome a few years ago… it was not pretty.

But there's so much more as well. I think I'm a convert. If I end up doing more javascript pages, I'm going to use jQuery and even if I end up using Google Visualizations, I'll use jQuery to get the data and do some of the nastier stuff.

Working with ZingChart

Monday, August 27th, 2012

Today I've started working with a new JavaScript charting library called ZingChart. It looks pretty complete, and certainly has a lot of charting styles, plus it's touting its speed with large data sets, so that's nice. But as with anything like this, the learning curve is steep because the graph has to be configured in JavaScript, and it's always a lot of work to know not only How to do something, but also What's possible to be done.

Compounding this is that the documentation and the examples have contradictory information - I mean not even close! I spent several hours on trying to just get a nice pie chart going. But in the end, all the knobs are there and we don't have to worry that there's something we simply cannot do.

As an interesting note, they include the complete jQuery library in their 'resources' directory. It's something I've read a lot about, and it was available when I was doing my last web development work a few lifetimes ago, but we didn't use it because the Google Visualization Toolkit was more than enough for what we needed, and the need just didn't arise.

But it's interesting that it's all there. Very nice looking graphs.

Odd that they don't have a table. Guess I'll have to fall back to Google Visualizations for that.