Archive for the ‘Coding’ Category

Creating Software Plumbers

Wednesday, September 5th, 2012

I just read this tweet this morning:

Twitter / davehoover: Young people: consider ...

which leads to this article advocating that young people look into entering an apprenticeship program instead of continuing school. It says, in part:

Universities are the typical place that established businesses expect to find these high-potential beginners. While many software developers finish college with a good education, they’re often burned out, deep in debt, and understandably eager to cash in on their hard work. Apprentices, on the other hand, inject enthusiasm, hard work, and a thirst for knowledge into your teams. They will consistently launch from your apprenticeship program with context, momentum, and loyalty to your organization.

I can understand the point of the article, and you should read it to see that it's not saying people shouldn't go on to higher education. It's saying that you, as a business owner, can capitalize on the cost of higher education by taking those people that might otherwise go to college and getting them into the workforce.

But is that what we want to have happen, as an industry? I don't think so. I think it's robbing the future to staff the present, and that's a mistake. A big one.

I'm biased. I've got the higher education and the advanced degrees, and I think they are the right thing to do. But even if you discount my position, and do what the author suggests, aren't we just creating a bunch of Software Plumbers? They'll know what they see, and will be able to work with it, but their understanding of how to solve new and unusual problems will be very limited. Oh sure, you'll have a few percent that naturally think outside the box, but their exposure to new things and new ideas will be incredibly limited.

This is the exact purpose of those liberal arts classes for engineers - to broaden a student's horizons. If we just allow people to learn what we want them to learn, aren't we really just forcing ourselves to re-train them when we want to change technologies? Of course we are.

There are times when an apprenticeship program makes sense - for those that can't make it into college, say - but I think it'll be overused, and it will draw the real future of the profession toward one where only a few can really think creatively. And that would be very bad.

Logging all Incomplete Processing

Wednesday, September 5th, 2012

This morning I decided that it'd be nice to have a complete list of all the merchants that didn't successfully complete their processing. A recent change to the code means we now process every single merchant, so we can look at each one and make sure it got through the critical processing phases. If a merchant didn't "pick up" the right data in a phase, then we can assume that it didn't get to that point. The point of this is that we can then be sure that every merchant completed processing.

I log these merchants and write them to CouchDB so we can keep a complete record of all the issues, and I updated the summary script to list the number of incompletely processed merchants so we can watch them over time.
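A minimal sketch of that kind of check, in plain Ruby. The phase names and the shape of the merchant hashes are assumptions made up for illustration, not the real pipeline's fields, and the write to CouchDB is left out entirely:

```ruby
# Hypothetical: each critical phase is assumed to leave a value on the
# merchant under its own key. These key names are invented for the example.
CRITICAL_PHASES = %w[demand sales_value].freeze

# Returns a hash of merchant id => list of phases whose data was never picked up.
def incomplete_merchants(merchants)
  merchants.each_with_object({}) do |merchant, report|
    missing = CRITICAL_PHASES.select { |phase| merchant[phase].nil? }
    report[merchant['id']] = missing unless missing.empty?
  end
end

# Invented sample data, only here to exercise the check.
merchants = [
  { 'id' => 'm1', 'demand' => 12, 'sales_value' => 340.0 },
  { 'id' => 'm2', 'demand' => 7 },   # never reached the second phase
  { 'id' => 'm3' }                   # never picked up anything
]

report = incomplete_merchants(merchants)
report.each { |id, missing| puts "#{id} incomplete, missing: #{missing.join(', ')}" }
puts "#{report.size} incompletely processed merchants"
```

The count on the last line is the sort of number a summary script could track from run to run.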

Nice. This is starting to really get close to verification that all was done as it was supposed to have been done.

Google Chrome dev 23.0.1255.0 is Out

Wednesday, September 5th, 2012

It didn't take long - just a few days, and now Google Chrome dev 23.0.1255.0 is out with a nice array of fixes for crashing bugs - including a video problem on retina MacBook Pros. There are a few things about the security of apps in the browser, which I don't use, but I'm sure there are quite a few Angry Birds fans out there.

Slugging Through a Lot of Little Updates

Tuesday, September 4th, 2012

Today has been a big day of a lot of little things. I've got five post-it notes on my desk - each filled with little things that need to be updated in the code. There are new rules for how to handle new merchants, more rules about aggregations, different rules about when to ignore merchants… all needed to be in the code as soon as possible, and because no one thing was that horrible to do, it was possible to get all the changes in today.

Yet, while trying to get things done, taking more requests is a little frustrating. No… it's quite frustrating. But I was able to get through it all without getting really upset, which is a nice win for me.

In all, it has been a really nice day - there are a few more things I need to do, but it was a good day.

Google Chrome dev 23.0.1251.2 is Out

Tuesday, September 4th, 2012

This morning I noticed that Google Chrome dev 23.0.1251.2 was out with a few nice fixes for crashing bugs. I haven't noticed them, but I'm not hammering on it with JavaScript like a lot of the other folks are. Still, it's nice to see the decent release notes, and the improvement in Chrome continue.

Creating Really Dense Code – The Ruby Way

Friday, August 31st, 2012

Ruby

This afternoon I've written some of the most compact, potentially confusing code I've written in many, many years - and it's perfect code by a Ruby developer's standards. This is something that may be specific to The Shop, but given that they are such a big Ruby shop, I'm guessing that this is the Ruby Way, and like a lot of the functional code I've seen - completely undocumented. Now that's not to say my code is undocumented. In fact, it's got almost a 1:1 ratio of comments to code because of its compactness, but I've come to realize there's a bit of a blind spot in a solid group of young Ruby coders that looks a lot like what I call Homework Problems.

In any case, the code I wrote today was specified by the quantitative analyst in Palo Alto as this:

Group the merchants by the services they offer so that any one merchant in the group shares at least one service with at least one other merchant.

Logically, this means that we can have a series of seemingly unrelated services so long as the group has these pair-wise matchings with at least one service.

If we look at the group of Macy's, the Gap, and a Movie Theatre:
Really Odd Groupings

You'd think there's no way the movie theatre fits in the same "group" as the Gap, but because Macy's sells jeans, and so does the Gap, and because a movie theatre sells candy, and so does Macy's, the Gap and a movie theatre "belong together" in a group.

I'm not making up these rules, I'm just trying to code them up.

Once we get these groups of merchants, we'll then process them and get some data from them. That's not the interesting part. The interesting part is the grouping, and how to get it.

My first idea was to write a few little methods that I knew I was going to need: one to get the services from a merchant into an array, and another to see if there is any overlap (set intersection) between two merchants:

  # this method returns a clean, unique set of services for the provided OTC.
  def self.get_services(otc)
    (otc['taxonomy'] || []).map { |i| i['service'] }.compact.uniq
  end
 
  # this method returns true if ANY service is shared between the two OTCs. ANY.
  def self.services_overlap?(otc_a, otc_b)
    !(get_services(otc_a) & get_services(otc_b)).empty?
  end

At this point, I knew that these were very "ruby-esque" methods - one line each, so it's got to be "minimal", right? At the same time, I was able to then start to deal with the idea of just finding the right pairs to feed to the second method, and then collecting them into the right groups.
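To see the two helpers in action, here is a small sketch using the Macy's, Gap, and movie theatre example. The `Grouping` module name and the sample hashes are assumptions for illustration; only the `taxonomy`/`service` shape comes from the methods above.

```ruby
# Hypothetical container module; the post does not say where these methods live.
module Grouping
  # returns a clean, unique list of services for the provided OTC
  def self.get_services(otc)
    (otc['taxonomy'] || []).map { |i| i['service'] }.compact.uniq
  end

  # returns true if ANY service is shared between the two OTCs
  def self.services_overlap?(otc_a, otc_b)
    !(get_services(otc_a) & get_services(otc_b)).empty?
  end
end

# Invented sample merchants matching the example in the text.
gap    = { 'taxonomy' => [{ 'service' => 'jeans' }] }
macys  = { 'taxonomy' => [{ 'service' => 'jeans' }, { 'service' => 'candy' }] }
movies = { 'taxonomy' => [{ 'service' => 'candy' }] }

puts Grouping.services_overlap?(gap, macys)     # true: both sell jeans
puts Grouping.services_overlap?(macys, movies)  # true: both sell candy
puts Grouping.services_overlap?(gap, movies)    # false: nothing shared directly
puts Grouping.get_services({}).inspect          # []: a missing taxonomy is tolerated
```

The Gap and the movie theatre only end up together through Macy's, which is exactly what makes the grouping step the interesting part.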

But therein was a real problem. If I just looked at the merchants serially, then the order matters. Imagine the order: Gap, Movies, Macy's. In this case, the Movies would not match the Gap, so there'd be two groups, and then Macy's would match the Gap, and strand the Movies. Bad. So I had to have multiple passes, or I had to think up some other way of looking for the sets.

What happened was that I was scanning the Ruby Array docs and noticed the product() method. Interesting. After about another 10 minutes of trying to think up a solution, the idea came to me: use product() to make pairs of merchants to check, and then add things in and remove duplicates.

Sweet idea!
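For reference, a quick sketch of what Array#product returns when an array is paired with itself. Every ordered pair shows up, including each element paired with itself, which is why the grouping code has to skip identical pairs. The merchant names are just placeholders.

```ruby
merchants = %w[Gap Movies Macys]

# Every ordered pair, including each merchant paired with itself.
pairs = merchants.product(merchants)
puts pairs.size  # 9 ordered pairs for 3 merchants

# Dropping the self-pairs leaves each pairing seen from both sides.
candidates = pairs.reject { |a, b| a == b }
candidates.each { |a, b| puts "#{a} <-> #{b}" }
```

The cost is that the number of pairs grows with the square of the number of merchants.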

  def self.group_by_service(otcs)
    # start with the array of groups that we'll be returning to the caller.
    groups = []
    # look at all non-identical pairings in the original list and for each
    # pairing, see if there are ANY common services. If there are, try to find
    # a group to place the PAIR in, if we can't, then make a new group of this
    # pair.
    otcs.product(otcs).map do |pair|
      next if pair[0] == pair[1]
      if services_overlap?(pair[0], pair[1])
        groups << pair unless (groups.map do |g|
          (g << pair).flatten!.uniq!.size unless (g & pair).empty?
        end.compact.reduce(:+) || 0) > 0
      end
    end
    # verify that each OTC is in some kind of group - even alone
    otcs.each do |d|
      groups << [d] unless
          (groups.map { |g| g.include?(d) ? 1 : 0 }.reduce(:+) || 0) > 0
    end
    # return the array of groups to the caller
    groups
  end

It's like an expanded APL to me. Compact code. Chained method calls. More work by the CPU, but less code written by the person. It's not something I'd traditionally write because it's excessively wasteful in the work it's doing, but I'm guessing that it'll be seen as evidence of me "getting it" by the other Ruby guys in the group.

I get it, and in certain instances, I don't think it's wrong. But in a production app that's going to hit speed limitations, having code like this is a killer for performance. There's too much that doesn't need to be done. Yeah, it looks nice, but it's going to put a tax on the machines that shouldn't have to be paid.
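For comparison, here is a sketch of a single-pass alternative that avoids building every pair. This is not the code from The Shop. It treats the grouping as a connected-components problem, and `group_by_shared_service` and `services_for` are names made up for this example. Each group remembers the union of its members' services, so a new merchant is compared against groups rather than against every other merchant. It also merges groups when one merchant bridges two of them, a case where the pairwise version can append the bridging pair to both groups and leave two overlapping groups behind.

```ruby
# Hypothetical helper mirroring get_services from the post.
def services_for(otc)
  (otc['taxonomy'] || []).map { |i| i['service'] }.compact.uniq
end

# Groups merchants so that members of a group are linked by a chain of
# shared services. Merchants with no shared services end up alone.
def group_by_shared_service(otcs)
  groups = []
  otcs.each do |otc|
    services = services_for(otc)
    # split the existing groups into those this merchant touches and the rest
    touching, rest = groups.partition { |g| !(g[:services] & services).empty? }
    # fold every touched group and the new merchant into one merged group
    merged = {
      members: touching.flat_map { |g| g[:members] } + [otc],
      services: (touching.flat_map { |g| g[:services] } + services).uniq
    }
    groups = rest << merged
  end
  groups.map { |g| g[:members] }
end

# Invented sample data in the problem order from the text: Gap, Movies, Macy's.
gap    = { 'name' => 'Gap',    'taxonomy' => [{ 'service' => 'jeans' }] }
movies = { 'name' => 'Movies', 'taxonomy' => [{ 'service' => 'candy' }] }
macys  = { 'name' => "Macy's", 'taxonomy' => [{ 'service' => 'jeans' }, { 'service' => 'candy' }] }

grouped = group_by_shared_service([gap, movies, macys])
puts grouped.map { |g| g.map { |m| m['name'] } }.inspect
# prints [["Gap", "Movies", "Macy's"]]
```

The trade-off is a little more bookkeeping per group in exchange for not revisiting every pairing twice.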

I get it… I'm just not sure I think it's a great thing.

Having the Best Tools Really is Nice

Friday, August 31st, 2012

Apple Computers

I'm sitting here this morning smiling quietly to myself after working a while on some problems. I realize I'm smiling because the keyboard is just such a wonderful piece of work - the Apple (small) wireless keyboard and wireless mouse, to be precise. These are simply the best input devices that I've ever used. The keyboard is low to the desktop, responsive, small, and without a cord. The trackpad is large, the gestures wonderfully thought-out. It's amazing what I can do so effortlessly on this machine.

Then there's the machine. I'll admit I'm not a fan of the glossy Apple displays. I never have been. The HP 30" is really the best I've used in a long while. I think Apple could make a nice 30" display, but they don't want to, and there's no making them do something they don't want to do.

But back to the laptops… they are, without a doubt, the best in the industry - and have been for a long while. Really, the only solid competition was the IBM ThinkPad, and IBM sold that to Lenovo, and the quality has become so very ordinary now. Nothing to write home about.

But great hardware gets out of your way. You stop thinking about how to do something, and focus on the doing. In some cases, it even enables you to see things that you might have not seen - large displays with more code in them really are very beneficial.

So I have to tip my hat to the folks at The Shop that see this as well, and outfit the developers this way. It's really quite amazing.

Thread-Local Variables in Ruby

Thursday, August 30th, 2012

Ruby

Now this is probably not amazing news to long-time Ruby developers, but the simplicity and ease with which it's possible to make thread-local variables in Ruby is simply shocking. The Ruby developers just don't know how good they have it. This morning I was looking at a threading problem with the CouchRest CouchDB client, and realized that it's not thread-safe. This isn't really shocking as thread-safety is something I've come to realize is not standard in Ruby libraries.

Still… I was determined to make it work.

What seemed logical was to have multiple database connections - one per thread, and then just have thread-local database connections. As the threads are born, they need a connection, create it, and use it. When they die, the connections are cleaned up automatically. Sweet. Simple.

I know that dealing with thread-local storage in pthreads is not horrible, but it's certainly not "easy". So I dug into the Ruby support for thread-local storage, and it's trivial:

  Thread.current[:foo] = {}

This creates the tagged thread-local variable foo. How simple! This is something that I never expected to see. Never! So why are these Ruby guys having so much trouble with thread-safety? I have no idea.
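A sketch of the one-connection-per-thread idea described above. `FakeConnection` is a hypothetical stand-in for a real database client (it is not the CouchRest API), and `:db_connection` is an arbitrary key. One caveat worth knowing: storage set through `Thread.current[]` is actually local to the current fiber, which behaves like thread-local storage as long as the code doesn't switch fibers inside a thread.

```ruby
# Hypothetical stand-in for a real database connection.
class FakeConnection
  attr_reader :thread_id

  def initialize
    # remember which thread created this connection
    @thread_id = Thread.current.object_id
  end
end

# Lazily create one connection per thread and reuse it on later calls.
def connection
  Thread.current[:db_connection] ||= FakeConnection.new
end

threads = 3.times.map do
  Thread.new do
    first = connection
    second = connection
    raise 'expected the same connection within a thread' unless first.equal?(second)
    first
  end
end

# Thread#value waits for the thread to finish and returns its block's result.
connections = threads.map(&:value)

puts connections.uniq.size                   # 3: one connection per thread
puts Thread.current[:db_connection].inspect  # nil: the main thread never asked for one
```

Each thread pays for its own connection only when it first needs one, and no locking is required because nothing is shared.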

With the tools I've seen in Ruby, there's really no excuse for why there aren't more thread-safe libraries. All the tools are there - they're just unused. Lazy coders.

Really Nasty Data Archeological Dig

Wednesday, August 29th, 2012

I know it needed to be done, and I know someone had to do it, but that doesn't make it any more fun than it already isn't. Digging in the data to find out why we aren't matching up merchants and demand in Philadelphia is no fun at all. It's a lot of data with very little pattern to it, and a whole lot of problems. But that's what I was doing for several hours today. The pain and suffering was really compounded by the complete lack of real thought put into this as we headed into the meeting.

Overall, I was very angry at myself for not pushing back. I should have. I know that now, but it's that blasted work ethic thing that causes me to say "Yes" when I should be saying "Hold on a sec…"

The problem is that we're getting demand and merchants to fulfill that demand, and the assumed "match" here should be very high. Why? Just because. They really have never looked at this and have no idea what it should be, but "instinctually" many think that it should be "very high" - like 90%. So when the first runs came out with it being more like 50%, they wanted to know why. I totally agree.

But where we diverge is in the How?

One program manager suggested I send a 5000+ line Excel file where the hierarchical JSON data was somehow magically "flattened" to make it easy for anyone to look at the data and determine why the merchants weren't matching. Thankfully, I had the strength of character to say "No" to that.

But that wasn't until after I heard another request to log all 5000+ merchants against all 1500+ demands - yielding more than 8 million log lines. Nope. That's just plain silly.

I wanted to get to the bottom of this to be sure, but I wanted to do it in a way that makes at least a little sense. And looking at 8 million log lines isn't it. So I started building a few CouchDB temporary views and started looking for what wasn't being matched and why.

Turns out there were two major issues: the demand wasn't supplying sufficient 'service' coverage to pin enough merchants, and the zip codes on the merchant data were really pretty horrible. Call me 'Indy' on this - it only took me about 90 minutes to find these reasons and document them for the group. Nice. Clean. Efficient.

Nothing like looking at 8 million log lines.

Interesting JVM Helper

Tuesday, August 28th, 2012

My manager at The Shop forwarded something he read today about an interesting little package called drip. It's essentially something that will pre-launch a JVM instance for a set of command options so that repeated calls of the same command will not have the overhead of starting a JVM. This would be ideal for JRuby - if it was supported. Unfortunately, this is a bash script and you need to be able to hook it into the code you're using - or at least replace the java command with drip.

Sadly, JRuby hides the java command, so we can't easily replace it. The JRuby team will have to make it possible with some kind of environment variable, etc. Given that Java on most platforms I've used starts pretty nicely, I'm guessing they are not going to spend a lot of time with this. It's really a bad problem on Mac OS X, but maybe that will be changing in the future. Who knows?

But it's certainly something to hang onto and maybe it'll be useful in the future.