Creating Really Dense Code – The Ruby Way
Friday, August 31st, 2012This afternoon I've written some of the most compact, potentially confusing code I've written in many, many years -- and it's perfect code by a Ruby developer. This is something that may be specific to The Shop, but given that they are such a big Ruby shop, I'm guessing that this is the Ruby Way, and like a lot of the functional code I've seen - completely undocumented. Now that's not to say my code is undocumented, in fact, it's got almost a 1:1 ration of comments to code because of it's compactness, but I've come to realize there's a bit of a blind spot in a solid group of young ruby coders that looks a lot like what I call Homework Problems.
In any case, the code I wrote today was specified by the quantitative analyst in Palo Alto as this:
Group the merchants by the services they offer so that any one merchant in the group shares at least one service with at least one other merchant.
Logically, this means that we can have a series of seemingly unrelated services so long as the group has these pair-wise matchings with at least one service.
If we look at the group of Macy's, the Gap, and a Movie Theatre:
You'd think there's no way the movie theater fits in the same "group" as the Gap, but because Macy's sells Jeans, and so goes the Gap, and because a movie theatre sells candy, and so does Macy's, then the Gap and a Movie Theatre "belong together" in a group.
I'm not making up these rules, I'm just trying to code them up.
Once we get these groups of merchants, we'll then process them and get some data from them. That's not the interesting part. The interesting part is the grouping, and how to get it.
My first idea was to write a few little methods that I knew I was going to need: one to get the services from a merchant into an array, and another to see if there are any overlap (set intersection) between two merchants:
# this method returns a clean, unique set of services for the provided OTC. def self.get_services(otc) (otc['taxonomy'] || []).map { |i| i['service'] }.compact.uniq end # this method returns true if ANY service is shared between the two OTCs. ANY. def self.services_overlap?(otc_a, otc_b) !(get_services(otc_a) & get_services(otc_b)).empty? end
At this point, I knew that these were very "ruby-esque" methods - one line each, so it's got to be "minimal", right? At the same time, I was able to then start to deal with the idea of just finding the right pairs to feed to the second method, and then collecting them into the right groups.
But therein was a real problem. If I just looked at the merchants serially, then the order matters. Imagine the order: Gap, Movies, Macy's. In this case, the Movies would not match the Gap, so there'd be two groups, and then Macy's would match the Gap, and strand the Movies. Bad. So I had to have multiple passes, or I had to think up some other way of looking for the sets.
What happened was that I was scanning the ruby Array docs and noticed the product() method. Interesting, and after about another 10 mins of trying to think up a solution, the ideas came to me: use product() to make pairs of merchants to check, and then add things in and remove duplicates.
Sweet idea!
def self.group_by_service(otcs) # start with the array of groups that we'll be returning to the caller. groups = [] # look at all non-identical pairings in the original list and for each # pairing, see if there are ANY common services. If there are, try to find # a group to place the PAIR in, if we can't, then make a new group of this # pair. otcs.product(otcs).map do |pair| next if pair[0] == pair[1] if services_overlap?(pair[0], pair[1]) groups << pair unless (groups.map do |g| (g << pair).flatten!.uniq!.size unless (g & pair).empty? end.compact.reduce(:+) || 0) > 0 end end # verify that each OTC is in some kind of group - even alone otcs.each do |d| groups << [d] unless (groups.map { |g| g.include?(d) ? 1 : 0 }.reduce(:+) || 0) > 0 end # return the array of groups to the caller groups end
It's like an expanded APL to me. Compact code. Chained method calls. More work by the CPU, but less code written by the person. It's not something I'd traditionally write because it's excessively wasteful in the work it's doing, but I'm guessing that it'll be seen as evidence of me "getting it" by the other Ruby guys in the group.
I get it, and in certain instances, I don't think it's wrong. But in a production app that's going to hit speed limitations, have code like this is a killer to performance. There's too much that doesn't need to be done. Yeah, it looks nice, but it's going to put a tax on the machines that shouldn't have to be paid.
I get it… I'm just not sure I think it's a great thing.