Archive for the ‘Coding’ Category

Rewriting Bash to Ruby (cont.)

Thursday, November 29th, 2012

Ruby

This morning I was able to finish up the re-write of the summary script, and I was very pleased with the results: processing the pipeline log dropped from 4+ min to less than 5 sec - even with jruby - and the other two are in the sub-2 sec range. The latter are really dominated by the jruby startup time, and if we can move to an MRI ruby in deployment, that will help here too.

In short - fantastic success! Now I need to come up with a better queueing and processing scheme in bash - or re-write that in ruby as well…

Great File Encoding Tip for Ruby

Thursday, November 29th, 2012

Ruby

This morning I ran into a problem with the ruby re-write of the summary script that I've been working on since late yesterday. The error was occurring on the relatively simple code:

  File.open(src) do |file|
    file.each do |line|
      if line =~ / BEGIN /
        # …
      end
    end
  end

right in the open() method call. The error was cryptic:

  summary:48:in 'block in process_pipeline': invalid byte sequence in UTF-8 (ArgumentError)
      from summary:47:in 'each'

I had to hit Google. It was clear to me that there were odd characters in the file, and while I might like to fix that some day, the key to the previous version was grep's '-a' option, which told it to process the binary-looking files as text. But what would do the trick here?

Turns out there's a StackOverflow answer for that:

  File.open(src, 'r:iso-8859-1') do |file|
    file.each do |line|
      if line =~ / BEGIN /
        # …
      end
    end
  end

which instructs the IO object to read the file with the ISO-8859-1 encoding, and that did the trick. No other changes were necessary!
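
If you also want the strings handed back in UTF-8, Ruby's IO mode string takes an internal encoding as well, so something along these lines should transcode each line on the way in (same hypothetical loop as above):

  # Read the raw bytes as ISO-8859-1, but transcode each line to UTF-8
  # before the block sees it.
  File.open(src, 'r:iso-8859-1:utf-8') do |file|
    file.each do |line|
      # line.encoding is now UTF-8, so downstream string handling is safe
    end
  end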

Sweet trick to know.

Google Chrome dev 25.0.1337.0 is Out

Thursday, November 29th, 2012

Google Chrome

It's almost the leet version number, but not quite. This morning Google Chrome dev 25.0.1337.0 was released with virtually nonexistent release notes. I guess the same maintainer is back at it. Just gotta wonder what they are thinking. I can write a few paragraphs a day on what I'm doing, and they can't even be bothered to make decent release notes?

Googlers. Go figure.

Rewriting Bash to Ruby

Wednesday, November 28th, 2012

Ruby

With all the efficiency changes in the code recently, the next most inefficient piece was the bash script that analyzed the run logs and generated some summary statistics for us to view in the morning. When I first created this script it wasn't all that complex, and the logs weren't nearly as big as they are now. I used the typical assortment of scripting tools - grep, sed, awk - but as I added things to the summary script, the time it took to execute kept getting longer and longer, to the point that it took several minutes to run on the main pipeline process. That's no good.

So I wanted to rewrite it in something that was going to be fast, and that would process the file only once. The problem with the current version isn't that it's using bash, or grep - it's that the files are hundreds of megabytes, and we need to scan them a dozen or more times to pull out the data. What we need is a single-pass summary script, and that's not happening with bash and grep.

So what then?

Ruby popped to mind, but given that we're using jruby, there's a significant startup penalty. But maybe we can force it to use a compiled MRI ruby in the deployment environments, and that will speed up the loading.

C and C++ both seemed like ideal candidates, but I know how the rest of the guys in the group would react, and it's just not worth it.

So Ruby it is.

This shouldn't take long, as most of this is pretty simple stuff for ruby. Let's get going...
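
Just to get my head around the shape of it, a single-pass version looks something like this - the markers, the elapsed-time field, and the stats are hypothetical stand-ins, not our actual log format:

  #!/usr/bin/env ruby
  # Sketch of a single-pass summary: one read of the log, every stat
  # gathered on the way through. The marker strings and the elapsed-time
  # field are made-up stand-ins for the real log format.
  src = ARGV[0]
  counts = Hash.new(0)
  total_ms = 0.0

  File.open(src) do |file|
    file.each do |line|
      counts[:runs]   += 1 if line =~ / BEGIN /
      counts[:errors] += 1 if line =~ /ERROR/
      total_ms += $1.to_f if line =~ /elapsed=([\d.]+)ms/
    end
  end

  puts "runs:   #{counts[:runs]}"
  puts "errors: #{counts[:errors]}"
  puts "time:   #{'%.1f' % (total_ms / 1000.0)} sec"

One pass, constant memory, and every stat picked up as each line goes by - that's the whole point.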

Moving Day has Arrived!

Wednesday, November 28th, 2012

Building Great Code

Finally, moving day has arrived! This morning I've been getting things moved over to the new servers in our own datacenter, and this should provide a much-needed boost to the performance of the application. This includes a CouchDB server with 24 cores, 96 GB of RAM, and a 1.6 TB disk array, as well as a nice app server with 24 cores and 96 GB of RAM. There's a mirrored Couch pair for production, and a similar app server there.

It's been a lot of little things - lots of little code changes and pushes, even some reconfiguring of aliases in the firewalls - and that's where I'm starting to hit a snag. I used to be able to do this myself; now it's meant to be handled by the production operations group. That's not too bad, but they won't push anything until 4:00 pm today, and if it's not 100% right, we're going to have a hard time getting it right for tomorrow.

I'm hoping to get a few more tests done today, but I doubt that I'll be able to, simply because a co-worker is busy using UAT to test things there. It's a shared environment, and there's no way to run both tests at once, so since he was first, I have to wait.

I'm not the most patient of people.

Getting New Hardware Ready to Go

Tuesday, November 27th, 2012


This afternoon I've been working with a co-worker to get all the new hardware up and going in our own datacenter so that we can move our application from Amazon's EC2 to our own, more reliable, machines. It's a bit of a hassle in that there are now 22 new machines to rebuild, and the folks doing it aren't paying really close attention to the machine names and set-ups, so there have been a lot of re-dos. But it's getting there.

We should be able to get all the critical machines up and going before I have to leave today, and then I can get started on moving the apps in the morning.

Exciting times to be getting out of EC2, and onto far, far better hardware. I'm just hoping that it's going to clear up the issues we've been having with Couch. Now that would be really nice!

Code Cleanup

Tuesday, November 27th, 2012

Code Clean Up

Today has been a lot of little things to try and get the application's performance good enough so that it can still run in EC2 for the few days that it has left in that datacenter. I'm trying to put in simple, clean fixes to minimize the time spent in an overall run so that we can get more divisions out in the same period of time.

This brings up the point that's been bugging me for a few days, and that's expectations. I'm really getting tired of putting in extraordinary effort for management folks who don't seem to recognize the nature of that effort, or appreciate what it is that I'm actually doing.

It's nothing I haven't seen before, but it's always a little sad the first time you see it at a new job. That realization that this guy is no better than that other guy at the previous place, and that they are going to push, make artificial deadlines, and then pretend to "tell Dad" if you don't meet them.

Working Wednesday, Thursday, and Friday of last week to make a deadline that I didn't think was possible - just so this guy could tell his superiors that his team "did it" - was something I was willing to do, as long as it was appreciated. But it wasn't. So now this guy has marginalized himself. I won't break my back to get him out of his own jam any more.

But hey… what am I doing now, then? I'm trying to make this work as opposed to just letting it fail.

I'm a chump.

Tracking Down Problem with Salesforce Data

Tuesday, November 27th, 2012

Salesforce.com

This morning I was tracking down a bug, reported by our project manager, related to the prioritization phase. One particular sales rep wasn't getting a good call list, and I needed to dig into why.

After I added a bunch of logging, I was able to see that it was all a data problem. The fields in Salesforce are often just free-form strings, and that leads to sets of values that aren't easily enumerable. It's not necessarily Salesforce's fault - it's the way in which it's used - and we seem to be having a little problem with consistency here. But be that as it may, it's still our problem, and we need to figure out the proper way to get at these sales reps regardless of how they seem to be classified.
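
For now, the workaround is a defensive mapping from the free-form strings onto a small set we can actually enumerate - roughly this kind of thing, where the patterns and categories are hypothetical stand-ins for our real classification:

  # Hypothetical mapping from free-form Salesforce role strings to a
  # small, enumerable set we can prioritize on.
  ROLE_PATTERNS = {
    /inside\s*sales/i => :inside_sales,
    /field|outside/i  => :field_sales,
    /account\s*exec/i => :account_exec
  }

  def classify_rep(raw)
    return :unknown if raw.nil? || raw.strip.empty?
    ROLE_PATTERNS.each { |pattern, role| return role if raw =~ pattern }
    :unknown
  end

  classify_rep('Inside Sales - Midwest')   # => :inside_sales
  classify_rep('Sr. Account Executive')    # => :account_exec
  classify_rep('')                         # => :unknown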

Sigh… these pseudo-business decisions are always the worst. They are made for "today", and change "tomorrow", and we're always going to be correcting for problems in the mappings.

Writing Effective Log Messages – It’s a Lost Art

Tuesday, November 27th, 2012

I know this may seem like an old man complaining about these young kids and how they aren't doing it right, but I have to say, it seems that the art of writing good, concise, effective log messages is a lost art. I've been trying to debug a problem this morning, and it all cleared up when I introduced one decent log message and elaborated a little on a few others. I mean really - the problem was clearly solved with a few minutes of work on writing effective log messages.

OK, so here's my list of rules for log messages - not that anyone cares - with a quick sketch of what I mean after the list:

  • Each log message has to stand alone - you can't assume that log messages will come in any order - certainly not with multi-threaded code, and that's just about the standard these days.
  • Each log message has to be useful - putting out a message saying "sending 5 to output" is not really useful. You can say more - like what they are, or why they are going out. If not, you're really only doing the log file equivalent of a "busy indicator", and that's not useful.
  • Each log message is human-readable - when you dig into log files, you need to be able to read them. There is a school of thought where the log files should be designed for easy scraping. I think the scraping is something done after you have good logs, and it's not all that hard. But listing key/value pairs just doesn't cut it.
  • Each log message contains the class and method where it occurs - there's so much to be gained by always knowing where the code is that wrote the log. Just do it.
  • Put in enough logging to know what's happening - disk space is cheap, so write out good log messages every step along the way of the processing. This is going to pay off over and over when you're tracking down problems.
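
Just to make that concrete, here's the flavor of thing I'm after. The class name, and the numbers, are illustrative stand-ins, but each message stands alone, says something useful, and tells you exactly where it came from:

  require 'logger'

  log = Logger.new(STDOUT)

  # hypothetical values, just to show the shape of the messages
  src, lines, runs, errors = 'pipeline.log', 120_000, 42, 3

  log.info  "Summary#process_pipeline: scanned #{lines} lines of #{src}, " \
            "found #{runs} runs and #{errors} errors"
  log.warn  "Summary#process_pipeline: #{src} had invalid UTF-8 bytes - " \
            "re-reading with ISO-8859-1"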

This morning I've been adding to and augmenting the log messages in our code to get things to the point where I can effectively debug a problem we're having. Had this already been done, the debugging would have been trivial, because there's no bug! It's all a data problem, and that would have been easily seen with a little better logging.

Oh well… I guess that's going to be part of what I have to do in this group.

Move to CouchDB Server-Side Updates

Monday, November 26th, 2012

CouchDB

In a continuing effort to make the code more efficient - and really, just plain faster - this afternoon I've been working with a teammate to update CouchRest, our ruby client for Couch, to handle server-side updates. Couch supports these through update handlers: you basically write a javascript function that takes the document and the request, updates the document as you see fit, and returns something to the caller.

It's not bad, really. It should certainly make the updates a ton faster as right now we're reading, updating and writing back the complete document for a very small change - in one case just a single field. This is really where the document database falls down, and you long for a SQL statement where you can simply UPDATE and be done with it.

Still, it's nice to be able to write:

  function(doc, req) {
    var ans = false;
    var fld = 'lead_assignment';
    if (doc) {
      doc[fld] = JSON.parse(req.body);
      ans = true;
    }
    return [doc, JSON.stringify({'updated': ans})];
  }

and be able to make a change with:

  def update_merchant_assignment(division, sf_id, stuff)
    return nil if (id = get_latest_results_docID(division, sf_id)).nil?
    Database.update('merchant_updater/add_assignment', :id => id, :body => stuff)
  end

It really simplifies the code, and it certainly cuts the bytes moved for an update way down. I'm hoping it's enough… we'll have to see how it goes.
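
For reference, the CouchRest call above is really just wrapping an HTTP request to the update handler in the design document - roughly the equivalent of this sketch, where the host and database name are made up and the path follows Couch's /<db>/_design/<ddoc>/_update/<handler>/<doc_id> convention:

  require 'net/http'
  require 'json'
  require 'uri'

  # Rough equivalent of the CouchRest call - host and database name are
  # hypothetical; 'merchant_updater/add_assignment' is the design doc and
  # update handler from above.
  def raw_update(doc_id, stuff)
    uri = URI.parse("http://localhost:5984/leads/_design/merchant_updater" \
                    "/_update/add_assignment/#{doc_id}")
    req = Net::HTTP::Put.new(uri.request_uri, 'Content-Type' => 'application/json')
    req.body = stuff.to_json
    Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
  end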