Archive for the ‘Coding’ Category

Finally Figured Out New Relic and JRuby

Monday, November 5th, 2012

JRuby

This afternoon, I decided to dig into the New Relic Ruby API and see if I couldn't figure out the problem that's been bugging me for a while. Basically, it's all about putting New Relic in a jruby-1.7.0 jar and getting the instrumentation to load.

I knew that New Relic had their code on GitHub, so I went and forked the code in hopes of fixing the problem.

I started digging in the first place I've come to suspect in jruby-1.7.0 - Dir.glob.

Sure enough, they had code that was using this in much the same way that the Configulations gem was: it was scanning directories, in this case, their own code, and loading the scripts it found. These scripts, it turns out, are their "instrumentation" directory.

Makes perfect sense.

The call to Dir.glob returned the empty array, so there was no error. But there was also nothing loaded. Simple. I changed the code from:

  def load_instrumentation_files pattern
    Dir.glob(pattern) do |file|
      begin
        log.debug "Processing instrumentation file '#{file}'"
        require file.to_s

to:

  def load_instrumentation_files pattern
    Dir.glob(pattern.gsub(/^jar:/, '')) do |file|
      begin
        log.debug "Processing instrumentation file '#{file}'"
        require file.to_s

and, as expected, everything started to work just fine!

I sent in a pull request to the New Relic folks so that they could fix their code, and I also sent them an email, as I was back talking to their support guys.

Got a nice note back from one of the support guys thanking me for the fix. I know jruby will get it fixed eventually, but this is really an issue that needs fixing sooner than later.

Maybe I'll fork that and fix it? 🙂

Just a Housekeeping Morning

Monday, November 5th, 2012

Code Clean Up

This morning has been a leisurely morning of cleaning up a few nagging issues, making sure things are in place, and organizing things for when I have the next big push. I'm expecting machines in the new datacenter any day now, and there will be work there to get things tested and moved over. Then we're also expecting more data from another group that will allow us to boost the relative value of merchants based on their previous history with us, and that's going to be quite a bit of work. Still, there's the new demand system that needs to be started and built up from scratch.

There's a lot to do, but this morning it was all with someone else, so I spent a little time getting things cleaned up and in order so that when one of these hits, I don't have to scramble to clean things up and get the new work done.

Testing NewRelic with JRuby

Sunday, November 4th, 2012

I've had a few questions about what's happening at The Shop with New Relic and JRuby 1.7.0, so today I decided to take a little time and really prove to myself what's up and what's broken with the two. My initial tests were with jruby-1.7.0, but then I backed off and started from what I remember worked - jruby-1.6.7.2. Because I needed to see this interact with New Relic, I thought that it'd be easy to just sign up for a free New Relic account.

This is really kinda neat. I have liked what New Relic can do, and it seems there's a pretty decent free system going on there, and it makes it easy to try things out. So I signed up, got things going, and started getting some results:

So it looks like the jruby-1.6.7.2 both as a script and as a self-executable jar work, and then on the same graph, I was able to get jruby-1.7.0 as a script. So far, so good.

The last test was to run this under jruby-1.7.0 as a self-executable jar:

With these tests out of the way, I know that I can run New Relic with jruby-1.7.0 -- I just need to make sure it's getting started and that the Net::http instrumentation is being loaded at the right time. I have high hopes for the runs that will kick off in a few hours!

[11/5] UPDATE: OK… bad news. Yes, for my simple tests, New Relic worked fine from a jruby-1.7.0 jar. However, if there is a more complex project - with dozens of classes, then the instrumentation isn't loaded, and we have a problem. I've sent another request into New Relic, and we'll have to see what they say. If I could manually load the instrumentation, I'd be OK, but I don't know of a way to do that.

[11/5 3:00pm] UPDATE: Found the problem! It's the old JRuby Dir.glob issue and I fixed the New Relic Ruby API and sent them a pull request with the fix. I also sent them an email, so I'm hoping they take it to heart, but we have the fix we need, and that's the main issue.

Finally Solved a Nagging NewRelic Problem

Friday, November 2nd, 2012

JRuby

Ever since we've moved to JRuby 1.7.0, we've had a problem with our New Relic stats, and it's really been quite annoying. New Relic makes all these nice graphs of the instrumentation you place in your apps, and then makes it dead simple easy to show that and drill down to any performance issues you may be having.

In the last few weeks we've been battling quite a few Salesforce performance issues, and without the New Relic graphs, it was essentially impossible to know the details of the calls we were making to Salesforce. This was nasty. I knew it was all about JRuby 1.7.0, but why? Thankfully, the New Relic support guys gave me a good idea.

Basically, it boiled down to my app wasn't loading the net/http instrumentation at the right time. It needed to be loaded very early in the cycle, but it was supposed to be automatically loaded, and it clearly wasn't. I had made DEBUG loggings - nothing seemed to help. Then I looked at the code again:

  require 'ext/rpm/lib/new_relic/agent/error_collector'
  require 'net/http'
  require 'newrelic_rpm'

and I started to wonder: What if there's a problem with that agent? So I changed the code to read:

  require 'net/http'
  require 'ext/rpm/lib/new_relic/agent/error_collector'
  require 'newrelic_rpm'

BINGO! Works!

I'm so glad! This has been a real annoyance, and now I have a work-around. Don't really care if it's something I have to manually do, I'm just glad it's working.

Fixing More Code Monkey Bugs

Friday, November 2nd, 2012

bug.gif

This morning I came in and saw that we were having serious Production problems - and they were in the code, and it was effecting UAT as well. Now I'm not a prude… I get that code goes into Production with bugs - heck, I've done it, and I'll likely do it again in the next month. But I'm here to fix it, and to the best of my abilities, I test the code before I go home with a quick run. But last night was ugly.

I guess I wasn't really surprised that I had problems this morning… I was having problems last night getting a test to run. The Code Monkeys on the team have been fiddling with a lot of things - specifically the Salesforce endpoints, and while I think that's great, it's a little frustrating when there are folks trying to get things done with Salesforce, and others trying to refactor and hope it's not breaking anything.

But that was last night, and I thought I had it all behind me. But evidently not. In the refactoring, they have employed the Hashie gem, and while it's nice, it's also a little tricky. They wanted to base their conversion of our "container based objects" from a Hash to something based on the Hashie::Trash which is a transforming Hash. However, that, in turn, it turns out, is based on the Hashie::Dash - or defined Hash.

This means that if you haven't defined the possible entries for the Dash, then you can't add it. That's nice, to keep "junk" out, but if you miss one in the code, then all of a sudden, working code becomes very uncorking code all because of this restriction. I think they thought they planned for that, but they left off a critical component. A very necessary subclassed method:

  def assert_property_exists!(property)
    unless self.class.property?(property)
      Thrash.property(property.to_s)
    end
  end

This code simply makes sure that if the new property isn't there, it's created. This isn't what the Hashie::Dash is all about, but it's what we needed because we're constantly adding attributes to the Merchant, and we want to avail ourselves of the mapping in the Hashie::Trash, but not the restrictions of the Hashie::Dash.

What's annoying about all this is that I'm the one finding this out.

I'm not the one championing the Hashie gem. I'm not the one working it into the code. I'm not the one writing tests for any of this. But I'm the one that picks up the pieces and fixes it and re-runs production and UAT to make sure we have something for the day.

It's hard being this guy when I'm working with a group of Code Monkeys that are as careless as they are. If I'm finding my own bugs, then that's on me. My fault - totally. But when I'm finding things that I would have checked - and they should have checked, and it's causing the project to look bad, then it's annoying. Too much of this, and I'm really getting frosted.

Unfortunately, I'm not their manager. I'm not their boss. So I can do nothing about it. Not a single, bloody, thing.

Disagreeing with Code Monkeys

Thursday, November 1st, 2012

attitude.jpg

Today I was working on a few stories that have to do with a rule system we're putting into the app, and one of the Code Monkeys rolled on over and started to discuss his opinion of what we should do with regards to the design of the rule system. Now bear in mind… we have something, and he's talking about completely changing the flow and logic of that to make it, in his words: "Simpler to understand". Specifically, the existing rules that we coded up were based on a simple doc the program manager wrote up. The current problem was that he had unspoken assumptions in the rules and so he filed several bugs on the rules after they were coded up.

Normally, this would be OK, but I wanted to use this as an educational opportunity for this PM, as I really thought it warranted it. If he can't express the rules completely in english, we're going to have a very hard time coding them up to his satisfaction. So there was plenty of back-and-forth, give-and-take in this conversation.

We talked about how he might organize his thoughts… how me might represent them on paper, all with the goal of being able to get to a point where we had a solid set of rules that we could take and code against.

In the middle of this, over rolled the Code Monkey.

Now I'm only saying that in retrospect. This guy is smart. He's nice. And he can write good code when he chooses, it's that he often finds it's more fun to play games or just not code. In any case, he came over with what he thinks we should be doing.

To frame this properly, we currently have a system where there are several condition-type-methods on the classes, and these are then used in combination in methods to create each "rule". For example, if we wanted to have a rule that said "New sales reps get 10 days to content a merchant or the merchant is reassigned" we might write up:

  def check_new_merchant(account)
    if !account.new_sales_rep? && account.no_activity_since_assignment?
      reason = "No activity on merchant"
    end
    reason
  end

so if we get a nil, then we know it didn't "trip" the rule.

Now there can be tests that say "leave this merchant alone" as well as tests that say "re-assign this merchant", so we have to put them in a block - in a specific order, to make sure that we don't build overly complex rules. Sure, it's non-trivial, but it's a workable system in the same sense that a switch or case statement is. You have to run through it from top to bottom as order matters.

The proposed solution was to have a series of classes - all based on a root rule class. This class would look something like this:

  class rule
    def applies?(account)
    end
 
    def condition?(account)
    end
  end

where the applies? method is used to see if this rule applies to the Merchant account, and the condition? method is used only if the merchant returns true from applies?.

What's really needed is a tri-state output of the rule: Yes, No, and Don't Care. But short of that, the first method filters out the "Don't Care" states, and then the second handles the other two. Not bad, but it could be better if we had a logical value that could handle this in the language. Ah… in C++ this would be easy…

There's nothing wrong with the approach, but it doesn't really reduce the complexity. If you have a list of these classes, then the order is just as important - unless you make sure that they are, by construction, mutually exclusive. This means that whereas the existing scheme tests one thing after the other, this scheme would require that we include the logic of all tests in each test so that we didn't have a mistaken triggering.

You could make it all work in ruby by adding functionality and methods, but then the runtime becomes very hard to debug, but hey, that's ruby, and that's how you'd do it. But that doesn't really reduce the complexity. It just shifts it.

And there came the disagreement.

My point was that there's really no difference in complexity - just a shifting. But the advantage of the method we have now, that I have been working with, is that we have a fail fast approach. If we have 1000 merchants to test, and 900 of them will fail on the first test, the current approach won't run through all the other tests for those 900 - so it'll be faster.

His point was that he'd trade that for what he believed was a "simpler" system.

Not really bad - yet, but we're getting there fast.

When I got him to agree that it was just a shift, and that there was a performance bonus, his complete unwillingness to "agree to disagree" was the problem. Totally unwilling to agree that I had a point, and that it was equally valid, he simply argued over and over again, until I'd gotten to the point that I was about to say something very unkind. Instead, I said: "Fine. Do it your way", and looked at the PM and said "Talk to him now".

This wasn't OK with the PM. Why? Because I'm the reason things are working. I'm the reason things are delivering. I'm the one that Gets Things Done. He was know for coming in at 10:00 am, and playing a lot of ping pong. So it was clear to him that he wanted me on his issues, and so the conversation was not "a difference of opinions", it was a Code Monkey vs. a Coder, and he didn't like the outcome at all.

Too bad. I'm not the guy's manager. I'm not his boss. I can't make him do anything, and he's already gone into my code after the fact and rewritten parts in the name of "clarification" that gutted things I begged him not to remove. Too bad. I can't stop him, and I'm not going to get into a war of tit-for-tat in git. No way.

So it's one of those classic problems, and I think I have an idea about how to keep it from happening next time: Agree with his point, but say "I've got this - you get the next one" and don't let him dig in his heels. Might not work, but it might, and that's worth a shot.

More Salesforce APEX Code

Thursday, November 1st, 2012

Salesforce.com

Based on a few rules we're trying to implement in the system, I needed to dive back into the Salesforce APEX code for our endpoint(s) and add a few things that weren't there now. Specifically, I needed to take one more step to getting the two endpoints that deal with Merchants a little bit closer together. This wasn't all that easy as APEX is like Java, but not, and testing the code is not easy, and even editing the code isn't all that easy.

Salesforce.com is a web site. So to edit the classes, you need to copy it out of a text field, paste it into some local editor, do the changes, copy that out and paste it back into the text field (assuming it hasn't timed out), and then save it. To test it you need to go to a different web page and put in your URL, with parameters, and then see what results you get.

It's workable, but its nowhere near "efficient". I know from a friend that works there that things are changing, and that would be great. Something to make the development process with Salesforce easier would go a long way to making the entire experience, and therefore the impression about Salesforce, a lot better.

Still… it's possible to write code, and test it. But in a group, it means that you have to be very good with communication as you can easily step over people's changes if you aren't careful. And if you have someone changing the mapping code while you're trying to change what you get out of the API, then it's going to get ugly.

And it did.

Very.

It was a pretty stressful time, but in the end, I got what I needed done. Just not very happy with the overall experience. Certainly something to be avoided in the future, if at all possible.

Working with Bash is Really Quite Fun

Thursday, November 1st, 2012

Ubuntu Tux

I know the Code Monkeys hate it, but bash is a really great little tool. It's on every box, it's got all kinds of capabilities, and it's simple to use. Now it's a slightly different syntax than Ruby, so I'm sure that has something to do with it's distaste with the kids, but that's their issue, not mine.

This morning I was having a little fun creating summary scripts for the two other processes we run as part of the nightly batch runs. They are also ruby processes, but the other guys haven't taken the time -- OK, who am I kidding? I wrote the only summary script there is, and I just hadn't gotten around to writing them for the other two processes.

One other teammate talked about doing it, but I know he'd have written it in ruby, and then tossed out what I had, and I wasn't about to let that happen. So I took the existing code and simply added parsers for the three files, and allowed the script to tell - based on the file name or a "forcing argument", which of the three parsers to use.

Pretty nice. Very clean, and it's one simple file.

It's probably not 100% complete at this point, but it's a great start, and it's able to point out when things have gone really wrong.

Doing timing comparisons for Salesforce

Wednesday, October 31st, 2012

Salesforce.com

This morning I need to do some simple timing comparisons for the Salesforce Team here at The Shop so they can push this back to the Salesforce.com guys and ask what's going on. I'm taking the overnight runs of Production and UAT hitting the production and staging sandboxes, respectively, and comparing that to running the UAT hardware against the new API sandbox, and simply comparing times for actual runs.

Hardware Sandbox Time
Production (EC2) Production 20:03
UAT (EC2) Staging 20:03
UAT (EC2) API 29:07

At the same time, I'm gathering some DEBUG logging information for NewRelic as their newrelic-rpm 3.5.0.1 Gem is not sending back to them the method call instrumentation data in jruby-1.7.0 that it is in jruby-1.6.7.2. I know this because our NewRelic graphs go silent the same instant we moved from 1.6.7.2 to 1.7.0. However, we do get the CPU usage and the memory usage and even the deployment markers - so I know we're sending something, but it's just not sending the right instrumentation data.

So they asked me to send them a DEBUG log, which seems really reasonable. So I'm getting it as I'm doing the final UAT-to-API timing test.

UPDATE: they have sped up API significantly from yesterday. Whatever they did - it's working and that's OK with me.

Google Chrome dev 24.0.1312.1 is Out

Wednesday, October 31st, 2012

This morning I noticed that Google Chrome dev 24.0.1213.1 was out and the big change appears to be in the new WebKit (537.17) and a bunch of fixes related to the bookmark syncing and stability issues. I have to say that I really like the new look of the blog as it's much cleaner than the old, and it's also nice they are moving along with WebKit updates.

Good news.