Archive for the ‘Coding’ Category

Really Nasty Data Archeological Dig

Wednesday, August 29th, 2012

I know it needed to be done, and I know someone had to do it, but that doesn't make it any more fun than it already isn't. Digging in the data to find out why we aren't matching up merchants and demand in Philadelphia is no fun at all. It's a lot of data with very little pattern to it, and a whole lot of problems. But that's what I was doing for several hours today. The pain and suffering was really compounded by the complete lack of real thought put into this as we headed into the meeting.

Overall, I was very angry at myself for not pushing back. I should have. I know that now, but it's that blasted work ethic thing that causes me to say "Yes" when I should be saying "Hold on a sec…"

The problem is that we're getting demand and merchants to fulfill that demand, and the assumed "match" here should be very high. Why? Just because. They really have never looked at this and have no idea what it should be, but "instinctually" many think that it should be "very high" - like 90%. So when the first runs came out with it being more like 50%, they wanted to know why. I totally agree.

But where we diverge is in the How?

One program manager suggested I send a 5000+ line Excel file where the hierarchical JSON data was somehow magically "flattened" to make it easy for anyone to look at the data and determine why the merchants weren't matching. Thankfully, I had the strength of character to say "No" to that.

But that wasn't until after I heard another request to log all 5000+ merchants against all 1500+ demands - yielding more than 8 million log lines. Nope. That's just plain silly.

I wanted to get to the bottom of this to be sure, but I wanted to do it in a way that makes at least a little sense. And looking at 8 million log lines isn't it. So I started building a few CouchDB temporary views and started looking for what wasn't being matched and why.

Turns out there were two major issues: the demand wasn't supplying sufficient 'service' coverage to pin enough merchants, and the zip codes on the merchant data were really pretty horrible. Call me 'Indy' on this - it only took me about 90 mins to find these reasons and document them for the group. Nice. Clean. Efficient.
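For a flavor of the kind of temporary view that surfaces the bad zip codes, here is a minimal sketch. The document shape (`merchant.name`, `merchant.zip_code`) and the name `badZipMap` are assumptions for illustration, not the real schema or the views I actually wrote, and `emit` is stubbed out so the map function can be run outside CouchDB:

```javascript
// Stand-in for CouchDB's emit(); collects rows so the map function
// can be exercised locally.
var emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Hypothetical map function: emit every merchant whose zip code is
// missing or is not a 5-digit / ZIP+4 string.
function badZipMap(doc) {
  if (!doc.merchant) { return; }
  var zip = doc.merchant.zip_code;   // assumed field name
  var ok = typeof zip === 'string' && /^\d{5}(-\d{4})?$/.test(zip);
  if (!ok) {
    emit(zip === undefined ? null : zip, doc.merchant.name);
  }
}

// Made-up sample documents, only to show the shape of the output.
[
  { merchant: { name: 'A', zip_code: '19104' } },
  { merchant: { name: 'B', zip_code: '1910' } },
  { merchant: { name: 'C' } }
].forEach(badZipMap);

console.log(emitted);
```

In a real temporary view only the body of `badZipMap` would be pasted in as an anonymous `function(doc) { … }`, and the rows would be grouped by the emitted key.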

Nothing like looking at 8 million log lines.

Interesting JVM Helper

Tuesday, August 28th, 2012

My manager at The Shop forwarded something he read today about an interesting little package called drip. It's essentially something that will pre-launch a JVM instance for a set of command options so that repeated calls of the same command will not have the overhead of starting a JVM. This would be ideal for JRuby - if it was supported. Unfortunately, this is a bash script and you need to be able to hook it into the code you're using - or at least replace the java command with drip.

Sadly, JRuby hides the java command, so we can't easily replace it. The JRuby team will have to make it possible with some kind of environment variable, etc. Given that Java on most platforms I've used starts pretty nicely, I'm guessing they are not going to spend a lot of time with this. It's really a bad problem on Mac OS X, but maybe that will be changing in the future. Who knows?

But it's certainly something to hang onto and maybe it'll be useful in the future.

Google Chrome dev 23.0.1246.0 is Out

Tuesday, August 28th, 2012

This morning I noticed that Google Chrome dev 23.0.1246.0 was out, and the release notes are back to being a little more descriptive. This guy has a new V8 javascript engine as well as a new cut of WebKit, and addresses a few other bugs. Nice to see the notes are back to what they used to be, and that we're still seeing progress. Nice.

Pretty Impressed with jQuery

Monday, August 27th, 2012

JQuery Framework

I've been building a little visualization based on ZingChart, using jQuery as the general-purpose support library. I've come to realize that jQuery is simply amazing, and deserving of the acceptance it's received. The code to do a simple AJAX request is amazingly simple:

  $.getJSON(url, function(data) {
    parse_series(data);
    redraw();
  });

where the second argument is an anonymous function that runs when the response comes back, with the parsed JSON passed in as data. Very slick. I remember what I had to do to get this working in Chrome a few years ago… it was not pretty.

But there's so much more as well. I think I'm a convert. If I end up doing more javascript pages, I'm going to use jQuery and even if I end up using Google Visualizations, I'll use jQuery to get the data and do some of the nastier stuff.

Working with ZingChart

Monday, August 27th, 2012

Today I've started working with a new javascript charting library called ZingChart. It looks pretty complete, and certainly has a lot of charting styles, plus it's touting its speed with large data sets, so that's nice. But as with anything like this, the learning curve is steep because the graph has to be configured in javascript, and it's always a lot of work to know not only How to do something, but also What's possible to be done.

Compounding this is that the documentation and the examples have contradictory information - I mean not even close! I spent several hours just trying to get a nice pie chart going. But in the end, all the knobs are there and we don't have to worry that there's something we simply cannot do.

As an interesting note, they include the complete jQuery library in their 'resources' directory. It's something I've read a lot about, and it was available when I was doing my last web development work a few lifetimes ago, but we didn't use it because the Google Visualization Toolkit was more than enough for what we needed, and the need just didn't arise.

But it's interesting that it's all there. Very nice looking graphs.

Odd that they don't have a table. Guess I'll have to fall back to Google Visualizations for that.

Really Hitting the Wall on this LKit Feature

Saturday, August 25th, 2012

LKit Language

I'm trying to implement user-defined functions in LKit, and I'm having a really hard time with it. I'm not at all sure how to handle things with this addition - especially since I want to have recursion and pass-by-value work as you'd expect in a lisp-based language. The parsing of the code isn't the problem - that I've got figured out. But when I compile the code into an evaluation-tree, it's going to point to variables - some of which are defined outside the user-defined function, and some of which are the arguments to the function.

If they are the arguments, then we can't really have a static evaluation-tree… I'd have to have some kind of dynamic evaluation of the arguments for each invocation. It's getting to be a lot harder than I'd expected.

Now it's true that this was something that I didn't even attempt in my previous version of the code, but I was really hoping for something more this time. I'd even thought that I could bang out this function definition code today. Not so fast, it seems.

I need to handle the question of a calling stack. Really. I need to be able to "dive into" the evaluation of a function and then return to where I was. This is really a different layer than what I'm doing now, and it's going to take some significant time to think about it.
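One common way out of this, sketched below in javascript rather than in LKit's own code, is to keep the evaluation-tree static and resolve variable names at evaluation time against a chain of frames, creating a fresh frame for each invocation. Everything here (`Frame`, `evaluate`, the array-based node layout) is a hypothetical illustration, not LKit's design, and the host language's own call stack stands in for the calling stack:

```javascript
// A frame holds variable bindings plus a link to the enclosing frame.
function Frame(vars, parent) {
  this.vars = vars;
  this.parent = parent || null;
}
Frame.prototype.lookup = function (name) {
  for (var f = this; f !== null; f = f.parent) {
    if (Object.prototype.hasOwnProperty.call(f.vars, name)) {
      return f.vars[name];
    }
  }
  throw new Error('unbound variable: ' + name);
};

// Nodes: a number, a variable name (string), ['if', cond, then, else],
// ['call', fnName, arg...], or [operator, left, right].
function evaluate(node, frame) {
  if (typeof node === 'number') { return node; }
  if (typeof node === 'string') { return frame.lookup(node); }
  var head = node[0];
  if (head === 'if') {
    return evaluate(node[1], frame) ? evaluate(node[2], frame)
                                    : evaluate(node[3], frame);
  }
  if (head === 'call') {
    var fn = frame.lookup(node[1]);
    var args = {};
    fn.params.forEach(function (p, i) {
      // Arguments are evaluated in the caller's frame: pass-by-value.
      args[p] = evaluate(node[2 + i], frame);
    });
    // A new frame per invocation is what makes recursion work; its
    // parent is the frame the function was defined in.
    return evaluate(fn.body, new Frame(args, fn.env));
  }
  var a = evaluate(node[1], frame);
  var b = evaluate(node[2], frame);
  switch (head) {
    case '+':  return a + b;
    case '-':  return a - b;
    case '*':  return a * b;
    case '<=': return a <= b;
  }
  throw new Error('unknown operator: ' + head);
}

// (define (fact n) (if (<= n 1) 1 (* n (fact (- n 1)))))
var globals = new Frame({});
globals.vars.fact = {
  params: ['n'],
  body: ['if', ['<=', 'n', 1], 1,
         ['*', 'n', ['call', 'fact', ['-', 'n', 1]]]],
  env: globals
};

var result = evaluate(['call', 'fact', 5], globals);
console.log(result);
```

The tree for `fact` is built once and never changes; only the frame passed to `evaluate` differs from call to call. An explicit stack of frames would be needed if the evaluator could not lean on the host language's recursion.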

Well… maybe my next vacation.

Updating my WordPress CodeHighlighterPlus to GeSHi 1.0.8.11

Saturday, August 25th, 2012

I was looking for something to do as a good excuse not to solve this problem in LKit today, so I dug into the latest news on GeSHi. The version I'm currently running is 1.0.8.6, but 1.0.8.11 is out, and the language count is now above 200. Nice. Not that I usually write in more than 200 languages, but the odds that APEX code, for instance, is supported go up as the number increases.

Anyway, the first thing was to download the GeSHi package and place it into my repo for CodeHighlighterPlus. It's basically the root of the repo, but I hadn't realized that the last time I updated GeSHi.

The only things I needed to watch out for were those few edits that I made to geshi.php the last time, and those were pretty easily isolated. I'd repeat them here, but the fact is with the repo on GitHub, you can just go there and get everything you need. Simple.

The new language support is:

4cs            dot            lscript        pycon
6502acme       e              lsl2           pys60
6502kickass    ecmascript     lua            python
6502tasm       eiffel         m68k           q
68000devpac    email          magiksf        qbasic
abap           epc            make           rails
actionscript   erlang         mapbasic       rebol
actionscript3  euphoria       matlab         reg
ada            f1             mirc           rexx
algol68        falcon         mmix           robots
apache         fo             modula2        rpmspec
applescript    fortran        modula3        rsplus
apt_sources    freebasic      mpasm          ruby
arm            freeswitch     mxml           sas
asm            fsharp         mysql          scala
asp            gambas         nagios         scheme
asymptote      gdb            netrexx        scilab
autoconf       genero         newlisp        sdlbasic
autohotkey     genie          nsis           smalltalk
autoit         gettext        oberon2        smarty
avisynth       glsl           objc           spark
awk            gml            objeck         sparql
bascomavr      gnuplot        ocaml-brief    sql
bash           go             ocaml          stonescript
basic4gl       groovy         octave         systemverilog
bf             gwbasic        oobas          tcl
bibtex         haskell        oorexx         teraterm
blitzbasic     haxe           oracle11       text
bnf            hicest         oracle8        thinbasic
boo            hq9plus        oxygene        tsql
c              html4strict    oz             typoscript
c_loadrunner   html5          parasail       unicon
c_mac          icon           parigp         upc
caddcl         idl            pascal         urbi
cadlisp        ini            pcre           uscript
cfdg           inno           per            vala
cfm            intercal       perl           vb
chaiscript     io             perl6          vbnet
cil            j              pf             vedit
clojure        java           php-brief      verilog
cmake          java5          php            vhdl
cobol          javascript     pic16          vim
coffeescript   jquery         pike           visualfoxpro
cpp-qt         kixtart        pixelbender    visualprolog
cpp            klonec         pli            whitespace
csharp         klonecpp       plsql          whois
css            latex          postgresql     winbatch
cuesheet       lb             povray         xbasic
d              ldif           powerbuilder   xml
dcl            lisp           powershell     xorg_conf
dcpu16         llvm           proftpd        xpp
dcs            locobasic      progress       yaml
delphi         logtalk        prolog         z80
diff           lolcode        properties     zxbasic
div            lotusformulas  providex
dos            lotusscript    purebasic

Then it's time to check it all into GitHub and then pull it down on the servers and see how it goes.

After the pulls were done, all was well, and things are looking very nice. Success!

Good Intentions and Real Development

Friday, August 24th, 2012

We're in the final hours for a big demo with the Top Brass, and I'm trying to get things done, but I go to check on a run being done on the UAT box, and I find out that someone has started another copy! Now I know he didn't mean to mess me over, but he did. And I know he didn't mean to trash 30 mins of my work, but he did.

It's all that:

The road to Hell is paved with good intentions - Proverb

I know what it's like to be doing the best you can. I really do. I remember being on a Class A softball team, and I was lucky to get hits and play catcher. I was clearly outclassed. But this is a job, not a recreation. This is where you're supposed to be good - not just want to be good.

When I asked him if he'd started running things, he was honest and upfront, but said "Doesn't it check against that?" Of course not! Why would it? Well… now I have the answer to that question - I needed to write it for guys like him.
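The check he was asking about could be as small as a lock file. What follows is a minimal sketch under stated assumptions, not the code that runs on the UAT box: `acquireLock` and `releaseLock` are hypothetical helpers, and the demo uses a throwaway temp directory instead of a real, fixed lock path. It relies on Node's `'wx'` open flag, which fails with `EEXIST` if the file is already there:

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Try to take the lock; returns false if another run already holds it.
function acquireLock(lockPath) {
  try {
    // 'wx' creates the file and fails with EEXIST if it already exists,
    // so the check and the creation happen in one step.
    var fd = fs.openSync(lockPath, 'wx');
    fs.writeSync(fd, String(process.pid));  // record who holds the lock
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') { return false; }
    throw err;
  }
}

function releaseLock(lockPath) {
  fs.unlinkSync(lockPath);
}

// Demo in a temp directory; a real run would use one agreed-upon path.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'run-'));
var lockPath = path.join(dir, 'run.lock');

var first = acquireLock(lockPath);   // the first run gets the lock
var second = acquireLock(lockPath);  // a second copy is refused
releaseLock(lockPath);
var third = acquireLock(lockPath);   // free again after release
releaseLock(lockPath);

console.log(first, second, third);
```

A crashed run would leave a stale lock behind, so a real version would also want to check whether the recorded pid is still alive.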

There was another exchange I had with someone… I had someone show me the following javascript:

  function(doc) {
    var filter = function(doc) {
      return doc.meta.label == "QuantumLead.results" &&
             doc.otcs.length > 0 &&
             doc.merchant.category != null &&
             doc.merchant.sales_value > 0;
    };
 
    var key = function(doc) {
      return [
        doc.meta.execution_tag,
        doc.meta.division,
        doc.merchant.category,
        doc.merchant.sales_value
      ];
    };
 
    var value = function(doc) {
      return {
        name:        doc.merchant.name,
        sf_id:       doc.merchant.sf_id,
        sales_value: doc.merchant.sales_value
      };
    };
 
    if (filter(doc)) {
      emit(key(doc), value(doc));
    }
  }

I asked him why he chose to write it that way. Just what the motivation was for the specific structure. His response was that this was about clarity and maintenance. To me, it seems awfully complex for something that I'd have written as:

  function(doc) {
    if (doc.meta.label == "QuantumLead.results" &&
        doc.otcs.length > 0 &&
        doc.merchant.category != null &&
        doc.merchant.sales_value > 0) {
      var key = [ doc.meta.execution_tag,
                  doc.meta.division,
                  doc.merchant.category,
                  doc.merchant.sales_value ];
      var value = { name:        doc.merchant.name,
                    sf_id:       doc.merchant.sf_id,
                    sales_value: doc.merchant.sales_value };
      emit(key, value);
    }
  }

When I asked him if he really thought that his was clearer than mine, he said "Yup", and so I let it drop. After all, there's no reason to make a big deal over this. But again, this is not what I'd call a good format, but hey… I'm trying to be more flexible and I'm no code enforcer here.

I know they mean well… they really do. But it's stuff like this that is exactly why, in the past, I've stepped up and simply pushed folks like this aside.

I'm trying to be better.

Google Chrome dev 23.0.1243.2 is Out

Friday, August 24th, 2012

Google Chrome

This morning the Google Chrome Team upped the major version number to 23, releasing 23.0.1243.2 with a pretty decent set of release notes. The inclusion of the V8 3.13.1.0 javascript engine, and updating WebKit to 537.6 are both really nice.

I was wondering if the least-significant updates of late meant that we'd be seeing this, but I really thought they'd just hit a plateau. Guess not, and that's great.

I can say that the refresh of a page is amazing. Very nice, and this release just keeps moving the bar up. It's nice to see forward progress on Chrome.

CouchRest Bug – Using a Proxy to Get to CouchDB

Thursday, August 23rd, 2012

CouchDB

Because we have several data centers - including boxes at EC2, the standard set-up at The Shop is to have several proxies - forwarding all traffic through a port to a datacenter. So all traffic to the eastern EC2 center leaves my laptop on port 1234 (for instance). This means that when I want to use the CouchRest ruby client for CouchDB, I need to do something like this:

  require 'couchrest'
 
  CouchRest.proxy('http://localhost:1234/') if use_proxy?
  @db = CouchRest.database('http://whackamole.east:5984/hammer')

to connect to the hammer database on the whackamole.east server in the east EC2 datacenter. Not hard, but there seems to be a catch in there somewhere.

When I try this, I get about 2000 of 4000 documents saved, and then I get a nasty stack trace that appears to be about some timeout followed by an attempted reconnection. What's very interesting is that it has to be in the proxy handling code of CouchRest.

Why?

Because when I do a simple port forwarding on my box to the database server on the other box, and use that port - thus bypassing the need for the proxy setting, everything works. Also, if I run it on a box in the east EC2 datacenter so that I don't need a proxy, then everything works.

I need to dig into the proxy code. I'm not exactly sure what I'll find, but it's gotta be there.