Archive for the ‘Coding’ Category

Great JSON Beautifier as a BBEdit Text Filter

Friday, July 27th, 2012


This morning I was looking at some JSON output from a service and realized that hand-cleaning JSON was really not a good use of my time. So I googled "JSON beautification BBEdit" and found this:

  #!/usr/bin/python
  import fileinput
  import json

  if __name__ == "__main__":
    # BBEdit hands the selection (or the whole document) to the filter
    # on stdin; whatever we print replaces that text in the editor
    jsonStr = ''
    for a_line in fileinput.input():
      jsonStr = jsonStr + ' ' + a_line.strip()
    # parse and re-emit with sorted keys and a two-space indent
    jsonObj = json.loads(jsonStr)
    print json.dumps(jsonObj, sort_keys=True, indent=2)

It's a little Python script: drop it into ~/Library/Application Support/BBEdit/Text Filters/ as PrettyJSON.py, restart BBEdit, and you get a wonderful reformatter under Text -> Apply Text Filter -> PrettyJSON.
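For instance, a compact one-line blob like this (a toy example of mine, not the actual service output):

  {"b": 1, "a": [1, 2]}

comes back sorted and indented as:

  {
    "a": [
      1,
      2
    ],
    "b": 1
  }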

It's impressive, and fast too. I had a pretty big JSON file and it made it nice and readable in under a second. This is certainly something to keep around.

Getting Ruby 1.8.7 Installed on RVM

Tuesday, July 24th, 2012


Today I needed to install Ruby 1.8.7 to work in a repo here at The Shop. Normally, rvm makes this easy - I'd simply run:

  $ rvm install ree-1.8.7-2010.02

but this time it failed during compilation:

  Error running './installer -a /Users/me/.rvm/rubies/
  ree-1.8.7-2010.02 --no-tcmalloc --dont-install-useful-gems -c',
  please read /Users/me/.rvm/log/ree-1.8.7-2010.02/install.log

  There has been an error trying to run the ree installer.
  Halting the installation.

After trying a ton of things, the solution turned out to be using Homebrew's GCC 4.2, which is not LLVM-based, unlike the compiler that ships with Xcode 4.2. It turns out there's a problem compiling some of the ree-1.8.7-2010.02 components with an LLVM-based compiler.

OK, so first let's see what we're trying to get:

  $ brew search gcc
  homebrew/dupes/apple-gcc42
  (among other things)

If the homebrew/dupes prefix doesn't appear in the results, you don't need this step, but I did:

  $ brew tap homebrew/dupes

and then we can install the non-LLVM GCC 4.2:

  $ brew install apple-gcc42

This places gcc-4.2 in the path if you have Homebrew set up properly.

All this was to be able to properly compile ree-1.8.7-2010.02, so now we can do that:

  $ CC=gcc-4.2 rvm install ree-1.8.7-2010.02 --with-gcc=gcc-4.2

At this point, it will all install nicely and you'll have ree-1.8.7-2010.02 in your RVM set-up. Lots of work, but all worth it.

Ruby’s Modules – Categories from ObjC

Monday, July 23rd, 2012


Today I did quite a bit of work refactoring the existing code that represents a simple buffering storage container for the output of the pipeline in our project. The original version had a lot of things in it that weren't really needed - the output was ordered (why?), and there were a lot of methods and nested classes that didn't need to be there. So Jeff and I started peeling back the layers and cleaning up a lot of this code.

What we came up with was a very nice design. The store has two basic methods, plus two optional ones for backends that happen to maintain state:

  class Store
    attr_accessor :backend
 
    def store(block)
      self.backend.store(block)
    end
 
    def bulk_store(blocks)
      self.backend.bulk_store(blocks)
    end
 
    def clear
      self.backend.clear if self.backend.respond_to?(:clear)
    end
 
    def flush
      self.backend.flush if self.backend.respond_to?(:flush)
    end
  end

With this, we can have a very simple store class and then use it as the "backend" to the main store that just happens to be a Singleton (not shown):

  class StreamStore
    attr_accessor :stream
 
    def initialize(stream = $stdout)
      self.stream = stream
    end
 
    def store(block)
      write_one(block)
    end
 
    def bulk_store(blocks)
      blocks.each do |b|
        write_one(b)
      end
    end
 
    private
 
    def write_one(b)
      stream.puts b.to_s
    end
  end
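Wiring the two together is just a matter of pointing the backend at whichever concrete store you want. A minimal sketch of the usage - using a plain Store instance here, since the Singleton isn't shown:

  store = Store.new
  store.backend = StreamStore.new   # default stream is $stdout

  store.store("one block")          # written straight through to the stream
  store.bulk_store(%w[a b c])       # three lines on the stream
  store.flush                       # no-op: StreamStore doesn't define #flush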

All this is nice, but it doesn't address the buffering, and that's where I've come to appreciate one of Ruby's really nice features: Modules. You can write a module - much like a class - and then augment an existing instance with its behavior without touching the class's inheritance at all.

Methods get added, and ivars as well - it's just like categories in ObjC, but maybe even a step better: in Ruby you can add the behavior to a single instance, whereas in ObjC a category applies to the class everywhere in the runtime.
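To make that concrete, here's a minimal sketch of how a buffering module could be bolted onto a single store instance. The Buffering module and its method names are hypothetical - just an illustration, not the actual project code:

  module Buffering
    # blocks waiting to be written, created lazily on this one instance
    def buffer
      @buffer ||= []
    end

    # hold the block instead of writing it straight through
    def store(block)
      buffer << block
    end

    # push everything buffered through the original write path
    def flush
      buffer.each { |b| write_one(b) }
      buffer.clear
    end
  end

  plain    = StreamStore.new
  buffered = StreamStore.new
  buffered.extend(Buffering)    # only THIS instance gets the new behavior

  plain.store("a")              # written to $stdout immediately
  buffered.store("b")           # sits in @buffer
  buffered.flush                # now "b" hits the stream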

I'm really impressed. This is powerful, but at the same time it means one instance of an object can have entirely different methods and ivars than another. That might seem really cool, but from a production standpoint it's also dangerous - this kind of power is easily abused.

Struggling With Incomplete or Bad Unit Tests

Friday, July 20th, 2012


This afternoon has been tough, as Jeff and I have been doing some pair work on one component of the system we're building at The Shop. It's a small Ruby app, and the Ruby Way is to have lots and lots of tests, but the problem is that complete test coverage is impossible, and bad tests are a nightmare. For example, we're trying to make the code functional in nature - immutable objects, queues and threads to balance the load, that kind of stuff. All good goals, to be sure, and worthy of the effort. But as I've always said, there's no such thing as a complete test suite, and the ones I've seen are just complex enough to make the addition of a single, simple feature so daunting that it almost defies inclusion.

There was a Java Swing app, for instance, where every time I added a new attribute to the system I had to spend more than 5x the time of adding the feature just updating the tests. That's not bad if it takes only 10 seconds to add a feature, but when it's an hour to add a feature and 5 hours of work to update the tests, it gets out of hand. And since there's no system that analyzes the code and generates the tests, the tests are just as fallible as the code itself.

After all, who's writing the tests?

Do we need tests on the tests? Meta-Tests?

It can get out of hand very quickly. A simple set of sanity checks is one thing, but when the suite includes end-to-end tests and complex integration tests, it balloons fast.

Such is the case, I fear, for the project I'm on now.

Today we were trying to figure out why the continuous-integration server was failing on the tests. All the tests ran just fine on our dev machines, but on the CI server, they failed. All the time. Why?

Jeff and I started digging into this and found that the CI server was running the rspec tests in a specific order - as opposed to letting rspec run them as it saw fit out of the complete directory. We suspected the order of the tests was the problem, and sure enough it was. This is classic hysteresis - test pollution at work. Something in one of these tests was setting state that wasn't getting cleared properly, and another test was coming along and failing because of it. Reverse the order, and both tests worked just fine.

So what was it?

We spent about an hour at this, getting it down to about 10 lines of code split across two methods. This is the flip side of Ruby… 10 lines is a method, but 10 lines can be an hour-long mystery. Finally, I realized that one test was capturing stdout and the other was using it, and if the non-capturing test went first, its singleton was set up for non-captured stdout, and the capturing test failed. Reverse them, and all was well.
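The shape of the problem looked something like this. The Writer class and the specs below are made up purely to illustrate the pollution - they're not the project's actual code:

  require 'stringio'

  # Hypothetical memoized singleton: whoever touches it first decides
  # which stream it is permanently bound to.
  class Writer
    def self.instance
      @instance ||= new($stdout)
    end

    def initialize(stream)
      @stream = stream
    end

    def write(msg)
      @stream.puts(msg)
    end
  end

  describe 'capturing output' do
    it 'sees what was written' do
      captured = StringIO.new
      $stdout  = captured
      Writer.instance.write('hello')   # binds the singleton to `captured`
      $stdout  = STDOUT
      expect(captured.string).to match(/hello/)
    end
  end

  describe 'plain output' do
    it 'just writes without blowing up' do
      # This spec doesn't care where the output lands, so it passes in
      # either order. But if THIS group runs first, the singleton gets
      # bound to the real STDOUT and the capturing spec above fails.
      # Run the capturing group first and both pass - classic
      # order-dependent test pollution.
      expect { Writer.instance.write('hi') }.not_to raise_error
    end
  end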

Singletons. Not great for functional coding, for exactly this kind of reason. Also, rspec should have some way of resetting the complete environment for each spec (test) file. That would be far preferable, but I can see why it works this way: write good code and you don't have this problem, and it lets you have tests that "build" on one another. Makes sense.

So the tests - not the code itself - were the reason for half a day of work. This is nasty. Very nasty in my book. Maybe I'm still decompressing from the world of finance, but that's a long time to be stuck on something that adds no value to the project. Still… I have to be patient and learn why they do things this way, as there's bound to be some reason. Gotta be.

Learning the Ruby Way – It’s Different

Thursday, July 19th, 2012


I've been working with a co-worker on some code for the project I'm on, and I'm learning a lot about the Ruby way of doing things. It's a lot different from the typical Java/C++ coding I've been used to over the years. In short, things are meant to be simple, but there can be a lot of them. So if you have a class that holds Requests, queries them, and loads them, that's really three classes in Ruby. You want to isolate the roles and responsibilities to one per class.
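To make that concrete, here's a rough sketch of that kind of split. The class names are hypothetical - the point is just one responsibility per class:

  # knows what a request IS
  class Request
    attr_reader :id, :payload

    def initialize(id, payload)
      @id = id
      @payload = payload
    end
  end

  # knows only how to LOAD requests from some source
  class RequestLoader
    def initialize(source)
      @source = source
    end

    def load_all
      @source.map { |row| Request.new(row[:id], row[:payload]) }
    end
  end

  # knows only how to QUERY a set of requests
  class RequestQuery
    def initialize(requests)
      @requests = requests
    end

    def find(id)
      @requests.find { |r| r.id == id }
    end
  end

Each class stays tiny, and each one is trivial to test on its own.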

This is very different from C++, where the overhead of a class is 50 lines, and then there's the implementation file as well. Written to guidelines like methods under 10 lines and classes under a page, it would end up being more boilerplate than functional code. It's just an entirely different world.

And I'm enjoying learning this process and style very much. I can see the goals, and the reasons for it. It's pretty simple, really. But it's hard to do in a lot of other languages. Hard because of the overhead or boilerplate needed for those languages. Ruby is nice in that regard, and I'm excited about learning more.

It's a process, and I'm happy to be on the path.

Switched to Bash as My Login Shell

Wednesday, July 18th, 2012


I've been using tcsh for years - back to my first MacBook, when it was the default for new accounts on Mac OS X. It's been a long time. But there are just too many things that expect bash as the shell, and I'm tired of having to fix things, re-do things, or roll it all myself, so I switched.

This makes a lot of sense as it's the same shell I'll be using on Ubuntu, and on every other platform I'm likely to use. It was a pain to get going, since I needed to convert my login scripts to bash, and that took a little fiddling, but it was worth it. I'm converted, and things are running smoothly again.

Great. Now I can get to running Ruby from RVM. Very important for the work I'm doing.

Moving from JRuby Installer to RVM

Tuesday, July 17th, 2012


Turns out The Shop is a big fan of RVM and its ilk, where you install something small and then have that install all the things you need. It's a nice setup, and I can see why they do it - so I'm going to remove the JRuby Framework and try using RVM.

The setup is simple:

  $ curl -L https://get.rvm.io | bash -s stable --ruby

and once that's all done, we can just install jruby with:

  $ source ~/.rvm/scripts/rvm
  $ rvm install jruby-1.6.7

to get jruby 1.6.7 (the preferred version at this time) for the box.

It all installs in ~/.rvm, which is really convenient, and you can put simple .rvmrc files in project directories to force the use of a certain version, etc. Also, if that version isn't installed, rvm will go get it for you to make sure you can run what you need.

Not bad. I can go with this.

Added Multiple Expression Support to LKit

Saturday, July 14th, 2012


I wanted to add support for user-defined functions, but the first step toward that was to allow multiple expressions - evaluating them all, in order, but returning only the last value. This appeared to be pretty easy, and the basics weren't too bad, but I had a nasty little logic problem that was leading to another double-free in the data structures I was using.

What I essentially did was take the single value for an expression and make it an STL vector of pointers. So this:

  value         *_expr;

as an ivar on parser, became:

  expr_list_t   _expr;

and then I needed to reformulate things a bit: I already had a list of sub-expressions and a mutex for it, but it made more sense to have both lists controlled by the one mutex:

  /**
   * These are all the expressions and sub-expressions that this
   * parser knows about - in two simple lists. These are created
   * in the parsing of the source, but really don't hold much as
   * all the values, constants, and functions are owned by the
   * parser.
   */
  expr_list_t                       _expr;
  expr_list_t                       _subs;
  // ...and a simple spinlock to control access to it
  mutable boost::detail::spinlock   _expr_mutex;

Then I had to update all the uses of the old _expr in the code. Not horrible, but there were a few things to keep straight.

I also needed methods to add to the expression list, remove from it, and clear it out - though it made more sense for that last one to clear out both of the lists:

  /**
   * This method will remove ALL the known expressions from the
   * list for this parser.
   */
  void parser::clearExpr( bool includeSubExpr )
  {
    spinlock::scoped_lock       lock(_expr_mutex);
    /**
     * First, delete all the subexpressions in the list. We need
     * to get rid of them all in order not to leak.
     */
    if (includeSubExpr) {
      for (expr_list_t::iterator it = _subs.begin(); it != _subs.end(); ++it) {
        if (*it != NULL) {
          delete (*it);
        }
      }
      // now we can clear out the list as everything is deleted
      _subs.clear();
    }
 
    /**
     * Next, delete the top-level expressions as they too, need to
     * go as we don't want to leak.
     */
    for (expr_list_t::iterator it = _expr.begin(); it != _expr.end(); ++it) {
      if (*it != NULL) {
        delete (*it);
      }
    }
    // now we can clear out the list as everything is deleted
    _expr.clear();
  }

where the method's declaration has the default value set to:

    virtual void clearExpr( bool includeSubExpr = true );

so that by default, we'd clear out everything, but if called by a subclass, we could clear out just the high-level expressions and leave those sub-expressions that might be used in other places as-is for the duration of the parser.

And all seemed to be going pretty well after I made the change to the compile() method to really loop through all the possible expressions in the source. Specifically, we went from:

  bool parser::compile()
  {
    bool      error = false;
    // make sure they don't change the source while we work...
    spinlock::scoped_lock       lock(_src_mutex);
    // see if we need to compile anything at all
    if (_expr == NULL) {
      uint32_t     pos = _src.find('(');
      if (pos < _src.length()) {
        if ((_expr = parseExpr(_src, pos)) == NULL) {
          error = true;
        }
      } else {
        // no start of an expression - bad news
        error = true;
      }
    }
    return !error;
  }

to:

  bool parser::compile()
  {
    bool      error = false;
    // make sure they don't change the source while we work...
    spinlock::scoped_lock       lock(_src_mutex);
    // see if we need to compile anything at all
    if (!isCompiled()) {
      value       *e = NULL;
      char        c = '\0';
      uint32_t    len = _src.length();
      for (uint32_t pos = 0; pos < len; ++pos) {
        // skip all white space - it's unimportant
        if (isspace((c = _src[pos]))) {
          continue;
        }
        // see if we have another expression to process
        if (c == '(') {
          if ((e = parseExpr(_src, pos)) == NULL) {
            // can't parse it, drop everything and bail
            error = true;
            clearExpr();
            break;
          } else {
            /**
             * Looks good, so add it to the list - but ONLY
             * if it's not a variable definition. Then it's
             * already accounted for in the variable list.
             */
            if (e->isExpression()) {
              addExpr((expression *)e);
            }
            // correct for the fact we're on the right spot
            --pos;
          }
        }
      }
    }
    return !error;
  }

One of the real tricks I had to put into this code was the test that the result of parseExpr() is actually an expression. After all, if it's a variable, it's already going to be in the list of variables, and deleting it from the variables list and then from the expression list will get you a double-free every time.

With this slight change, and a few other things to make it nicer to use, I was able to handle multiple expressions in the source - a series of variable definitions (or, looking to the future, function definitions) followed by a single evaluation step that returns the one value we're really looking for.

Fun, and painful at times, these pointer gymnastics are nasty, but they have to be done in order to take advantage of the polymorphism necessary for the design.

Added Variable Definition and Assignment to LKit

Friday, July 13th, 2012


Today I slogged through a bunch of code to get variable definition and assignment into LKit. It was a lot of grief, but in the end the tools I'm using (GCC, GDB) were more than up to the task, and I got things working just the way I wanted.

The idea was that I wanted to be able to define and update variables in the language with the simple syntax:

  (set x 6)
  (set y (+ 1 2 3))

where the first just assigns the value 6 to x, and the second assigns the expression (+ 1 2 3) to y. The second is by far the more interesting: it's a way to bind a complex expression to a variable, have it evaluated once, and then use that stored value over and over in later calculations. At least that's the idea.

In order to get it working I had to do a little messing around with the definition of the variable. Initially, we had a simple subclass relationship between the value and the variable, where the variable just added in the concept of a name. But what I really needed was to be able to have any value in a variable, and in order to do that, I needed to be able to hold onto a pointer to a value, and that was a bit tricky.

What I needed to do was add a pointer to a value inside the variable, and then check in the eval() method whether it's set. If so, eval() is called on the contents of that pointer instead of falling back to the superclass's eval(). And because I now had this pointer, I had to manage its memory, which meant adding a few more constructors to make everything fit together as it should.

In the end, I had a pointer double-free problem, and GDB helped me find it by pointing out where it was occurring; from that, I was able to infer what was going on. Standard debugging, but I'll admit it had me fooled for a while, until I figured out the sequence of events that led to the logic problem.

Created Parser for LKit

Thursday, July 12th, 2012


Today I finished up the basic language parser for LKit. It's nothing fancy, but it does handle all the language components that I wanted to get into the first real cut of the code. It's nicely documented in the README.md on GitHub, and I was very happy to get this level of work done today. It was something I really wanted done before I started my new job, and thankfully I got it all in and documented before the end of the day.

There is still a lot to do. I need to finish the variable assignment syntax parser, but that's going to be fairly straightforward. I also need to build the parser for user-defined functions, and that's going to be a little trickier, as I'll need to create additional classes based on the function class to represent the user-defined functions. It's all there, I think - I just need to sit down and work it through, making sure it's all going to work out.

After that, I really need to think about adding a list as a potential data type in the value object, and then doing the same for a time series. I can already see the need for the list, and the whole point of doing this was for some guys I talked to who needed a language for processing their time series data. So it's got to go in.

After those additions, there will no doubt be a lot of functions to write that work on the lists and the time series. I'm not sure what all of them will be, but given that I know a few, I should build those as examples for others looking at the code.

It's getting closer… still lots to do, but getting much closer to something really useful.