Archive for the ‘Coding’ Category

Added Multiple Expression Support to LKit

Saturday, July 14th, 2012

LKit Language

I wanted to be able to add in the support for the user-defined functions, but the first step in that was to allow for multiple expressions, and evaluate them all, in order, but return only the last value. This appeared to be pretty easy, and the basics weren't too bad, but I had a nasty little logic problem that was leading to another double-free in the data structures I was using.

What I essentially did was to take the single value for an expression and make it an STL vector of pointers. So this:

  value         *_expr;

as an ivar on parser, became:

  expr_list_t   _expr;

and then I needed to really reformulate things a bit, as I had a list of sub-expressions and a mutex for that, but it made more sense to have both lists controlled by the one mutex:

  /**
   * These are all the expressions and sub-expressions that this
   * parser knows about - in two simple lists. These are created
   * in the parsing of the source, but really don't hold much as
   * all the values, constants, and functions are owned by the
   * parser.
   */
  expr_list_t                       _expr;
  expr_list_t                       _subs;
  // ...and a simple spinlock to control access to it
  mutable boost::detail::spinlock   _expr_mutex;

then I had to update all the uses of the old _expr in the code. Not horrible, but there were a few things to keep straight.

Also, then I needed to make methods to add to the expression list, remove from it, and then clear it out, but that last one made more sense to clear both of the lists out:

  /**
   * This method will remove ALL the known expressions from the
   * list for this parser.
   */
  void parser::clearExpr( bool includeSubExpr )
  {
    spinlock::scoped_lock       lock(_expr_mutex);
    /**
     * First, delete all the subexpressions in the list. We need
     * to get rid of them all in order not to leak.
     */
    if (includeSubExpr) {
      for (expr_list_t::iterator it = _subs.begin(); it != _subs.end(); ++it) {
        if (*it != NULL) {
          delete (*it);
        }
      }
      // now we can clear out the list as everything is deleted
      _subs.clear();
    }
 
    /**
     * Next, delete the top-level expressions as they too, need to
     * go as we don't want to leak.
     */
    for (expr_list_t::iterator it = _expr.begin(); it != _expr.end(); ++it) {
      if (*it != NULL) {
        delete (*it);
      }
    }
    // now we can clear out the list as everything is deleted
    _expr.clear();
  }

where the method's declaration had the default value set to:

    virtual void clearExpr( bool includeSubExpr = true );

so that by default, we'd clear out everything, but if called by a subclass, we could clear out just the high-level expressions and leave those sub-expressions that might be used in other places as-is for the duration of the parser.
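The add and remove methods mentioned above aren't shown here, so here is a minimal sketch of how they might look. Everything in it is an assumption for illustration: mini_parser is a hypothetical cut-down class, expression is an empty stand-in, and std::mutex takes the place of the Boost spinlock so the example compiles on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for LKit's expression class.
struct expression {
  virtual ~expression() {}
};

typedef std::vector<expression *> expr_list_t;

// Hypothetical, cut-down parser that holds only the top-level list.
class mini_parser {
public:
  ~mini_parser() { clearExpr(); }

  // Takes ownership of 'e'. Rejects NULL and duplicates, since a
  // pointer stored twice would be deleted twice in clearExpr().
  bool addExpr(expression *e) {
    if (e == nullptr) return false;
    std::lock_guard<std::mutex> lock(_expr_mutex);
    if (std::find(_expr.begin(), _expr.end(), e) != _expr.end()) return false;
    _expr.push_back(e);
    return true;
  }

  // Deletes 'e' and drops it from the list, if it is in the list.
  bool removeExpr(expression *e) {
    std::lock_guard<std::mutex> lock(_expr_mutex);
    expr_list_t::iterator it = std::find(_expr.begin(), _expr.end(), e);
    if (it == _expr.end()) return false;
    delete *it;
    _expr.erase(it);
    return true;
  }

  // Deletes every expression, then empties the list.
  void clearExpr() {
    std::lock_guard<std::mutex> lock(_expr_mutex);
    for (expr_list_t::iterator it = _expr.begin(); it != _expr.end(); ++it) {
      delete *it;
    }
    _expr.clear();
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(_expr_mutex);
    return _expr.size();
  }

private:
  expr_list_t        _expr;
  mutable std::mutex _expr_mutex;   // assumption: replaces the spinlock
};
```

The duplicate check in addExpr() is a design choice of this sketch: once the list owns its pointers, refusing a second copy is the cheapest way to keep clearExpr() from freeing the same object twice.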

And all seemed to be going pretty well after I made the change to the compile() method to really loop through all the possible expressions in the source. Specifically, we went from:

  bool parser::compile()
  {
    bool      error = false;
    // make sure they don't change the source while we work...
    spinlock::scoped_lock       lock(_src_mutex);
    // see if we need to compile anything at all
    if (_expr == NULL) {
      uint32_t     pos = _src.find('(');
      if (pos < _src.length()) {
        if ((_expr = parseExpr(_src, pos)) == NULL) {
          error = true;
        }
      } else {
        // no start of an expression - bad news
        error = true;
      }
    }
    return !error;
  }

to:

  bool parser::compile()
  {
    bool      error = false;
    // make sure they don't change the source while we work...
    spinlock::scoped_lock       lock(_src_mutex);
    // see if we need to compile anything at all
    if (!isCompiled()) {
      value       *e = NULL;
      char        c = '\0';
      uint32_t    len = _src.length();
      for (uint32_t pos = 0; pos < len; ++pos) {
        // skip all white space - it's unimportant
        if (isspace((c = _src[pos]))) {
          continue;
        }
        // see if we have another expression to process
        if (c == '(') {
          if ((e = parseExpr(_src, pos)) == NULL) {
            // can't parse it, drop everything and bail
            error = true;
            clearExpr();
            break;
          } else {
            /**
             * Looks good, so add it to the list - but ONLY
             * if it's not a variable definition. Then it's
             * already accounted for in the variable list.
             */
            if (e->isExpression()) {
              addExpr((expression *)e);
            }
            // correct for the fact we're on the right spot
            --pos;
          }
        }
      }
    }
    return !error;
  }

One of the real tricks I had to put into this code was the test for the result of parseExpr() to be an expression. After all, if it's a variable, then it's already going to be in the list of variables, and deleting it from the variables list and then the expression list is going to get you a double-free every time.
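To make the ownership rule concrete, here is a small sketch of the idea, not the LKit code itself. The owner struct, its two lists, and the g_deleted counter are all hypothetical names; the only thing taken from the post is the isExpression() test that keeps a variable out of the expression list.

```cpp
#include <vector>

// Counts destructor calls so the sketch can show each object dies once.
static int g_deleted = 0;

struct value {
  virtual ~value() { ++g_deleted; }
  virtual bool isExpression() const { return false; }
};

struct variable : value {};

struct expression : value {
  bool isExpression() const override { return true; }
};

// Hypothetical owner with two owning lists, like the parser's
// variable list and expression list.
struct owner {
  std::vector<value *> vars;
  std::vector<value *> exprs;

  // A variable definition is recorded in the variable list.
  void defineVariable(variable *v) { vars.push_back(v); }

  // A parse result goes into the expression list ONLY if it really
  // is an expression; a variable is already owned by 'vars'.
  void record(value *e) {
    if (e != nullptr && e->isExpression()) exprs.push_back(e);
  }

  // Both lists delete what they hold, so a pointer present in both
  // would be freed twice.
  ~owner() {
    for (value *v : vars) delete v;
    for (value *e : exprs) delete e;
  }
};
```

Without the isExpression() guard in record(), the variable in this sketch would sit in both lists and the destructor would delete it twice.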

With this slight change, and a few other things to make it nicer to use, I was able to handle multiple expressions in the code, and therefore have a series of variable definitions (or looking to the future, function definitions) in the code and then have a single evaluation step that would return the one value we were really looking for.
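The "evaluate them all, return the last" step can be sketched in a few lines. This is only an illustration: evalAll() is a hypothetical name, each expression is modeled as a std::function returning a double, and the choice of 0.0 for an empty list is my assumption.

```cpp
#include <functional>
#include <vector>

// Evaluates every expression in order and returns the value of the
// last one. Assumption: an empty list evaluates to 0.0.
double evalAll(const std::vector<std::function<double()> > &exprs) {
  double last = 0.0;
  for (std::vector<std::function<double()> >::const_iterator it = exprs.begin();
       it != exprs.end(); ++it) {
    last = (*it)();   // earlier results are computed, then discarded
  }
  return last;
}
```

Evaluating the earlier expressions still matters even though their results are thrown away, as they may have side effects that the last one depends on.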

Fun, and painful at times, these pointer gymnastics are nasty, but they have to be done in order to take advantage of the polymorphism necessary for the design.

Added Variable Definition and Assignment to LKit

Friday, July 13th, 2012

LKit Language

Today I slogged through a bunch of code to get the variable definition and assignment into LKit. It was a lot of grief, but in the end the tools I'm using (GCC, GDB) are more than up to the task and I got things working just the way I wanted.

The idea was that I wanted to be able to define and update variables in the language with the simple syntax:

	(set x 6)
	(set y (+ 1 2 3))

where the first just assigns the value of 6 to x, and the second assigns the expression (+ 1 2 3) to y. The second is by far the more interesting in that it's a way to define a complex expression to a variable and it'll be evaluated once, and after that, it'll use that value over and over in the calculations. At least that's the idea.

In order to get it working I had to do a little messing around with the definition of the variable. Initially, we had a simple subclass relationship between the value and the variable, where the variable just added in the concept of a name. But what I really needed was to be able to have any value in a variable, and in order to do that, I needed to be able to hold onto a pointer to a value, and that was a bit tricky.

What I needed to do was to add in a pointer to a value in the variable and then look to see if it's set in the eval() method. If so, then call eval() on that pointer's contents as opposed to the super class's eval() method. Then, because I had this pointer, I had to manage the memory on it, and that meant that I needed to make a few more constructors to make everything fit in as it should.
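A rough sketch of that arrangement, under my own simplifying assumptions: the value here holds only a double, the class and member names are illustrative rather than LKit's, and copying is simply disabled instead of writing the extra constructors.

```cpp
#include <string>

// Simplified stand-in for a value: holds a single double.
class value {
public:
  explicit value(double v = 0.0) : _val(v) {}
  virtual ~value() {}
  virtual double eval() { return _val; }
private:
  double _val;
};

// A variable is a named value that may instead hold a pointer to
// another value (for example an expression) and evaluate through it.
class variable : public value {
public:
  // Plain value, as in (set x 6).
  variable(const std::string &name, double v)
    : value(v), _name(name), _expr(nullptr) {}

  // Held value, as in (set y (+ 1 2 3)). Takes ownership of 'expr'.
  variable(const std::string &name, value *expr)
    : value(), _name(name), _expr(expr) {}

  ~variable() override { delete _expr; }

  // Assumption of this sketch: copying is disabled, so two variables
  // can never end up deleting the same pointer.
  variable(const variable &) = delete;
  variable &operator=(const variable &) = delete;

  // If a held value is set, evaluate it; otherwise use the super
  // class's eval().
  double eval() override {
    return (_expr != nullptr) ? _expr->eval() : value::eval();
  }

  const std::string &name() const { return _name; }

private:
  std::string _name;
  value      *_expr;
};
```

Something like `variable y("y", new value(3.0));` then evaluates through the held pointer, and the pointer is freed exactly once when y goes away.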

In the end, I had a pointer double-free problem, and GDB helped me find it by pointing out where it was occurring, and from that, I was able to infer what was going on. Standard debugging, but I will certainly admit that it had me fooled for a while, until I was able to figure out what the sequence of events was that led to the logic problem.

Created Parser for LKit

Thursday, July 12th, 2012

LKit Language

Today I finished up the basic language parser for LKit. It's nothing fancy, but it does handle all the language components that I wanted to get into the first real cut of the code. It's nicely documented in the README.md on GitHub, and I was very happy to get this level of work done today. It was something I really wanted done before I started my new job, and thankfully I got it all in and documented before the end of the day.

There is still a lot to do. I need to finish the variable assignment syntax parser, but that's going to be fairly straightforward. I also need to build the parser for the user-defined functions, and that's going to be a little trickier as I will need to make some additional classes based on the function class that are the user defined functions. It's all there, I think, I just need to sit down and work it through - making sure it's all going to work out.

After that, I really need to think about adding a list as a potential data type in the value object. After that, I'm going to need to do the same for a time series. I know this because I can see the need for the list, and the point of doing all this was for some guys I talked to that needed a language for processing their time series data. So it's got to go in.

After those additions, there will no doubt be a lot of functions to work on the lists and the time series. Not sure about all of them, but given that I know a few, I should build them as examples to others looking at the code.

It's getting closer… still lots to do, but getting much closer to something really useful.

Installing JRuby on My MacBook Pro

Wednesday, July 11th, 2012

JRuby

After talking to my soon-to-be manager on Monday, I learned that the initial code they are building is in JRuby, as there's a ton of Ruby code in The Shop, and yet JRuby allows you to drop down into the JVM should performance be an issue. I know next to nothing about Ruby or JRuby, so it made sense to install JRuby and see what's up. Turns out, it was pretty simple, though it would have been simpler if they had the installer package working a little better.

The first thing was to get the disk image from the JRuby site. There are several ways to download it, and I picked the dmg as it had installers and uninstallers - should that prove to be necessary. In any case, it installed fine, but I could not find the jruby executable.

I looked in the installer's package contents, and it turns out that the entire thing is a simple Framework, and everything can point to a simple path and we're good to go. This process took me a few minutes, and could have been cleared up with a simple README in the disk image, but hey, what's a few minutes between friends?

In order to get the path correct, I took advantage of the /usr/libexec/path_helper script and placed into /etc/paths.d/ the file jruby that looks like:

  /Library/Frameworks/JRuby.framework/Versions/Current/bin

and on the next shell start, we'll pick this up in the path. Now if the JRuby guys are smart, they will have been careful to make sure that when they call commands like irb, they are doing it within the Framework directory. And here's why…

The install script added a few lines to my .tcshrc, but they were written in Bash syntax. Easy fix, but they put the JRuby path at the end of the PATH. This means that when I use the /usr/libexec/path_helper to add in the JRuby path, I get:

  peabody{drbob}7: echo $PATH
  /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/git/bin:
  /Library/Frameworks/JRuby.framework/Versions/Current/bin:.:/Users/drbob/bin:
  /usr/local/qt/bin:/usr/local/xerces/bin:/usr/local/groovy/bin

and we can see that the path for JRuby is after /usr/bin where the Mac OS X install of irb is located. I'm going to leave it like this for a while in the hopes that they did it right (I would have), and I won't have to force the path addition to the front of the PATH. We'll see.

At this time, we can now check that JRuby is successfully installed:

  $ jruby -v
  jruby 1.7.0.preview1 (ruby-1.9.3-p203) (2012-05-19 00c8c98)
  (Java HotSpot(TM) 64-Bit Server VM 1.6.0_33) [darwin-x86_64-java]

Good enough. Now I need to start reading up on JRuby and see what I can see. It's going to be exciting!

Google Chrome dev 22.0.1202.1 is Out

Tuesday, July 10th, 2012

Google Chrome

The Google Chrome Team has once again bumped the major version number on Chrome, and we're now on 22.0.1201.0 with an incredibly short set of release notes. No matter, the V8 JavaScript engine is now 3.12.9.0, and I'm sure there is a batch of security and stability fixes for this guy as well. They have done one thing on the Mac client - the minimum OS X release is now 10.6 (Snow Leopard) and that's nice in that it means they can expect a certain minimum from the OS that makes it easier/nicer to build on.

Keep up the nice work, guys.

Making Decent Progress on LKit

Friday, July 6th, 2012

LKit Language

This afternoon I've been doing quite a bit of coding and documenting on LKit. Specifically, I've put in quite a few examples of the language constructs to the README.md so that I had something to work against for the parser::compile() method. I wanted to have a few things straight before I started coding things up.

Secondly, I noticed that I only had max, min, and sum defined as functions, but what I really needed was to include all the arithmetic operators: +, -, *, and / as well as the comparison operations: ==, !=, >, <, >=, and <=. I built the comparison operators as a single class with a constructor argument (enum) for the type of operation to be performed, and then just added them to the default set with different names and constructor args. Nice and clean.

I did the same for the logical operators and, or, and not. Very clean, and it really keeps the lines of code down.
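The single-class-plus-enum approach for the comparison operators might look something like the sketch below. The names compare_op, comparison, and defaultComparisons() are hypothetical, and the operands are plain doubles here rather than LKit values.

```cpp
#include <map>
#include <string>

// The kind of comparison one instance performs.
enum compare_op { EQ, NE, GT, LT, GE, LE };

// One class for all six comparisons; the constructor argument picks
// which one this instance is.
class comparison {
public:
  explicit comparison(compare_op op) : _op(op) {}

  bool operator()(double a, double b) const {
    switch (_op) {
      case EQ: return a == b;
      case NE: return a != b;
      case GT: return a >  b;
      case LT: return a <  b;
      case GE: return a >= b;
      case LE: return a <= b;
    }
    return false;   // not reached for a valid compare_op
  }

private:
  compare_op _op;
};

// Registers the same class six times under different names, each
// with a different constructor argument.
std::map<std::string, comparison> defaultComparisons() {
  std::map<std::string, comparison> fns;
  fns.emplace("==", comparison(EQ));
  fns.emplace("!=", comparison(NE));
  fns.emplace(">",  comparison(GT));
  fns.emplace("<",  comparison(LT));
  fns.emplace(">=", comparison(GE));
  fns.emplace("<=", comparison(LE));
  return fns;
}
```

A lookup such as `defaultComparisons().at(">=")(3.0, 3.0)` then yields true, and adding a seventh operator is one enum entry, one case, and one registration line.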

With this, I'm about to the point that I need to really get the tokenizing and parsing together. I still need to be able to have user definable functions, but that's a big next step, and I'll tackle that after I get the basic parser going with what I have.

Lots to do, but it's getting more exciting as there are lots of things to work on and get going. Closer and closer… I'll get there.

The Real Utility of Web-Based Language Tests

Thursday, July 5th, 2012

It's common to hear that the company you are interviewing with wants you to take a test on the language you say you're proficient in - just to make sure that you are what you say you are. It sounds very reasonable: You say you've been doing C++ coding, then taking a little test for an hour should be no big deal, right? Java, Ruby, anything can be tested, right? And this will let the employer know if you're really good at this. Right?

Not so much.

I've been taking and writing tests for many years, and I've no problem with tests of any kind - so long as the results of the test are balanced against other factors. For example, are you going to hire an architect for your house based on his ability to take a 60 min test on the loading of certain structures? Or the standard lengths of commercially available lumber? No. You want to see what he's done. That's what's impressive (or not).

You want to look at what a person has done, and can do, in order to know if you want to hire them. It's possible that a one-hour test is a good measure for that, but I'm guessing not. I've taken too many of these tests in the past couple of years to think that any web-based test can be a good, complete, and accurate measure of a person's ability to write real-world systems.

How are you going to test their ability to design solid thread-safe code? You can ask them questions about it, but that's not the same. You can ask about immutability, but that's not the same as knowing when to apply it, and when not to.

The problem is that these tests are billing themselves as an authoritative measure of the quality of an individual, and they can't possibly be. They are far too narrow. How can you possibly measure a person in an hour? No way.

So I had to take two of these today, and it just wiped me out. I got "decent", but not "great" scores - as I'd have expected, and that's about as good as I can expect. But I guess I need to look at these tests in a slightly different light -- if the employer is taking these at face value, then maybe that's telling me something important about their hiring policies, and maybe that's the critical take-away for me.

In all my years, I'd never inflict this on someone. It's just not a useful metric for hiring a person. Going into a place that requires them means that I'll be forced to ask for them. Not necessarily a good thing, in my book.

UPDATE: I got a mid-40's percentile on the C++ test. From this, you'd think there are a ton of developers that are better than I am. As such, the employer that was asking me to take the test said I wasn't qualified. It hurts, I won't kid you, but in the end, I honestly believe it's saying more about them than me, and that's really what's important to me.

Finally Got LKit up to GitHub

Tuesday, July 3rd, 2012

LKit Language

I finally got enough of LKit done to push it all up to a new GitHub (public) repo, and it's nice to get past that first hurdle. I have the value, the function, and the expression all done, and even have a few basic functions defined - sum, max and min. While I know there's a lot more to do, I wanted to get a little something up there so that I can start to show it to folks, and get some important feedback.

I went onto Google, and found a neat little tux/ninja-turtle icon I'll use for LKit for free - can't beat that. And I'm off to the races.

What I've really got up to this point is the basics of the language - in structure, but no parser at this point. I need to start work on more functions as well as the parser, and then start to look at optimizations on the updating of values. For example, if none of the arguments to an expression have changed, then there's no need to recalculate the expression - its value will remain unchanged.
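One way that optimization could work is to stamp each argument with a version number and recompute only when a stamp has moved. This is a sketch of the idea under my own assumptions, not what LKit does: the input and cached_sum classes, and the version counter, are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

// An argument whose version number bumps every time its value changes.
class input {
public:
  explicit input(double v) : _val(v), _version(0) {}
  void set(double v) {
    if (v != _val) { _val = v; ++_version; }
  }
  double get() const { return _val; }
  unsigned long version() const { return _version; }
private:
  double        _val;
  unsigned long _version;
};

// A sum that remembers which versions it last saw and skips the
// recalculation when nothing has changed. It does not own its inputs.
class cached_sum {
public:
  explicit cached_sum(const std::vector<const input *> &args)
    : _args(args), _seen(args.size(), 0), _valid(false),
      _cache(0.0), _evals(0) {}

  double eval() {
    bool dirty = !_valid;
    for (std::size_t i = 0; i < _args.size(); ++i) {
      if (_args[i]->version() != _seen[i]) dirty = true;
    }
    if (dirty) {
      _cache = 0.0;
      for (std::size_t i = 0; i < _args.size(); ++i) {
        _cache += _args[i]->get();
        _seen[i] = _args[i]->version();
      }
      _valid = true;
      ++_evals;   // counts real recalculations, for inspection
    }
    return _cache;
  }

  int evals() const { return _evals; }

private:
  std::vector<const input *> _args;
  std::vector<unsigned long> _seen;
  bool                       _valid;
  double                     _cache;
  int                        _evals;
};
```

Calling eval() twice in a row does the addition once; changing any input with set() makes the next eval() recompute.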

There's a lot more to do, but it's nice to get the first cut up there so I can get a sense that it's real.

So Much of C++ Coding is Making a New Language

Saturday, June 30th, 2012

Building Great Code

I'm working on my latest new project - a simple lisp-like language in C++ that is intended to be fast to execute and very memory efficient. In the past, I've built something like this, but it was more of an interpreter, and this time I'm heading for a more JIT-style system where I'm going to be creating in-memory data structures and then running them over and over as the values come in. This matches up more with what the use case is: data stream processing.

What I'm realizing is that as I build the code in LKit, I'm really building a new language based on C++ and the objects and operators that I define in C++ for this problem domain. Think about it: C++ has only the basic data types of C, but it ships with the ability to deal with complex data - yeah, a + bi complex data.

You can create these as if they were simple data values, add, subtract, multiply them… even stream them out. For all intents and purposes, the shipping compiler really supports complex data types. But it's really all just built on top of what's there, and that's the real power of C++ to me - the ability to really augment the language to be able to add in data types and operations that make perfect sense in the domain, but aren't in the initial cut of the compiler.
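A tiny illustration of that point, using std::complex from the standard library. The function name complexDemo() is just something I made up to wrap the example.

```cpp
#include <complex>
#include <sstream>
#include <string>

// Adds and multiplies two complex numbers and streams the results,
// exactly as one would with built-in numeric types.
std::string complexDemo() {
  std::complex<double> a(1.0, 2.0);    // 1 + 2i
  std::complex<double> b(3.0, -1.0);   // 3 - i

  std::complex<double> sum     = a + b;   // 4 + i
  std::complex<double> product = a * b;   // (1 + 2i)(3 - i) = 5 + 5i

  std::ostringstream out;
  out << sum << " " << product;   // streams as "(real,imag)"
  return out.str();               // "(4,1) (5,5)"
}
```

None of this is built into the compiler proper; it is all a library type with overloaded operators, which is the point.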

I've created a value. This value can be a bool, an int, a double, or a time stamp. I can add these, compare them to one another, and to the data types they represent. They look, act, and process exactly like they were part of the language. That's really quite incredible. But it comes at a cost: you have to code it all up.

You have to code up the operations. You have to code up every little thing about how to handle these new data types, and if you don't, then it's not there. It's a lot of code to throw down, and I can see why a lot of people shy away from it, but it's got a lot of power, and there's reason to be careful with what you are doing.

But in defining this language, I really have a tremendous power. I get to define how all this works, and what it all does. I can make my resulting C++ code look incredibly simple and clean by making sure all the operators and functions are there to support what one would naturally want to do with these data types.
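To give a feel for what "coding it all up" means, here is a deliberately small sketch of such a type. It is not LKit's value class: it covers only int and double, and the member names and the promote-to-double rule are my assumptions.

```cpp
// A cut-down value that can hold an int or a double.
class value {
public:
  enum kind_t { INT, DOUBLE };

  // Non-explicit on purpose, so plain ints and doubles mix freely
  // with values in expressions.
  value(int i)    : _kind(INT),    _i(i), _d(0.0) {}
  value(double d) : _kind(DOUBLE), _i(0), _d(d)   {}

  kind_t kind() const { return _kind; }

  double asDouble() const {
    return (_kind == INT) ? static_cast<double>(_i) : _d;
  }

  // int + int stays an int; anything else is promoted to double.
  friend value operator+(const value &a, const value &b) {
    if (a._kind == INT && b._kind == INT) return value(a._i + b._i);
    return value(a.asDouble() + b.asDouble());
  }

  // Compares by numeric value, so 3 and 3.0 are equal.
  friend bool operator==(const value &a, const value &b) {
    return a.asDouble() == b.asDouble();
  }

private:
  kind_t _kind;
  int    _i;
  double _d;
};
```

With just these two operators, code like `value c = a + 3;` or `if (c == 3.5)` reads as if value were built in, and every further operator (subtraction, ordering, streaming) is another few lines of the same kind.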

This makes the language almost domain-specific. And that's one of the things that makes coding in C++ so amazing to me. Great tools.

Starting Work on New Project: LKit – Simple Lisp-Like Language

Friday, June 29th, 2012

Today I've spent a good bit of time working on a new project - a lisp-like language in C++ for processing scripts in as efficient a manner as possible. The idea was really from some work I've done in the past as well as talking to some nice guys at a start-up that are looking to do the same kind of thing. What I was thinking was that I'd put something together, and point them to it so they could get an idea of how I'd do it. Then, they are free to use it or not, and I've done my part to help them out.

I interviewed with them, but I don't think I'm going to be getting an offer, or taking the offer, as it's likely to be lower than what I have from another place, and it's a small shop, and therefore, a significantly greater risk. But I want to help them, and this is what they were hoping I'd be doing if I joined the group.

The first steps were pulling up the ideas I had from the original code, but there were a lot of things about that code that I didn't like - specifically in the area of performance. I'd like this version to be as fast as possible as it's going to be working on very large data sets - typically time series data, and so the type of data is somewhat restricted, but the amount is enormous.

I got the first two classes built: value and then variable (a value with a name). Now I need to step back a bit and work on the rest of the component design: expressions, functions, etc. The question will be: Should I break out the evaluation methods from value into a different base class? I'm not sure if I need to do this, or if it's really going to benefit the design in any way.

Certainly, it'd be nice to have all the "evaluate-able" objects based off one super class, but I'm not sure that I need that level of flexibility. After all, a function takes a series of values as arguments, and produces a value. An expression is really a function and/or values in some relation to one another. It's not like I need to have this all that different from what it is now.

For instance, I can base the function on the value and have the result value put into the ivar for the value as a cached value to speed up the resulting calculations. That would be nice. Then expressions are something else… maybe they are just relations of values. Not sure.

In any case, I've got a start, and it's going to take a lot more work to get something ready to put on GitHub and show to the guys at the start-up. But I'll get there.