Archive for July, 2013

Letting Go – Regardless of Consequence

Wednesday, July 31st, 2013


I like what I do - I really do. I like the company I work for - there are a lot of nice folks here, and I generally like the decisions that management makes. But as into every life a little rain must fall, there are times when your time in a group is done, and it's best to move on. The ideas that shaped the group and got it to this point were necessary and good, but now it's time to let someone else take over from here.

Of course, that's not how it feels.

It feels like the folks new to the business think they have a monopoly on the project even though they just joined the company. It feels like they have no respect for the ideas the project was built on, so their changes to the codebase make no sense - in fact, they run counter to the project's original goals.

It feels like they are being jerks.

And who knows… maybe they are. Maybe they aren't. It's not only impossible to tell, it's also completely unimportant. You find yourself in the minority and it's time to move on. No anger, no grief… maybe a bit of sadness for what's been lost, but loss is part of life. The project can't become what the new blood wants it to be - what they see in their minds - if you're there holding them back.

It's also not really fair to just sit in the group and let the changes occur around you. That's just goldbricking. Yeah, you know the code; yeah, you like the project; but it's all going in a different direction and it's time to just cut the cord. Let the project be what it will be under their stewardship.

It's time for me to move out of this group. As much as I'd like to keep working on what I'm doing, it's not good for the group or me.

Installing Hadoop on OS X

Wednesday, July 17th, 2013


This morning I finally got Hadoop installed on my work laptop, and I wanted to write it all down so that I could repeat this when necessary. As I found out, it's not at all like installing CouchDB, which is about as simple as anything could be. No… Hadoop is a far more difficult beast, and I guess I can understand why. Still, it'd be nice to have a simple Homebrew install that set it up in single-node mode and started everything with Launch Control, but that's a wish, not a necessity.

So let's get into it. First, make sure that you have the SSH daemon running on your box. This is controlled in System Preferences -> Sharing -> Remote Login - make sure it's checked, save it, and it should be running just fine. Then make sure you can ssh into your box - if necessary, generate SSH keys and add your public key to ~/.ssh/authorized_keys so you can log in without a password.
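If you haven't set up passwordless ssh to localhost before, something like this should do it (a minimal sketch, assuming RSA keys and an empty passphrase):

  $ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ ssh localhost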

Next, you'll certainly need Homebrew, and once that's all going, install the basic Hadoop package:

$ brew install hadoop

At this point, you will need to edit a few of the config files and make a few directories. Let's start with the directories. These will be the locations for the actual Hadoop data, the Map/Reduce data, and the NameNode data. I chose to place these next to the Homebrew install of Hadoop so that it's all in one place:

  $ cd /usr/local/Cellar/hadoop
  $ mkdir data
  $ cd data
  $ mkdir dfs
  $ mkdir mapred
  $ mkdir nn

At this point we can go to the directory with the configuration files and update them:

  $ cd /usr/local/Cellar/hadoop/1.1.2/libexec/conf

The first update works around a known Kerberos bug in Hadoop on OS X (the "Unable to load realm info from SCDynamicStore" error). Do this by editing hadoop-env.sh to include:

  export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="

Next, edit the hdfs-site.xml file to include the following:

  <configuration>
    <property>
      <name>dfs.data.dir</name>
      <value>/usr/local/Cellar/hadoop/data/dfs</value>
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>/usr/local/Cellar/hadoop/data/nn</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>
  </configuration>

Next, edit the core-site.xml file to include the following:

  <configuration>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/tmp/hdfs-${user.name}</value>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
      <description>The name of the default file system.  A URI whose
      scheme and authority determine the FileSystem implementation.  The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class.  The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
    </property>
  </configuration>

Finally, edit the mapred-site.xml file to include the following:

  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
      <description>The host and port that the MapReduce job tracker runs
      at.  If "local", then jobs are run in-process as a single map
      and reduce task.</description>
    </property>
    <property>
      <name>mapred.local.dir</name>
      <value>/usr/local/Cellar/hadoop/data/mapred/</value>
    </property>
  </configuration>

We are finally all configured. At this point, you need to format the NameNode:

  $ hadoop namenode -format

and then you can start all the necessary processes on the box:

  $ start-all.sh
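
To check that everything actually came up, jps (which ships with the JDK) should list the five daemons that start-all.sh launches in single-node mode - NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker:

  $ jps

And when you're done, stop-all.sh shuts them all back down.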

At this point, you will be able to hit the endpoints - the NameNode status page at http://localhost:50070/ and the JobTracker at http://localhost:50030/ - and using the WebHDFS REST endpoint, you can use any standard REST client to submit files, delete files, make directories, and generally manipulate the filesystem as needed.
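
As a quick sketch of those WebHDFS calls - assuming the stock NameNode HTTP port of 50070, with /tmp/demo and a.txt being made-up names - a few curl commands look like this:

  # make a directory
  $ curl -i -X PUT "http://localhost:50070/webhdfs/v1/tmp/demo?op=MKDIRS&user.name=$USER"

  # list its contents
  $ curl -i "http://localhost:50070/webhdfs/v1/tmp/demo?op=LISTSTATUS"

  # creating a file is a two-step dance: the NameNode answers with a 307
  # redirect, and you PUT the actual bytes to the Location: URL it returns
  $ curl -i -X PUT "http://localhost:50070/webhdfs/v1/tmp/demo/a.txt?op=CREATE&user.name=$USER"
  $ curl -i -X PUT -T a.txt "<the Location: URL from the previous response>"

  # delete the file
  $ curl -i -X DELETE "http://localhost:50070/webhdfs/v1/tmp/demo/a.txt?op=DELETE&user.name=$USER"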

This was interesting, and digging around for what was needed was non-trivial, but it was well worth it. I'll now be able to run my code against the PostgreSQL and Hadoop installs on my box.

Sweet!