Archive for the ‘Coding’ Category

Creating an x86_64 RPM of Boost 1.43.0 for CentOS 5

Thursday, July 1st, 2010

Boost C++ Libraries

This morning I decided that it's better to make an RPM of Boost 1.43.0 and install it than to try to package this stuff on my own. So I started out on what looks to be a long and painful journey. In the end, I'll have this as a record of what to do correctly, and in what order - with all the errors and re-dos thankfully edited out, leaving something that really explains what I've done.

At least I hope it's going to turn out that way.

Laying the Foundation of RPM Building

First, let's figure out what we need to build an RPM. Pretty simple: a place to do it. So I created a build directory and set up everything for a build:

  $ cd ~/vendor
  $ mkdir rpm
  $ cd rpm
  $ mkdir BUILD RPMS SOURCES SPECS SRPMS
  $ cd RPMS
  $ mkdir athlon i386 i486 i586 i686 x86_64 noarch

Then in my home directory, I created the file .rpmmacros that contained:

  %_topdir /home/rbeaty/vendor/rpm

where, clearly, the point of the .rpmmacros file is to tell RPM where it's supposed to look for things as it goes about its business (a quick rpm --eval '%{_topdir}' will confirm it took). This is a huge load off my mind, as I can now build in peace and not have to worry about deploying until I'm 100% sure it's ready to go.

At this point, I need the sources.

Getting the Sources in the Right Place

One of the tricks I'm using is to base this build off the boost-1.33.1 SRPM spec that I can get from CentOS. So I need to get the source RPM for the older version from CentOS 5's site:

  $ cd ~
  $ wget http://mirror.anl.gov/pub/centos/5/os/SRPMS/boost-1.33.1-10.el5.src.rpm

and then I can "install it" into the building location with:

  $ rpm -i boost-1.33.1-10.el5.src.rpm

At this point, I don't need boost-1.33.1-10.el5.src.rpm any longer.

If I go into the top of the build tree - the %_topdir we just defined - we can then:

  $ cd ~/vendor/rpm/SOURCES

and we'll see the contents of the source tree. Not bad for a few simple steps! Now we need to get the latest version of Boost and place it here as well. With a simple:

  $ wget http://sourceforge.net/projects/boost/files/boost/1.43.0/
         boost_1_43_0.tar.bz2/download

(all one line) we will get the latest version of the code into the SOURCES directory as boost_1_43_0.tar.bz2. It turns out the bzip2 version of the file is important, as the spec file expects to see it, and we don't want to disappoint (or unnecessarily modify) it.

Update the Spec File for Boost 1.43.0

I decided to go with a "minimal impact" plan for the spec file. I needed to change a few things. First, the version of Boost that I'm building. The spec file is in the SPECS directory in the main rpm directory. The simple differences are:

  %define tarball_name boost_1_33_1

  Name: boost
  Summary: The Boost C++ Libraries
  Version: 1.33.1
  Release: 10%{?dist}

becomes:

  %define tarball_name boost_1_43_0

  Name: boost
  Summary: The Boost C++ Libraries
  Version: 1.43.0
  Release: 1%{?dist}

Since I'd already built Boost by hand, I knew the build didn't need a library I didn't have on this box (libicu), so I could comment out that requirement. If you don't need to do this, even better.

  BuildRequires: libicu-devel
  Obsoletes: boost-doc <= 1.29.2
  Obsoletes: boost-python <= 1.29.2

becomes:

  #BuildRequires: libicu-devel
  Obsoletes: boost-doc <= 1.42.0
  Obsoletes: boost-python <= 1.42.0

In the %package section, I needed to change what this obsoletes:

  %package devel
  Summary: The Boost C++ headers and development libraries
  Group: System Environment/Libraries
  Requires: boost = %{version}-%{release}
  Obsoletes: boost-python-devel <= 1.32.2
  Provides: boost-python-devel = %{version}-%{release}

becomes:

  %package devel
  Summary: The Boost C++ headers and development libraries
  Group: System Environment/Libraries
  Requires: boost = %{version}-%{release}
  Obsoletes: boost-python-devel <= 1.42.0
  Provides: boost-python-devel = %{version}-%{release}

Then, in the %prep section, I commented out the patches - they were written for 1.33.1, and the build was running fine for me without them.

  %prep
  rm -rf $RPM_BUILD_ROOT

  %setup -n %{tarball_name} -q
  %patch0 -p0
  %patch1 -p0
  %patch2 -p0
  %patch3 -p0
  %patch4 -p0
  %patch5 -p0
  %patch6 -p0
  %patch7 -p0
  %patch8 -p0

becomes:

  %prep
  rm -rf $RPM_BUILD_ROOT

  %setup -n %{tarball_name} -q
  #%patch0 -p0
  #%patch1 -p0
  #%patch2 -p0
  #%patch3 -p0
  #%patch4 -p0
  #%patch5 -p0
  #%patch6 -p0
  #%patch7 -p0
  #%patch8 -p0

Finally, the %build section could be drastically simplified due to the new build tools in boost:

  %build
  BOOST_ROOT=`pwd`;
  # build jam
  ./bootstrap.sh

  # build boost with bjam
  ./bjam

and then I completely commented out the %check section, as I have no need of running any tests on Boost - it is what it is.

Building and Verifying

Now that I have the spec file modified, I can build the RPMs:

  $ cd ~/vendor/rpm/SPECS
  $ rpmbuild -bb boost.spec

When this is all said and done, you should have the RPMs in the RPMS directory under the main 'rpm' directory - organized by machine architecture.

To verify the contents of the RPMs, simply go into the directory and have a look at one:

  $ cd ~/vendor/rpm/RPMS/x86_64
  $ rpm -q -l -p boost-devel-1.43.0-1.x86_64.rpm

What you'll see is all the files that are in the RPM and it's just what you need.

Going for the Whole Burrito

If I want to build the i386 RPMs as well, I need to do two things. First, I need to edit the spec file to tell Boost to build 32-bit and to put the libs in the right place:

  %define tarball_name boost_1_43_0
  %define _libdir %{_exec_prefix}/lib

  Name: boost
  Summary: The Boost C++ Libraries
  Version: 1.43.0
  Release: 1%{?dist}

and:

  %build
  BOOST_ROOT=`pwd`;
  # build jam
  ./bootstrap.sh

  # build boost with bjam as 32-bit
  ./bjam address-model=32

and then I need to build the rpm with a slightly different command:

  $ cd ~/vendor/rpm/SPECS
  $ rpmbuild -bb --target i386 boost.spec

Of course, now that I think about it, it'd be just as easy to have two spec files - one for 64-bit and the other for 32-bit and then have two commands that build their RPMs, etc. Not a horrible situation at all, really.

Getting on a Decent Version of Boost

Wednesday, June 30th, 2010

Boost C++ Libraries

In the past, I've gone the "roll your own" route with C++ libraries, and there's a lot to like about that model. First, if it's in the C++ compiler (GCC), then it's free and automatic, and you use it - unless there's a compelling reason not to. Second, if there's an RPM for the system, use that: it's easy to place on every box, and it's nearly as good as being in the compiler, since it's installed in the obvious locations for the compiler to pick up. Third, if you need to compile it from source, then do that, but it means you either have to make a tarball to place in /usr/local, or manually install it on all the boxes, or ship it in the delivery package with your code. Finally, you can write your own.

But there are a lot of reasons to skip to the last step. I can remember doing performance tests on the STL vector - not impressive in speed. So I wrote a vector template class that was a ton faster because it was a simple array of instances. It wasn't as generic as the STL version, but for all the things I needed, it was far better. So there are reasons, sometimes.
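
That class is long gone, but the idea is simple enough to sketch from memory. Something like this - a fixed-capacity array of instances, with none of the allocator machinery. This is an illustration of the approach, not the original code:

  #include <cstddef>

  // A from-memory sketch, not the original class: the instances live
  // inline in a plain array, so there's no allocator, no reallocation,
  // and no iterator indirection to pay for. (T needs a default
  // constructor, and the capacity is fixed at compile time.)
  template <typename T, std::size_t N>
  class FastVector {
  public:
      FastVector() : mSize(0) {}

      bool push_back(const T &item) {
          if (mSize >= N) return false;   // full - the caller's problem
          mData[mSize++] = item;
          return true;
      }

      T &operator[](std::size_t i) { return mData[i]; }
      const T &operator[](std::size_t i) const { return mData[i]; }
      std::size_t size() const { return mSize; }

  private:
      T           mData[N];   // a simple array of instances
      std::size_t mSize;
  };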

But in this new rewrite I'm doing at The Shop, I decided to try to use as much Boost as possible. There are a ton of libraries in Boost, a lot of it is headed into the next C++ standard, and that means it's going to be in GCC - so there are plenty of reasons to use it. I looked, and for CentOS 5, there are RPMs for Boost! What a break. Problem is, they're for version 1.33.1, which is several years old and missing some key features I need.

Further complicating the problem was how to approach the inclusion of Boost, should I choose to upgrade to the latest version for this project. What I wanted was an RPM of the i386 and x86_64 versions of the libraries and the headers. It's certainly possible to make one from the source, but that seemed like a ton of work that just wasn't worth the effort. While it might be nice to build it and make it available to the world as an option for folks in my shoes, having no experience in making an RPM puts me at a distinct disadvantage here.

Putting it in /usr/local means that there's something I need to put onto each machine, and if it's got the RPMs installed already, there's a real possibility that there could be some serious conflicts. Additionally, The Shop doesn't have a nice NFS mount point for all this open source code where it's organized by the project, the version, etc. and therefore available on all machines naturally.

I'm left with the icky realization that the easiest and safest method is to package it with the app. I really hate this, but there's really no other solid alternative.

So how do I get the latest Boost?

It's really a lot simpler than the Boost web site describes:

  cd /usr/local/src
  wget http://sourceforge.net/projects/boost/files/boost/1.43.0/boost_1_43_0.tar.gz
  tar zxvf boost_1_43_0.tar.gz
  cd boost_1_43_0
  ./bootstrap.sh
  ./bjam
  sudo ./bjam install

That's it. The last step installs it into /usr/local by default. I didn't actually run that step here, but had I been doing this on machines I really controlled, I probably would have. I just think putting a library with the project is wasteful. Too many people might want to use it, and that's the point: get a recent version on the box and use it.
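
If you do run that install step, a tiny smoke test will tell you whether the toolchain sees the new version. This is just my own sanity check - nothing official - assuming the default /usr/local locations:

  #include <iostream>
  #include <boost/version.hpp>
  #include <boost/lexical_cast.hpp>

  // Prints the Boost version the compiler actually found - if this says
  // 1_43, the headers in /usr/local/include are the ones being used.
  int main() {
      std::cout << "Boost " << BOOST_LIB_VERSION << std::endl;
      int n = boost::lexical_cast<int>("1430");
      std::cout << "lexical_cast works: " << n << std::endl;
      return 0;
  }

Since lexical_cast is header-only, a plain g++ check.cpp -o check is all it takes - no -L or -l flags needed.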

I'm still toying with the idea of building an RPM, but I need to get this project going and then hassle with the libraries. The only difference would be in the Makefiles and the deployment strategy, and those can wait for now. It's built, it works, and I can write code.

Good enough.

I sure do wish they'd make the RPMs for a reasonably recent version available on a web site, though. That would be really nice.

The Value of Good Code Layout

Monday, June 28th, 2010

I've been trying to get a handle on what's in the current version of The Magic Schoolbus and it's hard. I mean it's a lot harder than it has to be. There are complete non-template implementations in header files, and there are classes grouped into the same files - some logically, some not... it's a mess. Trying to see what's happening - a really important thing in OO design - is next to impossible.

Many people have criticized me for making my code too verbose. Maybe it is. But none of those people were ever trying to understand it. They were trying to shy away from it as a coding standard, and simply write less. Hey, I understand lazy. It's easy to understand: you want to do less than you have to. Easy. But it's always going to cost you in the end.

Take this codebase... if they had taken the time to make header and implementation files for each class, then it'd be a lot easier to see what's happening. I wouldn't have a 277,000-line "header" file with all the implementation in it. I'd have a good set of headers for use with a pre-compiled binary library, and people would be able to use it like they should.

But that wasn't the path chosen.

I want to choose the better path. So I'm writing the entire thing over from scratch. Using Boost every single time I can to make it portable while not sacrificing capability and speed. I'm going to make this project something to be proud of, and I hope, I really hope that it catches on.

If not, I'm still going to do it.

That's just the way I roll. Baby.

Google Chrome dev 6.0.447.0 is Out

Friday, June 25th, 2010


This morning I noticed that Google Chrome dev 6.0.447.0 was out and they did a few nice little things this time:

  • PDFs are now centered
  • Ctrl-Click a link opens it to the right of the current tab
  • Unified the Page/Wrench menus on the upper-right
  • Fixed a few crashes with tabs and bookmarks

In all, a nice update. I really like the 'unified' page and developer menus as I was constantly getting confused about what was in which when I needed something. This is far better in the long run.

Making a Quad-FAT Universal Binary for CKit

Friday, June 25th, 2010


I saw something like this on Apple's Developer website while looking for some help in getting the PPC build of CKit working the other day. It was a way to combine two binaries and make a universal (or FAT) binary as the output. So you'd compile a PPC library and an Intel one, and then merge the two together. Pretty slick. It's a simple command:

  $ lipo -create one.dylib two.dylib -output final.dylib

This creates a single final.dylib file from the two inputs. Very nice.

So I decided to try my hand at creating a single quad-FAT CKit library: 32/64-bit, PPC/Intel. All I needed to do was to change the lines in the Makefile from:

  ifeq ($(shell uname),Darwin)
  all: $(LIB_FILE) $(LIB64_FILE)
  else

to:

  ifeq ($(shell uname),Darwin)
  all: $(LIB_FILE) $(LIB64_FILE)
	lipo -create $(LIB_FILE) $(LIB64_FILE) -output $(LIB_FILE)
  else

The original 'all' target just required the two library files to be generated; now it does that and then uses the lipo command to stitch the two libraries together into one. This has the effect of leaving the 64-bit PPC/Intel library alone but gluing it onto the 32-bit version - and running lipo -info on the result will list all four architectures. What an amazingly simple thing to do!

I do love these tools!

Identifying, Sorting, Classifying a Ton of Messages

Thursday, June 24th, 2010

The Magic School Bus

Today I started the process of trying to consolidate the 300+ messages in The Magic Schoolbus into a few reasonable categories: OPRA messages (tons of them, space is critical, data format very rigid), Price Messages (a little looser, but still important and small), and everything else. The remainder of the messages are really suitable for self-describing message formats like JSON, or more likely BSON, as they are very flexible - they have a variable number of components and don't need to get shot around the network all the time.

The Really Wasteful

Take, for instance, the Holiday Calendar. This is just like every other Holiday Calendar I've ever seen: give it a date (or default to today), and it'll give you all the trading holidays for the next 'n' months. Very simple data structure. Even simpler when all you're talking about are US Equities and their options - you don't even need to tell it which exchange you're asking about, as they are all the same.

But here's what The Magic Schoolbus does: every minute it publishes a list of all holidays for the next ten years, and everyone registered for this data receives it. Over and over again. Every minute. The format is pretty simple as well. There's the basic header of the message (far too verbose and general), but the payload of the message looks like this:

  struct {
    uint16_t       modifiedBy;    // trader ID
    char           today[9];      // YYYYMMDD
    uint8_t        numHolidays;   // # of holidays
    Holidays_NEST  holidays_nest[];
  } HolidayCalendar;

where Holidays_NEST looks like:

  struct {
    char      holidayDate[9];   // YYYYMMDD
    uint8_t   holidayType;      // 1=no trading; 2=half day
  } Holidays_NEST;

Now even if we put aside the problems with this content - like a date that's 9 bytes when 2 would do (as a uint16_t) - we could compress the entire message to look like this:

  struct {
    uint16_t    modifiedBy;    // trader ID
    uint16_t    today;         // YYYYMMDD
    uint8_t     numHolidays;   // # of holidays
    uint16_t    holidays[];    // tYYYYMMDD
  } HolidayCalendar;

where the 't' is the type of day and the date immediately follows, both packed into the one uint16_t (there's a sketch of one possible packing after the math below). A simple mask gets us what we need, and the size comparison (assuming packed structs, no padding) is:

  old size = 12 + n * 10
  new size = 5 + n * 2

and for a typical year we have, say, 7 holidays; over ten years, that's n = 70:

  old size = 12 + 70 * 10 = 712
  new size = 5 + 70 * 2 = 145
  savings: 79%
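
To make that concrete, here's one way the packing could work. The actual bit layout is up for grabs - this sketch just assumes the top two bits carry the type and the low fourteen carry the date as a day count from some agreed epoch (14 bits covers about 44 years):

  #include <stdint.h>
  #include <stdio.h>

  // One possible packing - illustrative only, not the actual message
  // spec: type in the top 2 bits (1=no trading; 2=half day), date in
  // the low 14 bits as days since some agreed epoch.
  static const uint16_t kDateMask = 0x3FFF;   // low 14 bits

  uint16_t packHoliday(uint8_t type, uint16_t daysSinceEpoch) {
      return (uint16_t)(((uint16_t)type << 14) | (daysSinceEpoch & kDateMask));
  }

  uint8_t  holidayType(uint16_t h) { return (uint8_t)(h >> 14); }
  uint16_t holidayDays(uint16_t h) { return (uint16_t)(h & kDateMask); }

  int main() {
      uint16_t h = packHoliday(1, 3838);   // the day offset is arbitrary here
      printf("type=%u days=%u\n", (unsigned)holidayType(h),
             (unsigned)holidayDays(h));
      return 0;
  }

The decode is a shift and a mask - nothing a feed handler would even notice.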

It's just stunning how bad some of these messages are.

The Horrible Congestion

Look again at the Holiday Calendar - it's sending this data out every minute. Why? Because the designers believed that this was the only way the data was going to get delivered to the client. What about a data cache/data service? They even have a cache server in the architecture - but it holds all the messages sent, and as such it's not nearly as efficient as a more customized data service.

So I need to do something here - basically, stop the insanity of sending all this data all the time. I need to have the client get it when it requests it, and when it fundamentally changes. This means something a lot more intelligent and flexible than read from the database, make a monster message, send it, repeat.
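
What I have in mind is shaped something like this - the names are mine, nothing from the actual codebase, but it captures the pull-plus-notify idea:

  #include <string>
  #include <vector>

  // Hypothetical sketch - these names are mine, not The Magic Schoolbus's.
  struct Holiday {
      std::string date;   // YYYYMMDD
      int         type;   // 1=no trading; 2=half day
  };
  typedef std::vector<Holiday> HolidayCalendar;

  // Clients pull the calendar when they ask for it, and register a
  // callback that fires only when the calendar actually changes - no
  // more once-a-minute blasts of ten years of data.
  class HolidayService {
  public:
      virtual ~HolidayService() {}
      virtual HolidayCalendar fetch(const std::string &asOf) = 0;
      virtual void onChange(void (*callback)(const HolidayCalendar &)) = 0;
  };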

The Task

It's huge. I have to look at all the messages in use and then try to see what can be combined into a nice, compact format for sending at high speed to a lot of clients, and what can be more free-form and possibly even skip being sent over 29West in the first place.

It's a monster job. But it's gotta be done. The reason this is in such a horrible state is because no one has taken it upon themselves to do this until now. It's ugly, and it's painful, but it's got to be done.

How Best to Describe The Magic Schoolbus? Convoluted… Inconsistent

Tuesday, June 22nd, 2010

The Magic School Bus

I've been struggling to come up with a way to describe the codebase of The Magic Schoolbus - and it's not all that easy for me. The code has good parts - the guys who wrote it are not without understanding. They use the atomic operations from Boost, and in places it's clear that they have been trying to get this codebase up to a very respectable level - and in some places, they have done a good job.

But the real problem is that they haven't been consistent. It's a single codebase with multiple projects - many of which are no longer used by anyone - and there's no consistency in the code. OK, that's not 100% accurate - there's plenty of copy-n-paste reuse, where they have taken whole applications in the codebase, copied them, and then just replaced a few letters in the name to make a new class. It's the worst kind of consistency: to make any change across the board, you have to change everything in the codebase.

There's precious little in the way of real re-use. There's even precious little consistency in the use of types: in many places they'll use their own unsigned integer types, and in others they'll use the ones in stdint.h. I'm all for using either, as there are distinct advantages to both, but you really need to pick one and stick to it. No matter what.

That's the thing that really gets to me: the lack of consistency.

In many projects I've been in, I didn't like the way the original developer started writing the code, but for the sake of the project's consistency, I stuck with it. The goal was to have my changes look no different from the original code. If I succeeded, then there's only one style to understand. This is far, far easier to grok.

But in this codebase it's like Rube Goldberg gone amok. There are some sub-projects where the includes are in the source directory, and others where they aren't. Some use custom types, others use system types. Some use header files and implementation files, and some use massive structs in header files with included implementations. There's just no consistency.

And it's all very, very big. Like the 277,000+ line message header file.

So what to do?

If we try to clean it up - to really use a well-designed object model, then we're gutting everything. And I mean everything. If we're going to do that, then we might as well start over and make something that's far better - with a client library in multiple languages, and forget having 100% backward compatibility. We'll do our best to make a transition plan, but it's a new era, with better information, better performance, etc.

If we do that, are they going to be willing to go with me? Hard to say. I think they have serious doubts about whether it can be done. In turn, I have serious doubts about whether they can do it. If they have no faith that it can be done, then there's no chance they'll actually be able to pull it off.

We need to have a more consistent codebase. It's essential to monitoring, stability, low maintenance, etc. But to do that I may have to Just Do It, and then hand it off to them. That's not really ideal, but it may be the only option.

It's a tough place to be. But I'm glad that I have a good handle on the code, and can vocalize the issues for those that have asked me to look into this. It's not an easy decision, but it's one that needs to be made.

SubEthaEdit 3.5.3 is Out

Tuesday, June 22nd, 2010


I noticed this morning that, because of Safari 5.0, the CodingMonkeys have had to update SubEthaEdit, and 3.5.3 is available now. The list of fixes is very much targeted at the Safari issues, but they seem to have worked on a few other things as well.

If it weren't for the "jump scrolling" of SubEthaEdit, I'd probably use it as my number two editor behind BBEdit on the Mac. It's good, there's a lot I like about it, but that jumping is just a little too harsh for my mind when scrolling, and so I stick with MacVim.

Still... I'll support the guys with new upgrades, because I believe they may make it smoother in the future, and they deserve a chance.

Coda 1.6.12 is Out

Friday, June 18th, 2010

Well... Panic beat the weekend rush and delivered Coda 1.6.12, fixing the problems with importing Transmit 4 Favorites. It's not something I've done a lot with, but I have moved to Transmit 4 more and put in all my favorites there, so it's really nice to see that they'll be easily imported when the time comes.

Nice. Nothing earth-shaking, but it's nice.

Google Chrome dev 6.0.437.3 is Out!

Friday, June 18th, 2010

OK, this is a little funny, but I just saw that there's another release of Google Chrome (dev) for the Mac! Amazing. There's nothing new in the release notes, but that's not a total surprise. There seem to have been some issues with crashing, and I've honestly not seen a group as big as Google move this fast in a very long time. There must have been something really wrong with the 6.0.437.2 build.

So it goes. I'm glad to know that it's auto-updating just fine now, which is a big relief.