The Value of Tuned Code


One of the things I keep coming back to every so often is the simulation work I did during my Ph.D. Now that I have upgraded my laptop to 10.5.1, I wanted to see about getting the code running in x86_64 mode. I have had it running in 64-bit mode on PPC, but now that Xcode supports x86_64, and so does the gfortran I use, I wanted to see whether it would run faster, or slower, in x86_64 mode.

The first thing I noticed was that the LINPACK routines I had taken and hand-tuned to the problem did not work with the 64-bit compiler. I was getting segfaults, and rather than try to fix those versions, I thought I'd use the BLAS and LAPACK that ship with Mac OS X in the Accelerate framework. These are supposed to be optimized for AltiVec (PPC) and SSE3 (Intel), so switching to them seemed like a nice upgrade.

The code changes weren't major, mostly in how the data was stored going into the functions, so it only took me a few hours to fix all that up and clean up the code with a few #ifdefs so that it compiles either with the LINPACK routines I built or with the LAPACK functions that come with the OS. What was major was the results.

As I had hoped, the 64-bit LAPACK build was faster than the 32-bit one. The surprise, however, was that my 32-bit hand-optimized routines were faster still. My guess is that the 64-bit failures come down to how the data element sizes are used, since the logic itself is fine. If I update these routines for 64-bit with that in mind, I might very well end up with something faster still. What a shot in the arm! I had no idea that the modifications I had made were going to be that fast. Good for me.

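UPDATE: I went into the code and found that it was simply a matter of how the integers were being passed from FORTRAN to C. FORTRAN passes arguments by reference, and gfortran's default INTEGER is 4 bytes, but in the 64-bit build a C long int is 8 bytes, so the C side was reading more than the FORTRAN side had written. By putting a simple typedef in the code: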

    /*
     * Because we need to be able to build this for 32-bit and 64-bit
     * versions, I want to typedef the integer here so that the value
     * coming in from the FORTRAN code matches what we will use here.
     * In the 64-bit (LP64) builds long int is 8 bytes while a default
     * FORTRAN INTEGER is still 4, so int is the matching type there.
     * Without this, we'd have a mess on the conversions.
     */
    #if defined(__x86_64__) || defined(__ppc64__)
    typedef int f_int;
    #else
    typedef long int f_int;
    #endif

and replacing long int with f_int (the FORTRAN integer type) throughout, I was able to use the same code for both builds, and the errors went away. Nice.