Latest Kernel and Disasters

This morning I finished downloading all the kernel RPMs for 2.2.19-6.2.1 and took the time to upgrade the kernels of both mao and tux. It was really quite easy and painless. The only problem is that because I keep tux up for so long between reboots, I had to try to remember what I had running once it came back up. Not a big problem, but a pain.


I spoke way, way too soon...

DISASTER! When I did the upgrade to tux I didn't count on the fact that the new kernel wouldn't be able to find his Compaq Smart Array 3200 controller. The upgrade on mao worked so well I didn't even give it a second thought. But when I tried to reboot tux I got "no SCSI devices found" and the kernel panicked.

I was aghast, with chest pains.

If the system can't boot, then there's no way to see the drives and have any chance of fixing the problem. I tried building a boot floppy on mao, only to realize later that it wouldn't work because a boot floppy is built for the set-up of the machine it's made on. This means that I couldn't possibly build one for tux, as he wasn't running. No, I didn't make one before the upgrade. I have in the past, and I certainly will from now on.

The problems continued when I finally got into the RedHat 6.2 rescue mode off the install CD-ROM. The rpm program on the CD is an older version (v3), and the RedHat advisory had me upgrade the RPM database, so the database on the drive is now incompatible with the rpm on the CD.

So... In order to fix the problem, I had to be on the drive, mounted as root with everything working OK. Sounds impossible? I thought so too. But here's what I did to fix it.

  1. Boot off the RedHat 6.2 CD and enter Rescue Mode by typing "linux rescue" at the opening prompt. Thankfully, the kernel on the CD sees the SCSI array and loads the drivers, but it doesn't create any devices in /dev.
  2. With the rescue system up, look at the file /proc/partitions and check that it lists the partitions you expect. This requires knowing how your machine was set up, but that's reasonable in this situation. What you need to get are the Major and Minor numbers for all the partitions on your disk. We'll use them to manually create devices in the next step.
  3. At the prompt, type the following commands - one for each partition:
    # mknod /dev/sda b 72 0
    # mknod /dev/sda1 b 72 1
    # mknod /dev/sda3 b 72 3
    # mknod /dev/sda5 b 72 5
    # mknod /dev/sda6 b 72 6
    # mknod /dev/sda7 b 72 7
    

    where in this example the Major number was 72 and the Minor numbers were 0, 1, 3, 5, 6, 7. It's important to understand the sizes of the partitions and how Linux is laid out on the disk, so you can tell which is which. For my installation, and many others, there is a /boot partition, a swap partition, and then the rest of the drive is /. For me these are the last three, in order.

  4. Now we need to make a directory to mount this filesystem. I use /d via:
    # mkdir /d
    

    and then you mount the root filesystem there with:

    # mount -t ext2 /dev/sda7 /d
    

    At this point we're getting close to having something useful. We still need to mount the /boot partition properly with:

    # mount -t ext2 /dev/sda5 /d/boot
    
  5. Now cd to /d and see if your files are there. They should be. If not, then you have even bigger problems than I did - good luck. But if you see them, then you need to get the old kernel RPMs back onto your system and 'downgrade' to get back to a working state. The catch is that ftp needs the network up and /etc/services defined, and neither is in place in the rescue environment. To configure the network interface I used:
    # ifconfig eth0 24.29.224.2 netmask 255.255.255.0 broadcast 24.29.224.255
    # route add -net default gw 24.29.224.1
    

    where I knew the address that I needed to use and the gateway as well.
    Next, use pico or another editor to create a minimal /etc/services:

    ftp	21/tcp
    

    Now you'll be able to run ftp. A word of warning: use the '-n' option on ftp and the USER command, because in this minimal configuration ftp can't understand the automatic login responses from the server.

  6. OK... we're almost done. Now we need to make these mounted filesystems look like root so that we can run the existing rpm and fix the system. That's done with:
    # chroot /d
    

    where once again, the mount point for the drive's root is /d.

  7. Now we can run any command as if the system were up and happy. This is a major breakthrough, as it means that you only need the RedHat CD to fix a problem as severe as this one. I did a:
    # rpm -ivh kernel*.rpm --force
    

    and while it complained as usual, it worked. I then did the mkinitrd for the two kernels and it worked perfectly.
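
Typing the mknod lines in step 3 by hand is where I'd expect to make a mistake, so here is a rough sketch of how they could be generated from /proc/partitions instead. It's not what I ran at the time. The function name gen_mknod is my own invention, the sample names and block counts are made up for illustration, and naming the nodes /dev/sda plus the Minor number is just the convention I used above. It only prints the commands, so you can read them over before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) one mknod command per partition row.
#   $1 = Major number to match
#   $2 = device name prefix, e.g. /dev/sda (an assumed naming convention)
# Reads text in /proc/partitions format on standard input.
gen_mknod() {
    awk -v major="$1" -v prefix="$2" '
        # Data rows begin with two numeric fields: Major and Minor.
        # The header line and the blank line after it are skipped.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $1 == major {
            # Minor 0 is the whole disk; other Minors become a numeric suffix.
            suffix = ($2 == 0) ? "" : $2
            printf "mknod %s%s b %s %s\n", prefix, suffix, $1, $2
        }
    '
}

# Sample input only: names and block counts are invented for illustration.
# On the real machine you would feed in /proc/partitions instead.
sample_partitions() {
    cat <<'EOF'
major minor  #blocks  name

  72     0   17000000 disk
  72     1      20000 part1
  72     3    4000000 part3
  72     5      50000 part5
  72     6     250000 part6
  72     7   12000000 part7
EOF
}

sample_partitions | gen_mknod 72 /dev/sda
```

With the sample input this prints the same six mknod lines shown in step 3. In the rescue shell the last line would become something like gen_mknod 72 /dev/sda < /proc/partitions, assuming awk is available there, which I haven't checked.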

tux is back, but it took me more than 24 hrs. to figure this out. This should be a FAQ on the RedHat site, but it isn't. At least I have it here now.
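
One footnote to step 5: the broadcast address on the ifconfig line isn't a separate thing you have to look up. It follows from the address and the netmask: keep the network bits and set every host bit to 1. Below is a small sketch of that arithmetic in plain sh. The function name broadcast_addr is hypothetical, and it assumes both arguments are well-formed dotted quads.

```shell
#!/bin/sh
# Hypothetical helper: compute the broadcast address for an IP and netmask.
#   $1 = IP address as a dotted quad, $2 = netmask as a dotted quad
# The body runs in a subshell so the IFS change does not leak out.
broadcast_addr() (
    IFS=.
    # Word splitting on "." turns the two arguments into eight octets:
    # $1..$4 are the address, $5..$8 are the netmask.
    set -- $1 $2
    # For each octet, OR the address with the inverted netmask octet.
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
)

# The values from step 5; prints 24.29.224.255
broadcast_addr 24.29.224.2 255.255.255.0
```

That matches the broadcast value I typed by hand, which is a handy sanity check when you're configuring an interface from memory in a rescue shell.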