Sunday, August 31, 2008

Wrong disk being used when booting from a USB stick with an encrypted root partition

I had decided to play around with installing a full blown Fedora 9 distribution to a 16GB USB flash drive. The installation went well but after the installation was done and the system restarted I found that it would not boot due to a strange issue with the root file system not being found.

It turned out that this was because I had enabled encryption on the root file system. The boot process was not able to open the encrypted file system. After further investigation (by removing quiet from the kernel boot options) I found that cryptsetup was attempting to open the encrypted partition on the wrong drive, /dev/sdc2. That actually made sense, seeing that this partition was /dev/sdc2 at install time, but now that the USB stick is the boot device it is sda and my encrypted partition is at /dev/sda2.

I began to look through the configuration files in /etc to find hard-coded references to /dev/sdc2 but didn't find anything that would result in this behavior. I then stopped and thought about it for a moment and realized that it couldn't be anything on the root file system causing the conflict, as the root file system was not being mounted at boot time. So I started picking apart the boot partition to see if I could find this hard-coded /dev/sdc2 reference. I did find some references but they did not seem to be the cause of the problem, so I was back to square one. There had to be something causing this problem, but what was it? Out of desperation I decided that the problem might be inside one of the boot images. That meant figuring out how to get inside the image and see its contents. This didn't take long because there are some good references out there about unpacking or mounting a boot image. I started with initrd-<version>.img and quickly saw the actual problem. Inside the initrd image there is a script named init that gets run at boot. This script was the cause of my headaches, and it led me to the fix.

Here is what I did:

Using the Rescue CD (or another install), mount the boot and encrypted partitions on the USB stick. In my case these were /dev/sdc1 (boot) and /dev/sdc2 (LUKS LVM). I used the rescue CD, so my encrypted partition was mounted for me (it appeared that I had to enable networking for it to load the LUKS stuff), but the boot partition was not automatically mounted. To mount it I did:
mount /dev/sdc1 /mnt/sysimage/boot 

This mounted the boot partition of my USB stick at /boot within the root file system of the encrypted partition on the USB stick. In other words, the encrypted partition (sdc2) had already been mounted by the Rescue CD at /mnt/sysimage.

Changes to the boot partition:

Once mounted I made the simplest change first. I had to modify /mnt/sysimage/boot/grub/device.map and make sure hd0 was set to /dev/sda. Prior to my edit it was set to /dev/sdc which wouldn't be right if the USB stick was the actual boot media.
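
As a rough sketch, the same edit can be done with sed instead of a text editor. The example works on a scratch copy so it is safe to try; the sample contents are assumed for illustration, and on the real system the target would be /mnt/sysimage/boot/grub/device.map. It also assumes GNU sed for the -i option.

```shell
# Scratch copy with assumed sample contents; the real file is
# /mnt/sysimage/boot/grub/device.map.
work=$(mktemp -d)
printf '(hd0)     /dev/sdc\n' > "$work/device.map"

# Point hd0 at /dev/sda and keep the original spacing (GNU sed -i).
sed -i 's|^(hd0)\([[:space:]]*\)/dev/sdc$|(hd0)\1/dev/sda|' "$work/device.map"

# Show the result so it can be checked by eye.
cat "$work/device.map"
```

Anchoring the pattern on the (hd0) entry means any other drives listed in the map are left untouched.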

Changes to the root partition:

Next I fixed the /mnt/sysimage/etc/crypttab file to look for the LUKS partition at /dev/sda2 instead of /dev/sdc2.

Next I fixed the /mnt/sysimage/etc/sysconfig/grub file so that its boot property pointed to /dev/sda instead of /dev/sdc.
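
The two root partition edits can be sketched the same way. The file contents below are assumptions made up for the example (a typical one-line crypttab entry and a boot= line), not copies of the real files, and the commands run against a scratch directory rather than /mnt/sysimage. GNU sed is assumed.

```shell
# Scratch tree standing in for /mnt/sysimage; sample contents are assumed.
work=$(mktemp -d)
mkdir -p "$work/etc/sysconfig"
printf 'luks-sdc2 /dev/sdc2 none\n' > "$work/etc/crypttab"
printf 'boot=/dev/sdc\nforcelba=0\n' > "$work/etc/sysconfig/grub"

# crypttab: change only the device path, not the mapping name.
sed -i 's|/dev/sdc2|/dev/sda2|' "$work/etc/crypttab"

# sysconfig/grub: the boot= line names the whole disk, not a partition.
sed -i 's|^boot=/dev/sdc$|boot=/dev/sda|' "$work/etc/sysconfig/grub"

cat "$work/etc/crypttab" "$work/etc/sysconfig/grub"
```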

Changes to the initrd image:

Now came the hard part. The boot fails because the initrd image (which controls early boot) is in charge of opening the encrypted file system. A startup script that runs before any configuration from the actual root file system is read has a hard-coded setting for the location of the LUKS file system. To fix this you have to modify the initrd image itself.

First you need to extract the initrd image so that you can modify its contents:

mkdir /tmp/initrd-usb
cp /mnt/sysimage/boot/initrd-*.img /tmp/initrd-usb
cd /tmp/initrd-usb
gunzip < initrd-*.img | cpio -id

You should now have the entire contents of the initrd image extracted in /tmp/initrd-usb. We now need to modify the /tmp/initrd-usb/init file and fix the hard-coded /dev/sdc2 reference for the encrypted file system. In my case I found three references to sdc2. One of them was the echo statement that printed "Setting up disk encryption: /dev/sdc2" at boot time.

echo Setting up disk encryption: /dev/sdc2 

I changed this one to /dev/sda2 because I wanted the output at boot time to be accurate. The other two references were in the cryptsetup command right below the echo statement:

cryptsetup luksOpen /dev/sdc2 luks-sdc2 

I changed the /dev/sdc2 to /dev/sda2 and luks-sdc2 to luks-sda2. The luks-sda2 will be used as the name of the encrypted device in the device mapper and the /dev/sda2 is the actual partition that contains the encrypted file system.
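
For anyone who prefers not to edit by hand, here is a minimal sketch of the three substitutions using sed. It runs on a scratch file containing only the two lines quoted above (a real init script has much more in it), and it assumes GNU sed. On the real system the file would be /tmp/initrd-usb/init.

```shell
# Scratch file holding just the two lines quoted above; the real script
# lives at /tmp/initrd-usb/init and is much longer.
work=$(mktemp -d)
printf '%s\n' \
  'echo Setting up disk encryption: /dev/sdc2' \
  'cryptsetup luksOpen /dev/sdc2 luks-sdc2' > "$work/init"

# Fix the device path (two occurrences) and the mapper name (one).
sed -i -e 's|/dev/sdc2|/dev/sda2|g' -e 's|luks-sdc2|luks-sda2|g' "$work/init"

# List every remaining sda2/sdc2 reference for a quick visual check.
grep -n 'sd[ac]2' "$work/init"
```

If the grep output still shows sdc2 anywhere, something was missed and the image should not be repackaged yet.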

Once I made the three changes to the init file I had to repackage the initrd image. Before repackaging though I moved the original initrd image file out of /tmp/initrd-usb so that it didn't get included in the new initrd image.

mv /tmp/initrd-usb/initrd-*.img /tmp 

Then I repackaged the initrd image:

cd /tmp/initrd-usb
find ./ | cpio -H newc -o | gzip -9 > /mnt/sysimage/boot/initrd-<kernel version>.img

In my case <kernel version> was 2.6.25-14.fc9.i686 so my command was:

find ./ | cpio -H newc -o | gzip -9  > /mnt/sysimage/boot/initrd-2.6.25-14.fc9.i686.img 

Done!

Once that was done I exited the shell and waited for my machine to reboot. I then booted from the USB stick and sure enough all went as planned and I was up and running.

References:

http://musialek.org/?p=3

This is where I got the specific commands for repackaging the initrd image. It seemed this information wasn't as easy to find as the steps for extracting the initrd image contents.