Disaster Recovery

Mark F. Komarinski

Issue #4, August 1994

It's happened to me more than once. It will probably happen to you at one time or another. You turn on your PC, expecting yet another fun session of pure Unix power, when something goes wrong. It won't boot; hard drive not found; it just hangs. Now what? This article will help you figure out what is wrong and get started with fixing it. Read this article before something goes wrong, and it will be easier to fix it when it happens.

Something has gone wrong. That's all you know. Staring at your blank or garbage-ridden screen, the only thing you can think is “Now what do I do?” Even if you have not had this happen yet, there is probably a good chance you will face this. With all of Linux's power, it is still rather easy for a new—or even experienced—user to make a mistake and mess up something.

With some advance preparation, this kind of situation won't leave you stranded. Make sure you know how to track down a problem, have a bootable disk, and have a set of rescue disks, configured for your particular setup.

Your first step is tracking down the problem. Do you get to the `Uncompressing Linux...' message? If not, your problem is with the boot disk or LILO. Having a spare boot disk should allow you to boot your system, and then you can reconfigure LILO or make a new boot disk.

While Linux is booting, do you get past the partition check? If so, your hard drives are probably fine with Linux. I had a hard drive once that made Linux hang when it tried to find the partitions. The drive didn't work in any other system I tested, so the drive was bad.

Also, if you get past the partition check, then the kernel is not your problem. After the partition checks are done, root is mounted and then /etc/inittab is read. As you may or may not recall, /etc/inittab is used by the init program to start login processes and begins reading your /etc/rc files to mount your partitions, start your network among other things. Once the inittab is read, it goes to the corresponding file for mounting additional filesystems, starting network services, and other startup services. If you see your filesystems being mounted, that means that some of your rc files are being started.

Once the inittab is read, it goes to the corresponding startup file (“rc file”) for mounting additional filesystems, starting network services, and other startup services. If you see your filesystems being mounted, that means that some of your rc files are being started.

Finally, make sure that your network services are starting if you want them started on your system. This is one of the final parts to the startup sequence.

Now, what do you do if you know you have a problem? Before you get into a jam, make sure you have backups. If things get too bad you can always re-initialize your partition and restore from an old backup. Also make sure to have backups handy of your /etc directory.

One good idea is to get a copy of the rescue disks available through FTP. These disks will allow you to boot linux from a pair of floppies and access most of your partitions. This way, even if you can't boot because of a bad /etc/inittab file, you can still boot linux and get access to the bad file, then fix it.

Some of these rescue disks come completely ready-made, so that you can use the rescue disks very easily. The disadvantage to these sets is that they may use an older kernel, may not have some pieces that you need (SCSI support, for example), and may not have the set of programs that you want to see in a rescue disk.

There are other sets of rescue disks where you specify which programs you want to include. They also use the current version of the kernel that you are using. The drawbacks to these are that you need to know what you are doing and they take a bit more work than simply getting a pre-built rescue disk. Two such packages are SAR (Search and Rescue) and rescue. Each of these packages is small, as they both use programs that are already on your system.

If you have two floppy drives, you can go through the rescue disk(s) and find out what programs that you'd like to add, such as your favorite editor. Usually one disk can contain all the programs you'd need in the event of a disaster, but having two disks chock full of utilities will be even better. Here's how:

First, put a floppy in your second drive. I have a 5.25 HD drive as my second floppy, so I'll use that in my examples.

The fdformat program is used to low-level format a floppy. Its syntax is:

fdformat <device>

where <device> is the name and type of drive you're using. For example, I have a high density 5.25" drive as drive 2, so my <device> would be /dev/fd1h1200. A high density 3.25" would be /dev/fd1H1440.

Now you put a filesystem on it. Use the same filesystem that you are using on the root partition of your system. In my case, that would be the Second Extended Filesystem (ext2). So, let's put a filesystem on my floppy:

mke2fs -c /dev/fd1h1200

Replace the /dev/hd1h1200 with /dev/fd1H1440 if your second drive is a 3.5" high density drive.

Now you should have a filesystem on a disk. Mount it on an unused directory. The /mnt directory is usually used for this. If /mnt does not exist on your system, do

mkdir /mnt then do mount -t ext2 /dev/fd1 /mnt

Your disk will now be mounted on /mnt. At this point, start copying over whatever programs you want. Make sure of two things:

  1. Make sure that the shared libraries on the rescue disk will work with the programs that you put on the disk.

  2. Make sure that you copy over all the files you need. Some editors have configuration files or help files you may need.

If you are using a rescue disk such as SAR or rescue, you won't need to worry about libraries and you can skip ahead a few paragraphs. Or you can read it and get a better hint about how the shared libraries work.

The idea behind shared libraries is that many common C functions get included in one file in a common location. This saves a lot of space as those common functions no longer need to be duplicated in each program binary. The drawback is that it is a tiny bit slower because now two files have to be loaded instead of one. For the toss-up between speed and size, I'll take the size, especially on a floppy with very limited space.

Another small problem with shared libraries is that programs compiled to use a new library won't work if the only library that is available is an older one. For example, a program compiled to use version 4.4 of the libraries won't work if the only set of libraries available is version 4.3. You'll wind up getting an error message about incompatible libraries. If this happens, get a new copy of the libraries or recompile the program to use an older library.

[Ed. Note: this is not strictly true. With modern libraries, the user will get a message, but the program will still try to run if all the necessary symbols are there. For instance, I'm running some binaries compiled under libc 4.5.8 which run fine with my libc 4.4.4, other than giving an error message. I don't know if you want to deal with this or not; probably not.]

To check what versions of libraries the programs are looking for, use the ldd command:

ldd <program>

This will return the version of libraries that the program was compiled under. ldd /bin/write for me returns:

libc.so.4 (DLL Jump 4.4pl1)

If the files in the /lib directory are libc.so.4.4.1 or above, it will be fine to put the `write' command on your disk. If the library needed is newer than the library on the rescue disk, then you would need to find an older version of the program and put that on the floppy. For example, if the library on the rescue disk was libc.so.4.3.1, I'd need to find an older version of write to put on the disk, or else put libc.so.4.4.1 on the disk.

You don't need to put just executables on this disk. A copy of gzip and a bunch of HOWTO files can come in quite handy as well. Here's a list of suggested files, all available through FTP or on many BBSs. Some of these files may be on the rescue disk you have. Make sure.

Take any of these editors. I find that ed is small and compact, but not much fun with heavy editing or large files. For you, joe may be worth the extra 98k it takes up. If you are unfamiliar with joe or ed, you can use vi, which is a standard program on just about all UNIX systems:joe editor 133kvi editor 101ked editor 35k

General Everyday Utilities:diff 61k (finds changes in big files)grep 61kgzip 46klilo 40kMAKEDEV 9kmknod 3k

Backup utilities:This will vary depending on how you did your backup. You may want a copy of tar, afio and ftape. Get some utilities for the filesystems you run:e2fsck 35kmke2fs 20k

Get some HOWTO files (compress with gzip for real space savings!):Installation-HOWTO 48kSCSI-HOWTO 41kFtape-HOWTO 18k

One more thing you'll want on-hand is a list of all of the cards that are in your machine, the IRQs that they use, and whether they are used by Linux or not. Sometimes a problem can be an incorrectly configured kernel or card.

If you keep these disks set aside and updated often, you'll be ready for anything that might happen.

Tip of the month: When you hit the backspace, do you see /'s followed by the character you just backspaced over? Don't you hate it, too? It reminds me of reading The Unix Programming Environment. Get a new copy of agetty and this should cure the problem. A copy distributed with some Slackware releases had this problem.