A Linux-Based Automatic Backup System

Michael O'Brien

Issue #80, December 2000

A step-by-step procedure for establishing a backup system that will save time and money.

People frequently take computers for granted. This becomes very dangerous when they rely on a computer to store and manipulate important data but fail to back those data up. If you are reading this, you are probably aware of the need for reliable backups. However, you may work with people who are not, and your job may be seriously affected if they lose their data.

I work in a scientific research group. Our laboratories are modern, and almost all of our data acquisition is performed by computers running Windows 95. In essence, our whole business is acquiring information, and that information is stored on computers. Losing data can cost thousands of dollars, especially once you count the salaries of all the people who helped produce it.

To protect our group from data loss, I proposed an automatic, network-based backup system for our irreplaceable data. The costs were negligible (we had a 486/66 computer that was not in use and a 3GB hard disk that cost us little more than one hundred dollars). I went through several versions of this system over the past two years, starting with a Windows 95-based system and ending up with a fast, powerful Linux-based one. The current version is easy to implement, inexpensive, powerful and reliable. Assuming you have a networked Linux machine ready, you should be able to use this article to set up your own automatic backup system in about an hour.

Necessary Tools

All the tools needed for the automatic backup system are included with most Linux distributions. The first is Samba, an excellent open-source package that allows UNIX-type systems to communicate with Windows-based systems over a TCP/IP network. The Linux version includes a utility called smbmount, which uses the kernel's smb file system (smbfs) support to mount shared directories on Windows computers into the Linux file system, where they can be manipulated as if they were on the Linux machine's own hard disk. An archiving program running in update mode can therefore check whether a file on the Windows machine needs to be backed up before the file itself is transferred over the network, which dramatically reduces network bandwidth, CPU load and hard disk wear.

There are numerous archiving programs available for Linux, including tar, bzip2 and even the simple cp command. However, I chose to use the tools from the open-source Info-ZIP project. They are included with most Linux distributions, are available for many other platforms, are fast and small, and use the ZIP format, an established standard on Windows systems. Furthermore, their compression significantly reduces the size of the archives stored on the Linux backup system.

Preliminary Steps

Network shares (a hard drive or any directory with all its subdirectories) must be set up on the Windows computers to be backed up. If file sharing is not already enabled, you can enable it from the Windows Network control panel. Then, in Windows Explorer, right-click on the drive or folder you want to access from the network and choose Sharing from the pop-up menu. I recommend allowing read-only access so that crackers cannot alter or destroy your data if they somehow obtain your passwords. Make sure to record the names of these shares. It is also a good idea to add the NetBIOS names, DNS names and IP addresses of the Windows computers to the Linux machine's /etc/hosts file (following the comments in that file), especially if your computers lie on different subnets.
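An /etc/hosts line lists the IP address first and then the host's names, so a Windows machine known by both a DNS name and a NetBIOS name gets one line carrying both. The bash function below is a small sketch that appends such a line only if the NetBIOS name is not already listed. The function name add_hosts_entry, the address 192.168.1.20 and the domain lab.example.edu are hypothetical placeholders, not values from my setup.

```shell
#!/bin/bash
# Sketch: append "IP  DNS-name  NetBIOS-name" to a hosts file unless the
# NetBIOS name already appears as a host name on a non-comment line.
# add_hosts_entry is a hypothetical helper name.
add_hosts_entry() {
    local file=$1 ip=$2 dnsname=$3 netbios=$4
    # Exact field match on fields 2..NF (the names), skipping comment lines.
    # If the file does not exist yet, awk fails and we fall through to append.
    if awk -v n="$netbios" '/^[ \t]*#/ {next}
        {for (i = 2; i <= NF; i++) if ($i == n) found = 1}
        END {exit !found}' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\t%s\n' "$ip" "$dnsname" "$netbios" >> "$file"
}
```

On the real system you would run it as root, for example add_hosts_entry /etc/hosts 192.168.1.20 higgins.lab.example.edu higgins; running it a second time leaves the file unchanged.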

Once this is done, you must prepare your Linux system to access and store the data. First create a mount point for the Windows shares by typing mkdir /mnt/smb. After that, you must decide where you will put the archived backups.

I put the backup files on a separate 1GB vfat (Windows) partition that remains unmounted at all times except when the actual backup processes are running. This way, the files are protected as much as possible from file system damage due to power outages, and the hard drive can be temporarily removed from the Linux computer and put into a Windows computer to facilitate recovery. In order to accommodate this, I created a mount point called /mnt/backups.
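The two preparation steps above can be sketched in bash as follows. The function names are hypothetical, and so is the partition /dev/hdb1 used in the usage line. The noauto option in the /etc/fstab entry is what keeps the vfat partition unmounted at boot, so the scripts can mount it with a plain mount /mnt/backups only while they run.

```shell
#!/bin/bash
# Sketch of the Linux-side preparation (hypothetical helper names).

# Create the mount points for the Windows shares and the backup partition.
# The optional argument is a directory prefix; leave it empty for the real
# system (which then needs root), or pass a scratch directory for a dry run.
prepare_mount_points() {
    local root=${1:-}
    mkdir -p "$root/mnt/smb" "$root/mnt/backups"
}

# Print an /etc/fstab line for the vfat backup partition. "noauto" stops it
# being mounted at boot; the device name is supplied by the caller.
backup_fstab_entry() {
    printf '%s\t/mnt/backups\tvfat\tnoauto\t0 0\n' "$1"
}
```

As root, you might run prepare_mount_points and then backup_fstab_entry /dev/hdb1 >> /etc/fstab, after confirming the real device name with fdisk -l.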

Scripts

A script is a text file containing commands that one would normally type at the Linux command prompt. Scripts let you carry out even very complex tasks repeatedly with a single command. Making a script is as simple as typing the commands into your favorite editor, saving the file and then running chmod u+x on it.

Listing 1 shows the script that backs up the DATA directory from the d_drive share on the computer named “higgins”. This script runs on my Linux computer, “magnum”, and is stored as the file /root/backup/higgins.

Listing 1. DATA Directory Backup

The first line, while looking like a comment, actually instructs the computer to use bash to execute the script. Next come all the shell variables that the main part of the script uses to back up the data on higgins. Putting the case-specific values in variables at the beginning of the script lets you make a version for a new computer very quickly: copy the basic script and change a few easily seen values. Listing 2 shows different sets of variables for a Windows 98 machine (“rick”, with a shared C: drive) and a Windows NT machine (“tc”, with a shared folder named “data”). Note that the Windows NT variables must also specify a username and the password associated with it.

Listing 2. Variables for the Windows Machines

The remaining lines do the actual work. The command export PASSWD puts the password in an environment variable that the smbmount program reads automatically. The smbumount command runs next in case someone forgot to unmount an SMB share from the mount point. (If nothing is mounted there, smbumount prints a harmless error message and the script continues.) The smbmount program then attempts to mount the remote share. The -N switch tells it not to prompt for a password, so the value in the PASSWD environment variable is used instead, and the -n switch passes the username to smbmount.

Before doing any backup work, an if statement checks that the files to be backed up actually exist under the mount point, since the network may be down or the remote computer switched off. If they do not, the script unmounts the share to free the mount point and then terminates.

If the Linux machine can access the remote files, all archiving is done with the zip command. The -r switch is the standard recursion option, which makes zip descend into every subfolder of the data directory. The -u switch puts zip in update mode, in which it adds only files that are not yet in the archive and replaces only those that have changed since they were archived. The -v switch makes zip print the name of every file it checks, which is useful for troubleshooting.
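Because Listing 1 itself is not reproduced here, the following is only a sketch of a per-machine script that follows the steps just described; it is not the author's listing. It assumes a Samba release whose smbmount accepts -o username=... (as Caveat 3 warns, options vary; if yours uses switches such as the -N and -n described above, substitute them), and it assumes the vfat backup partition is already mounted at /mnt/backups by whatever calls the script. The function name backup_machine and the name check at the bottom are illustrative additions.

```shell
#!/bin/bash
# Sketch of a per-machine backup script, e.g. /root/backup/higgins.

# --- case-specific values: change these when copying the script ---
MACHINE=higgins              # name of the Windows computer
SHARE=d_drive                # share name set up in Windows Explorer
DIRECTORY=DATA               # directory inside the share to archive
USERNAME=                    # set for Windows NT shares, empty otherwise
PASSWD=                      # share password; keep this file chmod go-r
MOUNTPOINT=/mnt/smb          # created earlier with mkdir /mnt/smb
ARCHIVE=/mnt/backups/$MACHINE.zip   # assumes /mnt/backups is mounted

backup_machine() {
    local -a opts=()
    local status
    export PASSWD                          # smbmount reads $PASSWD itself
    smbumount "$MOUNTPOINT" 2>/dev/null    # clear a share left mounted earlier

    # Assumed option syntax; check your smbmount man page.
    if [ -n "$USERNAME" ]; then
        opts=(-o "username=$USERNAME")
    fi
    smbmount "//$MACHINE/$SHARE" "$MOUNTPOINT" "${opts[@]}"

    # Network down or machine switched off: free the mount point and stop.
    if [ ! -d "$MOUNTPOINT/$DIRECTORY" ]; then
        echo "$MACHINE: $DIRECTORY not available, skipping" >&2
        smbumount "$MOUNTPOINT" 2>/dev/null
        return 1
    fi

    # Run zip from the mount point so stored paths start at $DIRECTORY.
    # -r recurse, -u add new and changed files only, -v list each file.
    (cd "$MOUNTPOINT" && zip -r -u -v "$ARCHIVE" "$DIRECTORY")
    status=$?
    smbumount "$MOUNTPOINT"
    # zip exits with 12 ("nothing to do") when -u finds nothing new.
    [ "$status" -eq 0 ] || [ "$status" -eq 12 ]
}

# Start a backup only when the file is run under the machine's own name,
# as the per-machine scripts in this article are; sourcing it is harmless.
if [ "$(basename -- "$0")" = "$MACHINE" ]; then
    backup_machine
fi
```

Saved as /root/backup/higgins and made executable, the file backs up higgins when run. Copying it to /root/backup/rick and editing the values at the top is all a new machine needs, and the exit status (0 on success) lets a calling script notice machines that could not be reached.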

After a backup script has been set up for each computer, you can make a simple script named master to call each of the backup scripts sequentially. An example of my master script is shown in Listing 3.
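A master script in the same spirit might look like the sketch below. It is not Listing 3: it additionally assumes that the master script, rather than each per-machine script, mounts and unmounts the vfat backup partition, so that the partition stays unmounted except while backups run. The machine names come from this article; run_backups is a hypothetical name.

```shell
#!/bin/bash
# Sketch of /root/backup/master: run every per-machine script in turn.
BACKUPDIR=/root/backup        # where the per-machine scripts live
ARCHIVEDIR=/mnt/backups       # vfat backup partition, noauto in /etc/fstab
MACHINES="higgins rick tc"    # one executable script per machine

run_backups() {
    local machine failed=0
    # Mount the backup partition only for the duration of the run.
    mount "$ARCHIVEDIR" || return 1
    for machine in $MACHINES; do
        # A machine that is switched off must not stop the others.
        if ! "$BACKUPDIR/$machine"; then
            echo "backup of $machine failed" >&2
            failed=1
        fi
    done
    umount "$ARCHIVEDIR"
    return "$failed"
}

# Run only when invoked as "master"; sourcing the file just defines things.
if [ "$(basename -- "$0")" = master ]; then
    run_backups
fi
```

Checking each script's exit status means a switched-off machine produces a message on standard error (which cron normally mails to root) without stopping the backups of the other machines.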

Listing 3. Master Script

Activating the System

After all the scripts have been written, you can put a symbolic link to the master script in one of the cron directories (/etc/cron.daily, /etc/cron.weekly and so on) so that the computer takes care of the backups automatically. For my setup, I typed ln -s /root/backup/master /etc/cron.weekly/master to schedule weekly backups. If you need daily backups, link the script into /etc/cron.daily instead; because the update mode of the archiving utility transfers only new and changed files, daily runs require few resources.

The first run of each backup script, however, has to archive everything and will require a lot of network bandwidth and CPU time. You may therefore want to run it the first time by hand, or schedule it for the night with the at command.
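Both scheduling steps can be wrapped in two small bash functions. This sketch assumes that your distribution's /etc/crontab runs run-parts over /etc/cron.weekly (as Red Hat and Debian do) and that the at daemon, atd, is running; the function names are hypothetical.

```shell
#!/bin/bash
# Scheduling sketch; on a real system both functions must run as root.
MASTER=/root/backup/master

# Weekly runs: link the master script into the weekly cron directory.
# -f replaces an existing link of the same name.
install_weekly() {
    local crondir=${1:-/etc/cron.weekly}
    ln -sf "$MASTER" "$crondir/master"
}

# One-off first run at a quiet hour; at reads the job from standard input.
schedule_first_run() {
    echo "$MASTER" | at "${1:-23:00}"
}
```

As root, install_weekly sets up the weekly link, and schedule_first_run 23:00 queues the first full backup for 11 p.m.; atq lists the queued job.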

Caveats

Five important points should be noted:

  1. Any shell script with passwords should be made unreadable by anyone but the owner by using the chmod go-r command.

  2. If your data is very sensitive, you need to set up adequate security measures to keep industrial spies from breaking into your Linux machine and stealing your centralized data. See the Linux Security HOWTO for more information.

  3. The smbmount program tends to vary slightly across different distributions of Linux. Hence, if the scripts in this article don't work quite right for you, check out the man pages to see how your version of smbmount handles its command-line options.

  4. Users of the Windows computers must be taught to keep their data under a central directory, such as “users” or “data”, instead of in random directories spread across their hard drives. Some people are too lazy to move their files into a central directory, even though it takes only five seconds, and you may have to move their files yourself before they start using it. Remember, though, that these users may be the greatest threat to your organization in terms of data loss, since they never bother to make backup copies of their own data.

  5. Finally, a hard drive is a very practical place to keep backups of irreplaceable data: my archive files use less than 400MB of disk space and contain more than 1.5GB worth of data. However, you may want to consider adding a large-capacity removable drive to your Linux machine. With one, you can occasionally copy the archive files from the hard disk to a removable disk and take them home in case the machine is physically destroyed or stolen.
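Caveats 1 and 5 come down to a couple of commands, sketched here as bash functions. The removable drive's mount point, /mnt/removable, is a hypothetical name and, like /mnt/backups, is assumed to have a noauto entry in /etc/fstab; OFFSITEDIR and the function names are likewise illustrative.

```shell
#!/bin/bash
ARCHIVEDIR=/mnt/backups       # vfat backup partition (noauto in /etc/fstab)
OFFSITEDIR=/mnt/removable     # hypothetical removable drive, also noauto

# Caveat 1: make scripts that contain passwords unreadable by group/others.
protect_scripts() {
    chmod go-r "$@"
}

# Caveat 5: copy the ZIP archives to the removable disk so a copy can leave
# the building. Both file systems are mounted only for the copy.
copy_offsite() {
    local status
    mount "$ARCHIVEDIR" || return 1
    if ! mount "$OFFSITEDIR"; then
        umount "$ARCHIVEDIR"
        return 1
    fi
    cp "$ARCHIVEDIR"/*.zip "$OFFSITEDIR"/
    status=$?
    umount "$OFFSITEDIR"
    umount "$ARCHIVEDIR"
    return "$status"
}
```

For example, run protect_scripts /root/backup/* after editing any script, and copy_offsite just before taking the removable disk home.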

Conclusion

A Linux-based network backup system for irreplaceable data files on many networked computers is inexpensive, reliable, easy to set up, trivial to expand and extremely practical. With just an hour of your time, you can potentially save your group or company many thousands of dollars in the event of a hard drive crash. Currently, my Pentium 150 workstation keeps archives of years of mission-critical data from eight computers spread across three buildings and two subnets. Because the case-specific values live in shell variables, adding a new computer to the system takes me less than two minutes.

This is the kind of task Linux was born to do. You can take an old surplus computer, make it “headless” with no keyboard or monitor, and stick it in a closet where it will humbly do its work unseen. You can also run the system on your personal workstation, since the Linux tools can run in the background. To restore files to a crashed computer, you can set up an FTP server on the Linux machine on the fly, or simply take the backup hard drive out and put it inside a Windows machine. Since Linux has been designed to coexist with many different computers and operating systems, the scripts can be adapted to back up many other kinds of computers, including other Linux machines via NFS and even Macintosh computers with the netatalk and hfs packages.


Michael O'Brien is a graduate student at the University of New Mexico, where he studies optics. Computers are both a hobby and a tool that help him get his work done. He manages a small computer room in his spare time and likes to help Linux users on the Usenet newsgroups. He may be reached at mobrien@unm.edu.