rsync, Part I

Mick Bauer

Issue #107, March 2003

rsync makes efficient use of the network by only transferring the parts of files that are different from one host to the next. Here's how to use it securely.

Andrew Tridgell's rsync is a useful file-transfer tool, one that has no encryption support of its own but is easily “wrapped” (tunneled) by encryption tools such as SSH and Stunnel. What differentiates rsync (which, like scp, is based on rcp) is that it has the ability to perform differential downloads and uploads of files.

For example, if you wish to update your local copy of a 10MB file, and the newer version on the remote server differs in only three places totaling 150KB, rsync will automatically download only the differing 150KB (give or take a few KB) rather than the entire file. This functionality is provided by the rsync algorithm, invented by Andrew Tridgell and Paul Mackerras, which rapidly creates and compares rolling checksums of both files and thus determines which parts of the new file to download and add/replace on the old one.

Because this is a much more efficient use of the network, rsync is especially useful over slow network connections. It does not, however, have any performance advantage over rcp in copying files that are completely new to one side or the other of the transaction. By definition, differential copying requires that there be two files to compare.

In summary, rsync is by far the most intelligent file-transfer utility in common use, one that is both amenable to encrypted sessions and worth taking the trouble to figure out how. Using rsync securely is the focus of the remainder of this article.

rsync supports a long list of options, most of them relevant to specific aspects of maintaining software archives, mirrors and backups. Only those options directly relevant to security will be covered in depth here, but the rsync(8) man page will tell you anything you need to know about these other features.

Getting, Compiling and Installing rsync

Because Andrew Tridgell, rsync's original lead developer, is also one of the prime figures in the Samba Project, rsync's home page is part of the Samba web site, rsync.samba.org. That, of course, is the definitive source of all things rsync. The resources page, rsync.samba.org/resources.html, has links to some excellent off-site rsync documentation.

The latest rsync source code is available at rsync.samba.org/ftp/rsync, with binary packages for Debian, LinuxPPC and Red Hat Linux at rsync.samba.org/ftp/rsync/binaries. rsync is already considered a standard Linux tool and therefore is included in all popular Linux distributions; you probably needn't look further than the Linux installation CD-ROMs to find an rsync package for your system.

However, there are security bugs in the zlib implementation included in rsync prior to rsync v.2.5.4. These bugs are applicable regardless of the version of your system's shared zlib libraries. There is also an annoying bug in v2.5.4 itself, which causes rsync sometimes to copy whole files needlessly. I therefore recommend you run no version earlier than rsync v.2.5.5.

Happily, compiling rsync from source is fast and easy. Simply unzip and untar the archive, change your working directory to the top-level directory of the source code, type ./configure, and if this script finishes without errors, type make && make install.

Running rsync over SSH

Once rsync is installed, you can use it several ways. The first and most basic is to use rcp as the transport, which requires any host to which you connect to have the shell service enabled (i.e., in.rshd) in inetd.conf. Don't do this! The reason the Secure Shell was invented was because of a complete lack of support for strong authentication in the “r” services (rcp, rsh and rlogin), which led to their being used as entry points by many successful intruders over the years.

Therefore, I won't describe how to use rsync with rcp as its transport. However, you may wish to use this method between hosts on a trusted network; if so, ample information is available in both rsync's and in.rshd's respective man pages.

rsync Works Two Ways

A much better way to use rsync than the rcp method is by specifying the Secure Shell as the transport. This requires that the remote host be running sshd and that the rsync command is present (and in the default paths) of both hosts. If you haven't set up sshd yet, do that first.

Suppose you have two hosts, near and far, and you wish to copy the local file thegoods.tgz to far's /home/near.backup directory, which you think may already contain an older version of thegoods.tgz. Assuming your user name, yodeldiva, exists on both systems, the transaction might look like this:

yodeldiva@near:~> rsync -vv -e ssh \
./thegoods.tgz far:~

Let's dissect the command line. rsync has only one binary executable, rsync, which is used both as the client command and, optionally, as a dæmon. In this example, it's present on both near and far, but it runs on a dæmon on neither; sshd is acting as the listening dæmon on far.

The first rsync option in the above example is -vv, which is the nearly universal UNIX shorthand for “very verbose”. It's optional, but instructive. The second option is -e, with which you can specify an alternative to rsync's default remote-copy program rcp. Because rcp is the default, and because rcp and SSH are the only supported options, -e is used to specify SSH in practice.

Perhaps surprisingly, -e scp will not work, because prior to copying any data, rsync needs to pass a remote rsync command via SSH to generate and return rolling checksums on the remote file. In other words, rsync needs the full functionality of the ssh command to do its thing, so specify this rather than scp if you use the -e option.

After the options come rsync's actionable arguments, the local and remote files. The syntax for these is very similar to rcp's and scp's. If you immediately precede either filename with a colon, rsync will interpret the string preceding the colon as a remote host's name. If the user name you wish to use on the remote system is different from your local user name, you can specify it by immediately preceding the hostname with an @ sign and preceding that with your remote user name. In other words, the full rsync syntax for filenames is the following:

[[username@]hostname:]/path/to/filename

There must be at least two filenames. The right-most must be the destination file or path, and the others must be source files. Only one of these two may be remote, but both may be local (colon-less), which lets you perform local differential file copying—this is useful if, for example, you need to back up files from one local disk or partition to another.

The source file specified is ./thegoods.tgz, an ordinary local file path, and the destination is far:~, which translates to “my home directory on the server far”. If your user name on far is different from your local user name, say yodelerwannabe rather than yodeldiva, use the destination yodelerwannabe@far:~.

Setting Up an rsync Server

Using rsync with SSH is the easiest way to use rsync securely with authenticated users, in a way that both requires and protects the use of real users' accounts. But as I mentioned earlier in the “SFTP and SSH” section [of the book, see Sidebar], SSH doesn't lend itself easily to anonymous access. What if you want to set up a public file server that supports rsync-optimized file transfers?

This is quite easy to do. Create a simple /etc/rsyncd.conf file, and run rsync with the --daemon flag (i.e., rsync --daemon). The devil, however, is in the details. You should configure /etc/rsyncd.conf very carefully if your server will be connected to the Internet or any other untrusted network. Let's discuss how.

rsyncd.conf has a simple syntax; global options are listed at the beginning without indentation. Modules, which are groups of options specific to a particular filesystem path, are indicated by a square-bracketed module name followed by indented options.

Option lines each consist of the name of the option, an equal sign and one or more values. If the option is boolean, allowable values are yes or no (don't be misled by the rsyncd.conf(5) man page, which in some cases refers to true and false). If the option accepts multiple values, these should be comma-space delimited, for example, option1, option2.

Listing 1 is part of a sample rsyncd.conf file that illustrates some options particularly useful for tightening security. Although I created it for this purpose, it's a real configuration file and syntactically complete. Let's dissect it.

A Sample rsyncd.conf File

As advertised, the global options are listed at the top. The first option set also happens to be the only global-only option: syslog facility, motd file, log file, pid file and socket options may be used only as global settings, not in module settings. Of these, only syslog facility has direct security ramifications. Like the ProFTPD directive SyslogFacility, rsync's syslog facility can be used to specify which syslog facility rsync should log to if you don't want it to use daemon, its default.

For detailed descriptions of the other global-only options, see the rsyncd.conf(5) man page; I won't cover them here, as they don't directly affect system security, and their default settings are fine for most situations, anyhow.

All other allowable rsyncd.conf options can be used as global options, in modules or both. If an option appears in both the global section and in a module, the module setting overrides the global setting for transactions involving that module. In general, global options replace default values, and module-specific options override both default and global options.

The second group of options falls into the category of module-specific options.

use chroot = yes: if use chroot is set to yes, rsync will chroot itself to the module's path prior to any file transfer, preventing or at least hindering certain types of abuses and attacks. This has the trade-off of requiring that rsync --daemon be started by root, but by also setting the uid and gid options, you can minimize the amount of the time rsync uses its root privileges. The default setting is yes.

uid = nobody: the uid option lets you specify with which user's privileges rsync should operate during file transfers, and it therefore affects which permissions will be applicable when rsync attempts to read or write a file on a client's behalf. You may specify either a user name or a numeric user ID. The default is -2, which is nobody on many systems, but not on mine, which is why uid is defined explicitly.

gid = nobody: the gid option lets you specify with which group's privileges rsync should operate during file transfers, and it therefore affects (along with uid) which permissions apply when rsync attempts to read or write a file on a client's behalf. You may specify either a user name or a numeric user ID; the default is -2 (nobody on many systems).

max connections = 20: this limits the number of concurrent connections to a given module (not the total for all modules, even if set globally). If specified globally, this value will be applied to each module that doesn't contain its own max connections setting. The default value is zero, which places no limit on concurrent connections. I do not recommend leaving it at zero, as this makes Denial-of-Service (DoS) attacks easier.

timeout = 600: the timeout also defaults to zero, which in this case also means “no limit”. Since timeout controls how long (in seconds) rsync will wait for idle transactions to become active again, this also represents a DoS exposure and should likewise be set globally (and per module, when a given module needs a different value for some reason).

read only = yes: the last option defined globally is read-only, which specifies that the module in question is read-only, i.e., that no files or directories may be uploaded to the specified directory, only downloaded. The default value is yes.

The third group of options defines the module [public]. These, as you can see, are indented. When rsync parses rsyncd.conf downward, it considers each option below a module name to belong to that module until it reaches either another square-bracketed module name or the end of the file. Let's examine each of the module [public]'s options, one at a time.

[public]: this is the name of the module. No arguments or other modifiers belong here: just the name you wish to call this module, in this case public.

path = /home/public_rsync: the path option is mandatory for each module, as it defines which directory the module will allow files to be read from or written to. If you set the global option use_chroot to yes, this is the directory rsync will chroot to prior to any file transfer.

comment = Nobody home but us tarballs: this string will be displayed whenever a client requests a list of available modules. By default there is no comment.

hosts allow = near.echo-echo-echo.org, 10.18.3.12 and hosts deny = *.echo-echo-echo.org, 10.16.3.0/24: you may, if you wish, use the hosts allow and hosts deny options to define Access Control Lists (ACLs). Each accepts a comma-delimited list of FQDNs or IP addresses from which you wish to explicitly allow or deny connections. By default, neither option is set, which is equivalent to “allow all”. If you specify an FQDN, which may contain the wildcard *, rsync will attempt to reverse-resolve all connecting clients' IP addresses to names prior to matching them against the ACL.

rsync's precise interpretation of each of these options depends on whether the other is present. If only hosts allow is specified, then any client whose IP or name matches will be allowed to connect, and all others will be denied. If only hosts deny is specified, then any client whose IP or name matches will be denied, and all others will be allowed to connect.

If, however, both hosts allow and hosts deny are present, hosts allow will be parsed first, and if the client's IP or name matches, the transaction will be passed.

If the IP or name in question didn't match hosts allow, then hosts deny will be parsed, and if the client matches there, the transaction will be dropped.

If the client's IP or name matches neither, it will be allowed.

In this example, both options are set. They would be interpreted as follows:

Requests from 10.18.3.12 will be allowed, but requests from any other IP in the range 10.16.3.1 through 10.16.3.254 will be denied.
Requests from the host near.echo-echo-echo.org will be allowed, but everything else from the echo-echo-echo.org domain will be rejected. Everything else will be allowed.

ignore nonreadable = yes: any remote file for which the client's rsync process does not have read permissions (see the uid and gid options) will not be compared against the client's local copy thereof. This probably enhances performance more significantly than security; as a means of access control, the underlying file permissions are more important.

refuse options = checksum: the refuse options option tells the server-side rsync process to ignore the specified options if specified by the client. Of rsync's command-line options, only checksum has an obvious security ramification. It tells rsync to calculate CPU-intensive MD5 checksums in addition to its normal rolling checksums, so blocking this option reduces certain DoS opportunities. Although the compress option has a similar exposure, you can use the dont compress option to refuse it rather than the refuse options option.

dont compress = *: you can specify certain files and directories that should not be compressed via the dont compress option. If you wish to reduce the chances of compression being used in a DoS attempt, you also can specify that nothing be compressed by using an asterisk (*), as in our example.

This simple example should get you started offering files for download by rsync. Next month, we'll cover setting up rsync modules (directories) at the filesystem level to accept anonymous uploads and authenticate users.

Building Secure Servers with Linux