The Open-Source Classroom

Back It Up, Buster!

Shawn Powers

Issue #263, March 2016

Remote backups aren't even remotely optional. Do it!

It still shocks me how many people don't keep backups of their files. Every time people ask me to look at their computers (almost always a malware-infected Windows machine), I ask them if they have backups before I start poking around. Invariably, the answer is no. Sometimes the computers even contain vital business data, and the closest thing to a backup is a local copy of the database in the same folder as the active copy. As someone who had a house burn down, I can tell you, off-site backups aren't just for fancy government organizations!

Things That Aren't Backups

RAID (Redundant Array of Independent Disks), as wonderful as it is, must not be mistaken for a backup. Simply put, RAID is a method for ensuring your single, local copy of data is less prone to failure. And that's really only for RAID levels that create mirrors or parity. Using RAID level 0, you're actually more likely to lose data than by storing it on a single hard drive!
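
To put rough numbers on that RAID 0 claim, here is a minimal sketch in Python. It assumes every drive fails independently with the same probability, and the 5% figure is made up purely for illustration. The function name and parameters are hypothetical, not part of any RAID tool.

```python
from math import comb


def array_loss_probability(p_drive: float, drives: int, tolerated: int) -> float:
    """Chance that more than `tolerated` of `drives` fail, assuming
    each drive fails independently with probability `p_drive`."""
    p_survive = sum(
        comb(drives, k) * p_drive**k * (1 - p_drive) ** (drives - k)
        for k in range(tolerated + 1)
    )
    return 1 - p_survive


P = 0.05  # hypothetical per-drive failure probability, not a measured figure

single = array_loss_probability(P, drives=1, tolerated=0)
raid0 = array_loss_probability(P, drives=4, tolerated=0)  # striping: any failure is fatal
raid6 = array_loss_probability(P, drives=8, tolerated=2)  # survives any two failures

print(f"single drive:   {single:.3f}")
print(f"4-drive RAID 0: {raid0:.3f}")
print(f"8-drive RAID 6: {raid6:.3f}")
```

Under these assumptions, a four-drive RAID 0 stripe is lost about 18.5% of the time, compared with 5% for a lone drive, while the RAID 6 array comes out well below the single drive. Keep in mind that real failures are often not independent: a power event can take out several drives at once, which is exactly the case this simple model ignores.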

Figure 1. RAID 5 is awesome, but trust me, it's not foolproof.

I personally learned the hard way that RAID is not the same as a backup. I've been preaching it for years, but on my local NAS (Network Attached Storage) device, I implemented RAID 6. Since that meant two drives could fail and I still wouldn't lose data, I didn't bother backing up my collection of videos, including irreplaceable home videos. As luck(?) would have it, a power fluctuation during a storm caused three hard drives in my array to fail at the same time, and I lost all 24TB of storage. Most of the data I lost was just digital backups of our DVDs and Blu-rays, but about 10GB of that data was home videos. And since I didn't back it up, it's lost forever.

The moral of the story is this: RAID is awesome. You should use RAID if possible to help prevent data loss resulting from failed hardware. But, don't stop with RAID and assume your data is safe. It's not. Your data is safe only if there is more than one copy of it. And, that's where actual backup comes into play.

Live Mirrors

A live copy of your data is a backup of sorts. The great part about live backups is in the name; they're live. The minute you create or change a file, those changes are synced to your mirror. The problem with live mirrors is that if you accidentally save over a file, or inadvertently remove a file, those changes are synced and you lose all the copies of the data in real time. Some options do automatic versioning of files, which makes your data a little more secure, but for the most part, live mirrors protect you from hardware failure, not human error.
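
The difference between a plain mirror and a versioned one can be sketched in a few lines of Python. This is only an illustration of the idea, not how Dropbox or any other sync product works internally. The function `sync_mirror` and its `versions` parameter are hypothetical names, and the sketch compares whole files in memory, which would be too slow for large data sets.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def sync_mirror(source: Path, mirror: Path, versions: Optional[Path] = None) -> None:
    """Make `mirror` match `source`. If `versions` is given, keep a copy
    of every file the sync is about to overwrite or delete."""
    mirror.mkdir(parents=True, exist_ok=True)
    stamp = str(time.time_ns())  # one folder of old versions per sync run
    source_files = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}
    mirror_files = {p.relative_to(mirror) for p in mirror.rglob("*") if p.is_file()}

    def archive(rel: Path) -> None:
        # Save the mirror's current copy before it is replaced or removed.
        if versions is not None:
            target = versions / stamp / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(mirror / rel, target)

    # Files removed from the source are removed from the mirror too.
    for rel in mirror_files - source_files:
        archive(rel)
        (mirror / rel).unlink()

    # New or changed files are copied over whatever the mirror had.
    for rel in source_files:
        src, dst = source / rel, mirror / rel
        if dst.exists() and src.read_bytes() == dst.read_bytes():
            continue  # unchanged, nothing to do
        if dst.exists():
            archive(rel)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Called without `versions`, an accidental overwrite or deletion in the source wipes out the mirror's good copy on the next sync. Called with a `versions` folder, the old copy survives there, which is the safety net file revisioning provides.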

My favorite commercial software for live mirroring is Dropbox. It's free for a small amount of storage (probably enough for a folder full of office documents), and it does file revisioning, which helps for accidental overwrites. Unfortunately, in order to back up a significant amount of data, you have to pay annually for Dropbox storage. There are other Dropbox-like solutions, such as SugarSync, Box, Google Drive and so on. Most have a small amount of storage for free and allow extra storage for a subscription fee.

If you have multiple computers, or a computer and a server, another live mirror option is to use BitTorrent Sync. You can get the free client at getsync.com, and install the software on most platforms, including headless on a server. I've written about BitTorrent Sync before. The beauty is that all the data is stored on your own machines, so the only limit on space is how much storage you have on your devices.

I personally use BitTorrent Sync to keep a full, live mirror of my wife's home directory on our home server. That way, if she forgets her laptop at work, I can access any file for her, even if her laptop is shut off. She can edit the files at home, and as soon as she turns her laptop back on, BitTorrent Sync updates the files at work.

You can have as many mirrors as you want with BitTorrent Sync, so we also have a “Family” folder that we sync between all our computers. If we need to pass a file to a family member, we just drop it in the Family folder and everyone instantly has access to it. The free version of BitTorrent Sync does not do file versioning, so if you make a change, the old version of your file is lost forever. That can be very convenient, but it also can be very frustrating. That's why I consider BitTorrent Sync vital, but I don't consider it a backup.

Figure 2. BitTorrent Sync's free version offers unlimited data syncing.

Backup Programs

There are two backup programs of which I'm particularly fond. I've written about them both in the past, but I want to highlight them again here. The first is BackupPC. It is a server-based program that creates true backups of data and keeps multiple copies of files on a rotation. For example, BackupPC will store a complete snapshot backup of every day of the week and then delete the oldest backup daily when the full week is stored. It also can be configured to keep a monthly or even annual backup snapshot as well. The best part about BackupPC is that it uses hard linking to store multiple copies of the same file without using any extra storage. The only storage increase comes with new or changed files, meaning a server can store vast numbers of “full” backups without wasting disk space.
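
The hard-link idea is easy to try out by hand. The following is a minimal Python sketch of rotating snapshots in which unchanged files are hard-linked to the previous snapshot. It is not BackupPC's own code or storage scheme, just an illustration of the technique. The function `take_snapshot`, the `keep` parameter and the date-style snapshot names are all assumptions made for the example, and the names must sort chronologically for the rotation to work.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def take_snapshot(source: Path, snapshot_root: Path, name: str, keep: int = 7) -> Path:
    """Create snapshot `name` of `source`, hard-linking any file that is
    unchanged since the most recent snapshot, then rotate old snapshots."""
    snapshot_root.mkdir(parents=True, exist_ok=True)
    existing = sorted(p for p in snapshot_root.iterdir() if p.is_dir())
    previous: Optional[Path] = existing[-1] if existing else None
    snapshot = snapshot_root / name
    snapshot.mkdir()

    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = snapshot / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)       # unchanged: a second name for the same data on disk
        else:
            shutil.copy2(src, dst)  # new or changed: this is the only real storage cost

    # Rotation: once more than `keep` snapshots exist, delete the oldest ones.
    for stale in (existing + [snapshot])[:-keep]:
        shutil.rmtree(stale)
    return snapshot
```

Running `take_snapshot(Path("/home/me"), Path("/backups"), "2016-03-01")` once a day (the paths here are placeholders) gives a week of folders that each look like a full backup. Deleting an old snapshot only removes its names for the files, so data still referenced by newer snapshots stays on disk.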

Another great feature of BackupPC is that all backups are pulled from the server rather than being pushed from the client. This can be inconvenient for laptops that are shut off, but it means the server knows (and notifies) if it can't connect to machines in its backup list. Plus, it has an incredible Web interface that allows simple file restoration, either directly to the original location or as a downloadable ZIP file. I love BackupPC, but it works well only over a LAN. If your house burns down, that doesn't help much.

Another favorite of mine is CrashPlan. It's a commercial, Java-based application that is so awesome, it's worth installing Java! CrashPlan does have a commercial cloud-based service that will keep your data safe on its servers. It's not free, but it is reasonable. The coolest part of CrashPlan, however, is that it allows backing up from one client to another—even from friend to friend. If you have friends who are willing to store your backups (offering to do the same for them often is a good motivator), you can have automatic, off-site backups for no cost whatsoever. Files can be restored fairly easily using the client software, and since backups are completely encrypted, your friends don't have access to the files they're storing for you.

Figure 3. CrashPlan is a commercial product, but it offers incredible free functionality.

If you don't have a friend willing to store your data, it might be worth it to ask someone to colocate a small server for you. For the cost of a cheap Raspberry Pi and an external USB drive, you can have a remote CrashPlan server logged in to your own account. It's even possible to limit the bandwidth and time of day you use, so your friend's Internet connection isn't adversely affected. Heck, if you can't find a friend, you might be able to put a small server in an outdoor shed, assuming you have Wi-Fi access and power. (It might be fun to create a solar-powered backup box for the backyard—perhaps that will be a project in the next couple months!)

Data Junkies, There's Hope!

I have a 48TB storage array in my basement. It's not full (yet), but it has many terabytes of data just sitting there waiting for something horrible to happen. And since it has happened to me before, I'm not blindly trusting my RAID 6 array for protection. The problem is, 48TB of storage is expensive, and that much cloud storage is even more expensive. I have a decent upload speed on my Internet connection, but paying for a place to store all that data is costly. I think the best solution would be to colocate a NAS device somewhere with fast bandwidth and a static IP. Unfortunately, I can't afford anything like that.

The cheapest storage solution I can find is Amazon Cloud Drive (amazon.com/clouddrive/unlimited). It surprised me, honestly, because although S3 storage is affordable, for large buckets, it's not exactly cheap. At the time of this writing, however, it's possible to buy Amazon Cloud Drive Unlimited for $60 per year. That might seem like a lot, but $5 a month is pretty cheap to store multiple terabytes of data.

Amazon Cloud Drive isn't exactly Linux-friendly, but there are command-line tools for uploading files, and many sync programs support it. In my case, the Synology NAS I use (Figure 4) has full support for Amazon Cloud Drive, and it automatically mirrors my entire NAS. The unlimited storage appears to be the real deal, and I haven't had any issues so far with reaching limits. That said, I don't have 48TB stored yet, so it's possible I'll be contacted as my storage allocation increases.

Figure 4. My Synology 1815+ has eight drive bays populated with 6TB drives. The software is amazing, and it includes cloud syncing to Amazon Cloud Drive.

The Important Part

If you get nothing else from this article, please, make a backup of your important data. Even if it's just a live mirror copy, having your data in two places is so important in the increasingly digital world we live in, you just can't ignore the need.

I'm also very interested in hearing other backup ideas. If you have an affordable, free, innovative or unique way to back up files, please send me an e-mail at info@linuxjournal.com with “[BACKUP IDEA]” in the subject line.

In the meantime, I'll work on that solar-powered backup box for the backyard and be sure to write up a how-to so we can all sleep a little better at night!

Shawn Powers is the Associate Editor for Linux Journal. He's also the Gadget Guy for LinuxJournal.com, and he has an interesting collection of vintage Garfield coffee mugs. Don't let his silly hairdo fool you, he's a pretty ordinary guy and can be reached via e-mail at info@linuxjournal.com. Or, swing by the #linuxjournal IRC channel on Freenode.net.