Since I’m hosting this website on my own personal home computer, I have the “opportunity” of worrying about my backup strategy. It won’t be the end of the world if something goes wrong; I have other documents that are far more important. But it’s nice to know you have something in place that makes you feel safe.

What to back up

  • I need to back up my Subversion repositories. I use Subversion as a VCS even for my pet projects. I think that using a VCS even at home is a good idea, because you learn to live with it. If you’re not doing it already, you should consider it.
  • I need to back up my web sites. Well, that goes without saying: it’s a web server, so the web content should be backed up somewhere. The main web site, which runs Cuyahoga, also has an MSSQL database. The two blogs (two, up to this moment) use BlogEngine, which doesn’t need a database at this point.

Where to back up

I bought an external hard drive to use as a backup medium. Unfortunately, it seems that these days when you buy an external hard drive you also buy the software that “manages” it. I gave the backup software that came bundled with the drive a fair chance, but it simply didn’t work for me. Leaving aside that it strongly reminds me of the early driver utilities that had to look all multimedia-like, it fails with a lot of pending files whenever a file is locked for one reason or another. It also had some locking issues when running alongside my antivirus. So I decided to keep the drive and lose the software. A clean format solved that.

How to back up

For Subversion, I dump the repositories with “svnadmin dump” and back up the dump files. I could have just backed up the repositories in place, if no locking issues arose. However, I’m also relying on the dump files to avoid possible compatibility issues between different versions of Subversion itself. I’ve run into this in the past, when I moved my repositories from a later version of Subversion to an earlier one, and I don’t want to go through that again. So I think the dump files are the safest option.
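
As a rough sketch of what the dump step looks like, assuming the repositories live under a single root folder (the paths below are placeholders, not my actual layout):

    @echo off
    rem Dump every Subversion repository under D:\svn into its own .dump file
    rem on the external drive. Paths are illustrative only.
    for /D %%R in (D:\svn\*) do (
        svnadmin dump %%R > E:\backup\svn\%%~nxR.dump
    )

“svnadmin dump” writes the repository contents to standard output, so redirecting it to one file per repository is all that’s needed; the dumps can later be restored with “svnadmin load”.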

For the MSSQL database, I created the backup script through Management Studio Express. I think that if I had the full (i.e. non-Express) version, scheduling the backup would be simpler. But generating the script and setting up a scheduled task with “sqlcmd -i” is also quite simple.
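
The scheduled task itself just runs sqlcmd against the script that Management Studio generated (which boils down to a BACKUP DATABASE ... TO DISK statement). A hypothetical example, with the instance name and paths made up:

    @echo off
    rem Run the generated T-SQL backup script against the local SQL Server
    rem Express instance, using Windows authentication. Names are examples.
    sqlcmd -S .\SQLEXPRESS -E -i E:\backup\scripts\backup-database.sql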

Finally, for the web sites, I’m doing an “xcopy /D /E” into a target folder, which I then back up. The /E switch recurses into subdirectories and the /D switch copies only files with a newer timestamp than the ones already in the backup. Of course, this has the limitation that files deleted from the source are never removed from the backup.
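
In other words, the copy step is a one-liner along these lines (source and target folders are placeholders):

    rem /E recurses into subdirectories (including empty ones), /D copies only
    rem files newer than the existing copy, /I treats the target as a folder,
    rem /Y suppresses overwrite prompts. Paths are illustrative only.
    xcopy C:\inetpub\wwwroot E:\backup\websites /D /E /I /Y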

All three steps are set up as scheduled tasks in the standard Windows Task Scheduler, and they run every Monday morning. I don’t keep older versions of the backups; the data changes so infrequently that it wouldn’t justify it. However, for an extra step of redundancy, I wrote a script on my Mac that duplicates the backed-up files from the web server onto the MacBook’s drive. That in turn gets backed up via Time Machine, so you could say that I do have some sense of versioning in my backups.
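
I set the tasks up through the Task Scheduler UI, but the same thing can be done from the command line with schtasks. A hypothetical example for one of the Monday-morning runs (the task name, script path and exact start time are made up, and the /ST time format varies slightly between Windows versions):

    rem Create a weekly task that runs the Subversion backup script
    rem every Monday morning at 06:00.
    schtasks /Create /TN "SvnBackup" /TR E:\backup\scripts\svn-backup.cmd /SC WEEKLY /D MON /ST 06:00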

P2P Backup

On a different note, I was thinking the other day about the possibility of backing up data to a remote physical location. A bit of Googling shows that I’m not the first person to think of it, but I didn’t find any nice, free solution for it. With broadband connections and large hard drives being cheap, why aren’t we already backing up our family photos across all the computers that a family owns?