Finding good backup software for Windows is quite hard, if not next to impossible. But what makes backup software good?
- It should be fast
- It should be able to back up open/locked files, so it will back up your outlook.pst even when Outlook is running (and it’ll also back up system files, the registry etc.)
- It should not use a proprietary container format for its backups – I don’t want to “mount” a backup just to browse it
- It should neither do classic full backups nor classic incremental/differential backups but instead work with hard links (more on that later)
- It should be able to make backups that can be used to restore a completely broken system that won’t even boot up anymore
- It should be free (or at least cheap)
- It should be easy to set up
- It should be able to run without any user interaction, because when you have to care about creating backups, you won’t do it
- It should let you encrypt your backups
Now that’s a lot, isn’t it?
See, I work with Windows, Mac OS X and Linux machines. Mac OS X ships with a backup program called Time Machine, and that program does ALL THE ABOVE. Well, prior to 10.7 (Lion), the encryption part wasn’t built in. You could fiddle with an encrypted image container, but it wasn’t perfect. Still, Time Machine was and is the very best backup software I have ever seen. It backs up all your files on an hourly basis. It lets you browse your backups either with its GUI or in the file browser (Finder) and restore single files from any backup set. It only backs up changed or new files, yet every backup set appears to be a full backup (using hard links). And if you manage to mess up your Mac completely or your hard disk crashes, you can restore the Mac to one of the backed-up states using just the backup drive. You don’t even need the OS DVD. And if there’s not enough space on your backup volume, it’ll delete the oldest snapshots until there’s enough space again.
Now, Linux users know there is a way to achieve most of these requirements using one of the best tools around: rsync. Using its --link-dest flag, you can make the same hard link-based backups that Time Machine does. I use this for the servers I manage.
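To make the --link-dest idea a bit more concrete, here is a minimal Python sketch that only builds the rsync command line for one backup run. The function name and all paths are made up for illustration; -a (archive mode) and --link-dest are real rsync options. This is not the script I use on my servers, just the shape of the call.

```python
from pathlib import PurePosixPath

def build_rsync_command(source, backup_root, previous, current):
    """Return the rsync argument list for one hard-link snapshot run.

    `previous` is the name of the last snapshot folder (or None for the
    very first run), `current` is the name of the folder to create now.
    All names here are hypothetical.
    """
    root = PurePosixPath(backup_root)
    cmd = ["rsync", "-a"]
    if previous is not None:
        # Files that are unchanged compared to this older snapshot are
        # hard linked into the new snapshot instead of being copied again.
        cmd.append(f"--link-dest={root / previous}")
    # Trailing slash on the source: copy its contents, not the folder itself.
    cmd += [source.rstrip("/") + "/", str(root / current) + "/"]
    return cmd

first = build_rsync_command("/home/me", "/mnt/backup", None, "2012-01-01")
second = build_rsync_command("/home/me", "/mnt/backup", "2012-01-01", "2012-01-02")
print(" ".join(second))
```

The first run has nothing to link against, so it is a plain full copy. Every later run points --link-dest at the snapshot before it.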
But many people work with Windows. I was looking for a good solution to back up Windows machines in a similar way, and it’s not easy. Usually you have two choices:
- Use a disk imager to create complete images of your partition and/or hard disk. This way, you can restore the system if it won’t boot anymore. The downside: they use container formats and they can’t do the hard link-based backups – maybe they let you do a full backup and then incremental or differential backups. This is not what I want – if the backups eat up too much space and you want to remove the oldest backup sets, you have to do a new full backup, because both incremental and differential backups need a full backup to be based on.
- Use backup software that backs up your files only, or script something yourself, for example using RoboCopy. You then have the choice to either do full backups every time (time and space consuming) or to keep just one backup set and synchronize it to the current state of your files every time. That’s OK as a file-based backup, but you can’t restore Windows from it, and you don’t have multiple backup sets. Say you delete a file, back up a few days later and then realize you need that file again. It’s gone from your PC and from your backup. Or you go the synchronize way but don’t delete files in the backup. That way, it’ll grow and grow, and one day you’ll need to delete the complete backup and do a new full backup.
Finally, I found a way to do most of the things on Windows that Time Machine does on a Mac, using rsync for Windows (Cygwin-based) and the Volume Shadow Copy Service (VSS). I combined the tools and scripts from here and here and added some of my own scripting to suit my needs.
A note about hard links.
Let’s say you create a text file containing the words “hello there” and save it to c:\files\hello.txt. Its contents are written to the disk somewhere (say position 10246) and an entry is added to the “table of contents” of the file system. That entry basically says “there is a file called hello.txt, it is in the folder c:\files and its contents reside at position 10246”. This reference from a file name to the actual content is called a hard link. When you delete the file, the content is not erased right away. Instead, the hard link is removed. If no other hard links point to the content at position 10246, the file system no longer knows the file is there and the space is deallocated, which means that sooner or later other data will be written over it. So, try not to think of “deleting a file”, think of “unlinking a file” instead.
These file system entries don’t really need much space, and there can be multiple hard links for the same file. In the above example, you could create a second hard link to your file at, say, c:\test.txt. Both names (c:\test.txt and c:\files\hello.txt) now point to the very same content (“hello there” at position 10246). It appears to you as if you had two files with the same content, but it is the same file, just with different names. If you open c:\files\hello.txt, change the text to “hello there, my friend”, save it, close it and then open c:\test.txt, you’ll see the same text you just saved. Because it’s the same file, just with two names in different places. Cool, huh?
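If you want to try this yourself without touching real files, here is a small Python sketch that replays the example in a temporary folder. The file names mirror the ones above; os.link is the standard library call that creates a hard link (it needs a file system that supports them, such as NTFS).

```python
import os
import tempfile

def hard_link_demo():
    """Create one file with two names and show that both share the same content."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "hello.txt")
        second = os.path.join(tmp, "test.txt")

        with open(first, "w") as f:
            f.write("hello there")

        # Add a second name (hard link) for the same content.
        os.link(first, second)
        link_count = os.stat(first).st_nlink

        # Overwrite the content in place through the first name ...
        with open(first, "w") as f:
            f.write("hello there, my friend")
        # ... and read it back through the second name.
        with open(second) as f:
            seen_via_second = f.read()

        # "Deleting" the first name only removes one link; the content stays.
        os.remove(first)
        with open(second) as f:
            after_unlink = f.read()

    return link_count, seen_via_second, after_unlink

link_count, seen_via_second, after_unlink = hard_link_demo()
print(link_count, repr(seen_via_second), repr(after_unlink))
```

One caveat: the sketch overwrites the file in place. Some editors save by writing a new file and renaming it over the old one, and in that case the second name keeps pointing at the old content.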
OK, so we can create multiple links to the same file, how does that help?
There are two good things about hard links. First thing is, they need almost no space. Second, the file won’t be “deleted” as long as there is at least one hard link to it.
Hard link-based backups do a full backup of your data first, say, to “f:\backups\1\”. In the second run, they will compare your files with the previous backup set.
- every new or changed file will be copied from your hard disk to “f:\backups\2\”.
- every file that didn’t change since the last backup will be hard linked to the same file in the previous backup set. This is the trick. This way, you have two references (hard links), one in backup set 1, the other in backup set 2, that point to the same content on the backup disk. No space is wasted, but it appears to you as if both backup sets are full backups, although the second set only contains new and changed files, plus hard links for the rest. And the best thing: if you need to delete old backups some day because you’re running low on space (say you delete “f:\backups\1\”), you will only delete files that are not linked in the next backup set. Every other file will just have one hard link less, but still be there. No need to do a new full backup.
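The two rules above fit into a few lines of Python. This is only a sketch of the principle, not what rsync does internally: the function names are made up, and “unchanged” is simply assumed to mean “same size and same modification time (to the second)”.

```python
import os
import shutil
import tempfile

def unchanged(src, old):
    """Assumed change check: same size and same modification time (whole seconds)."""
    if not os.path.isfile(old):
        return False
    a, b = os.stat(src), os.stat(old)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)

def snapshot(source, previous, target):
    """Back up `source` into `target`, hard linking unchanged files to `previous`."""
    linked = copied = 0
    for folder, _dirs, files in os.walk(source):
        rel = os.path.relpath(folder, source)
        target_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(folder, name)
            dst = os.path.join(target_dir, name)
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            if old and unchanged(src, old):
                os.link(old, dst)        # second name for the same content, no extra space
                linked += 1
            else:
                shutil.copy2(src, dst)   # new or changed file: real copy, keeps the timestamp
                copied += 1
    return linked, copied

# Small demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "data")
    set1 = os.path.join(tmp, "backups", "1")
    set2 = os.path.join(tmp, "backups", "2")
    os.makedirs(source)
    for name, text in (("a.txt", "stays the same"), ("b.txt", "old")):
        with open(os.path.join(source, name), "w") as f:
            f.write(text)

    first_run = snapshot(source, None, set1)    # full backup
    with open(os.path.join(source, "b.txt"), "w") as f:
        f.write("new and longer")
    second_run = snapshot(source, set1, set2)   # a.txt linked, b.txt copied

    shared = os.path.samefile(os.path.join(set1, "a.txt"), os.path.join(set2, "a.txt"))
    shutil.rmtree(set1)                         # delete the oldest backup set
    with open(os.path.join(set2, "a.txt")) as f:
        survivor = f.read()                     # still there via backup set 2
```

After the first set is removed, a.txt in the second set is still readable, which is exactly why no new full backup is needed.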
Brilliant, isn’t it?
OK, here’s what you need.
I bundled everything in a zip file you can download here. It contains:
- the Windows version of rsync and some Cygwin DLLs so it can run
- different versions of vshadow.exe for Windows XP, Vista 32/64 and 7 32/64 (found them here)
- _start_backup.cmd script that checks for elevated privileges (Vista and above only) and then starts invoke.cmd
- invoke.cmd starts the correct vshadow.exe depending on your OS
- vss-exec.cmd – this is the main backup script. Here you need to adjust some settings, such as target drive and path
- dosdev.exe and vss-setvars.cmd for the volume shadow snapshots to work
- rsync-excludes.txt – you can put file and folder names here that you don’t want to be backed up. I put some Windows folders which I don’t need backed up into it, feel free to edit it. Be careful, rsync is case sensitive, so be sure to enter the paths with correct case.
- deltree.cmd – Windows programs have a certain file name length limit. rsync is able to copy files and paths that exceed this limit. Unfortunately, you cannot delete these later with “rmdir” or “del” – you’ll get the error message “filename too long”. So, if an old snapshot is deleted (I use rmdir /q /s for that), I check if the snapshot directory is still there afterwards (because files with too long paths couldn’t be deleted) and if so, I have deltree.cmd remove the remaining files. Credits for this go to this forum thread.
For the backups, you’ll need an NTFS formatted external or internal hard disk or partition. Network shares may work but I didn’t test that.
What the script does:
- create a volume shadow copy snapshot, so that all open files are frozen in a consistent state and can be backed up
- delete the oldest snapshot(s) if there are more than you want to keep (set this in vss-exec.cmd)
- use rsync to copy all new and changed files to a new backup folder and create hard links for everything that hasn’t changed since the last backup
- release the volume shadow copy snapshot
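The “delete the oldest snapshot(s)” step can be sketched in Python like this. It is not the code from vss-exec.cmd (that one is a batch script). It assumes snapshot folders carry date-stamped names of the form YYYY-MM-DD_HH.MM.SS, so that sorting by name also sorts by age; the function name is made up.

```python
import re

# Assumed folder name pattern, e.g. 2012-03-04_05.06.07
SNAPSHOT_NAME = re.compile(r"\d{4}-\d{2}-\d{2}_\d{2}\.\d{2}\.\d{2}")

def snapshots_to_delete(folder_names, keep):
    """Return the oldest snapshot folders that exceed the `keep` limit.

    Folders that do not look like snapshots are ignored, so unrelated
    directories on the backup drive are never selected for deletion.
    """
    snapshots = sorted(n for n in folder_names if SNAPSHOT_NAME.fullmatch(n))
    excess = len(snapshots) - keep
    return snapshots[:excess] if excess > 0 else []

existing = [
    "2012-03-02_08.00.00",
    "2012-03-01_08.00.00",
    "2012-03-03_08.00.00",
    "System Volume Information",
]
doomed = snapshots_to_delete(existing, keep=2)
print(doomed)
```

Only the names are computed here; actually removing the folders is a separate step.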
On a typical Windows 7 x64 notebook (about 80GB used, two web browsers, MS Office, some programs, many photos and mp3 music files, iPhone backups from iTunes) and using an external 1TB USB hard disk, the first (full) backup took about 3.5 hours and another backup immediately afterwards took about 20 minutes. In this scenario, about one third of the time was needed for c:\windows\winsxs and c:\windows\Installer. In my opinion, these folders do not need to be backed up, so I put them into the exclude list. Now things are a bit faster.
The original script looks pretty much like yours, what’s the difference?
The original script backed up to folders called \backupdir\0, \backupdir\1 and so on, where 0 is the most recent backup. On every backup, these folders were rotated, so if you chose to keep 10 snapshots, folder 10 would be deleted, folder 9 would be renamed to 10, 8 to 9 and so on. I didn’t like that idea. You can’t see at a glance when a certain backup was made, you’d have to do lots of copy and paste in the script if you wanted more snapshots, and the rotation can easily be avoided. This is what I changed:
- I made it so you just have to set the number of snapshots you want to keep (e.g. 10)
- Backup folders are called “YYYY-MM-DD_HH.MM.SS” so you can easily see when the backups were made
- OS detection for vshadow.exe – the correct version is determined automatically, based on your Windows version
- I made rsync show the progress of the current file and print some stats after the run
- My script checks for elevated privileges on startup. In Vista and above, you need to start the backup script via right-click and “Run as administrator”. I chose to do this because rsync runs with the privileges of the user who starts it. This means it cannot copy, for example, files of other users or other files you have no read access to. So, running it as admin is the best way to make sure everything you want backed up is backed up.
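To show why the “YYYY-MM-DD_HH.MM.SS” naming is handy, here is a small Python sketch (my script itself is a batch file, so this is only an illustration with made-up function names). Because the most significant part comes first and every part is zero-padded, sorting the folder names alphabetically also sorts them by date.

```python
from datetime import datetime

# strftime/strptime pattern for names like 2012-03-04_05.06.07
SNAPSHOT_FORMAT = "%Y-%m-%d_%H.%M.%S"

def snapshot_name(moment):
    """Turn a point in time into a backup folder name."""
    return moment.strftime(SNAPSHOT_FORMAT)

def snapshot_time(name):
    """Turn a backup folder name back into a point in time."""
    return datetime.strptime(name, SNAPSHOT_FORMAT)

older = snapshot_name(datetime(2012, 3, 4, 5, 6, 7))
newer = snapshot_name(datetime(2012, 11, 20, 18, 30, 0))

# Alphabetical order equals chronological order.
in_order = sorted([newer, older])
print(in_order)
```

For a real backup run you would pass datetime.now() to snapshot_name.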
So how does that solution stand up to the requirements you posted above?
Well, let’s see.
- Being fast? It is as fast as it can possibly be using VSS and rsync, and that’s OK.
- Being able to backup open/locked files? Check.
- Not using a proprietary container format? Check.
- Using hard links instead of the classic full/incremental/differential backup concept? Check.
- Making backups that can be used to fully restore a broken system? Unfortunately not. At least, I don’t know a way to do it, and I doubt that copying back all files from a boot CD or something similar will work. Starting with Windows Vista, Microsoft began using many hard links, NTFS junctions and similar constructs, partly for localisation and backwards compatibility (c:\Documents and Settings points to c:\Users, to give one example), and these are not backed up. For this reason, you might want to exclude much more, like maybe the whole Windows directory, maybe c:\Program Files, as you cannot fully restore the system anyway. But who knows what it’s good for… if you do have some more Windows knowledge, you might want to have everything backed up in case a system file is deleted or corrupted. Others just want a backup of their home directory and are fine with that.
- Being free? Check.
- Being easy to set up? Check (if opening a text file with a text editor is not too hard for you)
- Being able to run without any user interaction? Check. You can use Windows’ Scheduled Tasks feature to have it back up periodically.
- Encrypted backups? I don’t know. It should be possible to use NTFS encryption (or BitLocker in newer Windows versions), but I didn’t test that. Maybe a TrueCrypt container works too. If you try any of this, please let me know if it worked for you.
Feel free to leave comments.
Feel free to do with the script what you like. Edit, share, whatever. Just three things:
- Don’t blame me if it doesn’t work or if it breaks something
- If you modify and/or redistribute it, be nice and give the guy who did most of the work and me some credit
- It must be free of any charge