How to create rsync-like hard link backups with VSS on Windows

The problem

Finding good backup software for Windows is hard, if not next to impossible. But what makes backup software good?

  • It should be fast
  • It should be able to back up open/locked files, so it will back up your outlook.pst even when Outlook is running (and it’ll also back up system files, the registry etc.)
  • It should not use a proprietary container format for its backups – I don’t want to “mount” a backup just to browse it
  • It should neither do classic full backups nor classic incremental/differential backups but instead work with hard links (more on that later)
  • It should be able to make backups that can be used to restore a completely broken system that won’t even boot up anymore
  • It should be free (or at least cheap)
  • It should be easy to set up
  • It should be able to run without any user interaction, because if you have to remember to create backups, you won’t do it
  • It should let you encrypt your backups

Now that’s a lot, isn’t it?

See, I work with Windows, Mac OS X and Linux machines. Mac OS X ships with a backup program called Time Machine, and that program does ALL THE ABOVE. Well, prior to 10.7 (Lion), the encryption part wasn’t built in – you could fiddle with an encrypted image container, but it wasn’t perfect. Yet, Time Machine was and is the very best backup software I have ever seen. It backs up all your files on an hourly basis, it lets you browse your backups either with its GUI or within the file browser (Finder) and restore single files from any backup set, it only backs up changed or new files, every backup set appears to be a full backup (using hard links), and if you manage to mess up your Mac completely or your hard disk crashes, you can restore the Mac to one of the backed-up states using just the backup drive. You don’t even need the OS DVD. And if there’s not enough space on your backup volume, it’ll delete the oldest snapshots until there’s enough space again.

Now, Linux users know there is a way to achieve most of these requirements using one of the best tools around – rsync. Using its --link-dest flag, you can make the same hard link-based backups that Time Machine does. I use this for the servers I manage.

But many people work with Windows. I was looking for a good solution to back up Windows machines in a similar way, and it’s not easy. Usually you have two choices:

  • Use a disk imager to create complete images of your partition and/or hard disk. This way, you can restore the system if it won’t boot anymore. The downside: imagers use container formats, and they can’t do hard link-based backups – at best, they let you do a full backup and then incremental or differential backups. This is not what I want – if the backups eat up too much space and you want to remove the oldest backup sets, you have to do a new full backup, because both incremental and differential backups need a full backup to be based on.
  • Use backup software that backs up your files only, or script something yourself, for example using RoboCopy. You then have the choice to either do full backups every time (time and space consuming) or to keep just one backup set and synchronize it to the current state of your files every time. That’s OK as a file-based backup, but you can’t restore Windows from it, and you don’t have multiple backup sets. Say you delete a file, back up a few days later and then realize you need that file again. It’s gone from your PC and your backup. Or you go the synchronize way but don’t delete files in the backup. That way, it’ll grow and grow, and one day you’ll need to delete the complete backup and do a new full backup.

Finally, I found a way to do most of the things on Windows that Time Machine does on a Mac, using rsync for Windows (cygwin-based) and the Volume Shadow Copy Service (VSS). I combined the tools and scripts from here and here and added some scripting of my own to suit my needs.

A note about hard links

Let’s say you create a text file containing the words “hello there” and save it as c:\files\hello.txt. Its contents are written to the disk somewhere (say, position 10246), and an entry is added to the “table of contents” of the file system. That entry basically says “there is a file called hello.txt, it is in the folder c:\files and its contents reside at position 10246”. This reference from file name to the actual file is called a hard link. When you delete the file, it’s not actually deleted. Instead, the hard link is removed. The content is still on the hard disk, but the file system has no file name associated with it, so it doesn’t know the file is there. Whenever you delete a file, the hard link is removed, and if there are no other hard links pointing to the content at position 10246, the space is deallocated. This means that sooner or later, other data will be written over it. So, try not to think of “deleting a file”; think of “unlinking a file” instead.

These file system entries don’t need much space, and there can be multiple hard links to the same file. In the above example, you could create a second hard link to your file at, say, c:\test.txt. Both directory entries (c:\test.txt and c:\files\hello.txt) now point to the very same content (“hello there” at position 10246). It appears as if you had two files with the same content, but it is the same file – just with different names. If you open c:\files\hello.txt, change the text to “hello there, my friend”, save it, close it and then open c:\test.txt, you’ll see the same text you just saved. Because it’s the same file, just with two names in different places. Cool, huh?
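
You can try this yourself. On Windows, hard links are created with `mklink /H` (Vista and later) or `fsutil hardlink create`; here is the same experiment as a *nix/cygwin sketch, with example file names:

```shell
# Create a file, then give the same file a second name (hard link)
echo "hello there" > hello.txt
ln hello.txt test.txt              # test.txt is NOT a copy - same file

cat test.txt                       # prints: hello there

# Change the content through the first name...
echo "hello there, my friend" > hello.txt

# ...and the second name sees the change, because it's the same file
cat test.txt                       # prints: hello there, my friend

# Both names report a link count of 2
stat -c %h hello.txt               # prints: 2
```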

OK, so we can create multiple links to the same file – how does that help?

There are two good things about hard links. First thing is, they need almost no space. Second, the file won’t be “deleted” as long as there is at least one hard link to it.

Hard link based backups do a full backup of your data first, say, to “f:\backups\1\“. On the second run, they compare your files with the previous backup set:

  • Every new or changed file is copied from your hard disk to “f:\backups\2\“.
  • Every file that didn’t change since the last backup is hard linked to its directory entry in the previous backup set. This is the trick. This way, you have two references (hard links), one in backup set 1 and one in backup set 2, that point to the same content on the backup disk. No space is wasted, yet it appears as if both backup sets were full backups, although the second set only contains new and changed files – and hard links for the rest. And the best thing: if you need to delete old backups some day because you’re running low on space (say you delete “f:\backups\1\“), you will only delete files that are not linked in the next backup set. Every other file will just have one hard link less, but still be there. No need to do a new full backup.

Brilliant, isn’t it?

OK, here’s what you need.

I bundled everything in a zip file you can download here. It contains:

  • the Windows version of rsync and some cygwin DLLs so it can run
  • different versions of vshadow.exe for Windows XP, Vista 32/64 and 7 32/64 (found them here)
  • _start_backup.cmd script that checks for elevated privileges (Vista and above only) and then starts invoke.cmd
  • invoke.cmd starts the correct vshadow.exe depending on your OS
  • vss-exec.cmd – this is the main backup script. Here you need to adjust some settings, such as target drive and path
  • dosdev.exe and vss-setvars.cmd for the volume shadow snapshots to work
  • rsync-excludes.txt – you can put file and folder names here that you don’t want to be backed up. I put some Windows folders which I don’t need backed up into it, feel free to edit it. Be careful, rsync is case sensitive, so be sure to enter the paths with correct case.
  • deltree.cmd – Windows programs have a path length limit (MAX_PATH, 260 characters). rsync is able to copy files and paths that exceed this limit. Unfortunately, you cannot delete these later with “rmdir” or “del” – you’ll get the error message “filename too long”. So, when an old snapshot is deleted (I use rmdir /q /s for that), I check if the snapshot directory is still there afterwards (because files with too-long paths couldn’t be deleted) and, if so, I have deltree.cmd remove the remaining files. Credits for this go to this forum thread.

For the backups, you’ll need an NTFS formatted external or internal hard disk or partition. Network shares may work but I didn’t test that.

What the script does:

  1. Create a volume shadow copy snapshot, so all open files are saved in a consistent state and can be backed up
  2. Delete the oldest snapshot(s) if there are more than you want to keep (set this in vss-exec.cmd)
  3. Use rsync to copy all new and changed files to a new backup folder and create hard links for everything that hasn’t changed since the last backup
  4. Release the volume shadow copy snapshot
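
Step 2 is simple because the timestamped snapshot folders sort chronologically by name. The actual script does this in cmd; here is the idea as a bash sketch, with BACKUP_ROOT and KEEP as placeholders:

```shell
KEEP=10                 # number of snapshots to keep
BACKUP_ROOT=/backups    # example path

# List snapshot folders oldest-first, drop the newest $KEEP from the
# list, and delete whatever remains
ls -1d "$BACKUP_ROOT"/*/ | sort | head -n -"$KEEP" | while read -r old; do
  rm -rf "$old"
done
```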

On a typical Windows 7 x64 notebook (about 80 GB used; two web browsers, MS Office, some programs, many photos and MP3 music files, iPhone backups from iTunes) and using a 1 TB external USB hard disk, the first (full) backup took about 3.5 hours, and another backup immediately after took about 20 minutes. In this scenario, about one third of the time was spent on c:\windows\winsxs and c:\windows\Installer. In my opinion, these folders do not need to be backed up, so I put them into the exclude list. Now things are a bit faster.

The original script looks pretty much like yours – what’s the difference?

The original script backed up to folders called \backupdir\0, \backupdir\1 and so on, where 0 is the most recent backup. On every backup, these folders were rotated, so if you chose to keep 10 snapshots, folder 10 would be deleted, folder 9 would be renamed to 10, 8 to 9 and so on. I didn’t like that idea. You can’t see at first glance when a certain backup was made, if you wanted more snapshots you’d have to do lots of copying and pasting in the script, and the rotation can easily be avoided. This is what I changed:

  • I made it so you just have to set the number of snapshots you want to keep (e.g. 10)
  • Backup folders are called “YYYY-MM-DD_HH.MM.SS” so you can easily see when the backups were made
  • OS detection for vshadow.exe – the correct version is determined automatically, based on your Windows version
  • I made rsync output progress of current file and some stats after the run
  • My script checks for elevated privileges on startup. In Vista and above, you need to start the backup script via right click -> Run as administrator. I chose to do this because rsync runs with the privileges of the user it is run as, which means it cannot copy, for example, other users’ files or files you have no read access to. Running it as admin is the best way to make sure everything you want backed up actually is backed up.
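
Generating such a timestamp is a one-liner in cygwin/bash (in the cmd script it has to be pieced together from %date% and %time%, which is locale-dependent); the backup root below is an example path:

```shell
# Folder names like 2024-05-31_14.07.09 sort chronologically by name
SNAPSHOT="$(date +%Y-%m-%d_%H.%M.%S)"
mkdir -p "./backups/$SNAPSHOT"
echo "$SNAPSHOT"
```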

So how does that solution stand up to the requirements you posted above?

Well, let’s see.

  • Being fast? It is as fast as it possibly can be, given VSS and rsync – that’s OK.
  • Being able to back up open/locked files? Check.
  • Not using a proprietary container format? Check.
  • Using hard links instead of the classic full/incremental/differential backup concept? Check.
  • Making backups that can be used to fully restore a broken system? Unfortunately not. At least, I don’t know a way to do it, and I doubt that simply copying back all files with a boot CD will work. Starting with Windows Vista, Microsoft began working with many hard links, NTFS junctions and the like, partly for localisation and backwards compatibility (c:\Documents and Settings points to c:\Users, to give one example) – these are not backed up. For this reason, you might want to exclude much more, maybe the whole Windows directory or c:\Program Files, as you cannot fully restore the system anyway. Then again, if you have some more Windows knowledge, you might want everything backed up in case a system file gets deleted or corrupted. Others just want a backup of their home directory and are fine with that.
  • Being free? Check.
  • Being easy to set up? Check (if opening a text file in a text editor is not too hard for you).
  • Being able to run without any user interaction? Check. You can use Windows’ Scheduled Tasks feature to have it back up periodically.
  • Encrypted backups? I don’t know. It should be possible to use NTFS encryption (or BitLocker in newer Windows versions), I didn’t test that though. Maybe a TrueCrypt container works too. If you try any of this, please let me know if it worked for you.

Feel free to leave comments.

License

Feel free to do with the script what you like. Edit, share, whatever. Just three things:

  • Don’t blame me if it doesn’t work or if it breaks something
  • If you modify and/or redistribute it, be nice and give the guy who did most of the work and me some credit
  • It must remain free of charge


17 Comments

  1. Backing up all the files requires you to run backups every day/week/month, right? For that we use GS RichCopy Enterprise (it has a built-in scheduler) plus features like copying long file path names and copying NTFS file permissions, and it is not too costly!

  2. It’s 2017 now and soon going to be 2018; these utilities are obsolete now. I am now using GS RichCopy 360, which made my life easier by providing an easy-to-use GUI and many features. I have done many file server migrations and it performed well in all of them!

  3. Hello Jay2k1, thank you for your how-to and the script. I’m a Linux user, and for Windows this is a very nice opportunity to create hard link backups the way I like them from Linux.

    However, I have a question about your script: is it also possible to back up a drive other than C:\? I have a network drive and want to back it up onto the local hard disk C:\, so basically the other way round. Can the script be adapted to do this?

  4. Hi Jay — Awesome job! I’m interested in adapting your backup script for my own use, but looks like the link is dead. Can you please make the .zip file available for download again?

    1. Oh, it seems I removed it without remembering I linked it here, my bad. The link is now working again. Thanks for the heads-up.

  5. Hey, great job! Is it possible to include just some directories, instead of the whole drive?

    1. Hi Tyron, yes, this should be possible. In the script called vss-exec.cmd, in lines 84 and 86, you’d have to edit the source paths for rsync. By default, it is /cygdrive/b/, which means drive B: (the volume shadow copy snapshot is assigned the drive letter B:). If you only want to back up your Pictures, you could just edit that source path so it reads /cygdrive/b/Users/YourName/Pictures/. Be careful, I think the path names are case sensitive here.

  6. I put together a set of scripts using primarily PowerShell along with the volume shadow copy utilities that you’re using…but I went the RoboCopy route. Although it works splendidly, my solution is very XP oriented and I never bothered to re-write it for Vista +.

    I’d like to make use of your scripts, but when I run it I end up with zero files in the backup directories. The directories get created, but they don’t contain any files at all. I notice that rsync is producing several “skipping non-regular file…” messages, and C:\Users\name\Documents are included in them. This is on a Windows 7 Enterprise machine.

    Do you have any ideas what the problem might be? Your scripts are sharp and I’d really like to use them if only it would work on my machine.

    I intend to test the “network drive” idea once I get them working.

    Thanks!
    Charlie

    1. Hi Charlie, without further information there is nothing I can say about this, sorry. I tested it on Windows 7 Ultimate x64 and it worked. Are you copying from and to an NTFS partition? Are you running the script with elevated privileges?

  7. Awesome.

    I’m in the process of adapting this to replace enterprise backup software I inherited. It comes with proprietary files that need proprietary software at restore time. Ugh!

    The only shortcoming I’ve found so far is that rsync doesn’t copy NTFS permissions. Not a complete deal-breaker, but it would certainly slow down a restore to servers with varied permissions.

    Have you found a way to tackle this? I’m investigating running a robocopy security-only pass after rsync to apply NTFS permissions to the destination, but I’m not entirely clear on whether security is applied to links or to underlying files yet.

    1. To be honest, I haven’t checked how rsync behaves regarding security. Also, this was made to back up personal files in private use, so I didn’t really care about that. Using robocopy with a security-only pass seems like a good idea, but I don’t know how it treats links. It should, however, deal correctly with them, as Windows has made use of different kinds of links since Vista (localized system folders are links to their English counterparts). Regarding hard links, I assume Windows/NTFS acts as *nix does: permissions are bound to the file, not to (one of) its hard links. Test case: create a file, create another hard link to it, change permissions on the file (using the original link), then check permissions on the newly created hard link. They should be identical.
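
      On *nix, that test case is only a few commands (file names are examples):

```shell
cd "$(mktemp -d)"
echo data > original.txt
ln original.txt alias.txt        # second hard link to the same file
chmod 600 original.txt           # change the mode via the first name
stat -c %a alias.txt             # prints: 600 - the mode belongs to the file
```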

      I’m not sure it’s a good idea to use this in an enterprise environment. At the very least, be sure to do lots of testing – not only backup but also restore. But of course I am glad you seem to like it.

  8. Hello Hongel,

    Hehe, you’re absolutely right. At the time of writing, I was still beta testing the scripts and adding a few comments and such. After that, I totally forgot to upload the zip file.

    Now the link is fixed.

    Oh and if you’re gonna try the script, feel free to let me know what you think about it.

  9. Hi Jay,

    Thanks for making this info available. I appreciate the effort you’ve gone to..!

    I’d like to try out your solution but unfortunately the link to download the ZIP is missing, although maybe I’ve misconfigured my browsers…? ;P

    “I bundled everything in a zip file you can download here.” <— alas here is no-w-here for me… ;(

    1. rsync for Windows is a great tool, don’t get me wrong. The main problem with it is that it lacks the performance offered by today’s technology, and that it is not very intuitive to get working properly.
