I heavily use virtual machines in my work and I need an efficient way of synchronizing them between my PCs.

I know of bigsync, but this program works well only for one-way synchronizing (like rsync).

Unison is also insufficient, because it always copies target files before updating them, which is unacceptable for >16GB disk images.

I need a tool or a script for two-way, in-place synchronization. It should work like bigsync, except that it would first determine which file is more recent by inspecting the modification date and whether the contents changed after last use (the btrfs filesystem has nice features to test that). Unlike bigsync, it should also work efficiently in both directions. AFAIK bigsync works efficiently only one way, from source to destination; for the other way, one needs to call it from the remote end.
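To make the decision step concrete, here is a minimal sketch in Python of how the sync direction could be chosen. It is not taken from bigsync or any existing tool: the names `ImageState`, `was_modified`, and `decide` are hypothetical, and the `changed` flag stands in for whatever change detection the filesystem can provide (for example something derived from btrfs metadata). When no such flag is available, the sketch falls back to comparing the modification time against the time of the last successful sync.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    NOTHING = "nothing"    # neither copy was touched since the last sync
    PUSH = "push"          # only the local copy changed: update remote in place
    PULL = "pull"          # only the remote copy changed: update local in place
    CONFLICT = "conflict"  # both copies changed: needs a manual decision


@dataclass(frozen=True)
class ImageState:
    mtime: float                    # modification time, seconds since the epoch
    changed: Optional[bool] = None  # hypothetical filesystem hint; None = unknown


def was_modified(state: ImageState, last_sync: float) -> bool:
    """Prefer the explicit change hint; otherwise compare mtime to last sync."""
    if state.changed is not None:
        return state.changed
    return state.mtime > last_sync


def decide(local: ImageState, remote: ImageState, last_sync: float) -> Action:
    """Pick the direction in which a disk image should be synchronized."""
    local_modified = was_modified(local, last_sync)
    remote_modified = was_modified(remote, last_sync)
    if local_modified and remote_modified:
        return Action.CONFLICT
    if local_modified:
        return Action.PUSH
    if remote_modified:
        return Action.PULL
    return Action.NOTHING
```

The result would then select which end runs the one-way, in-place transfer, so the efficient direction is always used. Reporting a conflict instead of silently letting the newer file win is a deliberate choice in this sketch, since overwriting a multi-gigabyte image cannot be undone cheaply.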

  • Why don't you use NFS and keep the images on a fileserver? You can store the file in a centralized location and mount it on more than one host. Then you don't have to synchronize anything. Commented Feb 11, 2014 at 10:54
  • @son_of_fire I don't always work online. And even if I did, the wifi is unreliable, so NFS would hardly be a suitable medium for the backing storage of a virtual machine. Commented Feb 11, 2014 at 11:26
  • @AdamRyczkowski Why not use version control software like git or mercurial, which also works offline? You can choose exactly what you want to keep and even have a history of each file. Commented Feb 11, 2014 at 13:39
  • Are you asking how to keep the VM images in sync, or the contents of something inside the VM? These are two very different problems. Commented Feb 11, 2014 at 14:07
  • @Kiwy Are you sure that git works on parts of a file? Maybe things have changed, but a few years ago the smallest object git could handle was a whole file. Git also stores history, which I don't need (VirtualBox already has snapshots). If I used git, I would fill 200GB of storage each month, if I ever waited long enough to have that amount of data copied. Commented Feb 11, 2014 at 14:21

1 Answer

@Kiwy's insistence in the comments that you could use Git for this reminded me of a tool I'd seen a while ago called git-annex. While refreshing my memory of what git-annex can do, I remembered coming across this post in the git-annex forums.

Synchronize large files (VM images)

Hi,

I'm thinking to use git-annex to synchronize my virtual machine directory (Virtualbox) between 3 pc. It's quite big: more than 200GB and some of the images are 40Gb in size.

The synchronization will be over a lan (obviously). It is already in place with 2pc and unison but the configuration of the 3rd pc is cumbersome. Does anybody have experiences with git-annex and such amount of data?

Thanks in advance

Gabriele

To which the author of git-annex replied:

This volume of data should be no problem for git-annex.

The only catch would be if you're running those VM images and want to sync them as they're changed. With git-annex, you'd need to git annex unlock a file to allow it to be modified, and then git annex add it back and commit changes made to it.
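The unlock, modify, add, commit cycle described in that reply could be scripted around each VM session. The following is only a sketch: the helper names `before_use`, `after_use`, and `run` are hypothetical, the image name `win7.vdi` is a made-up example, and it assumes the image already lives in a git-annex repository. It uses only the commands named in the reply (`git annex unlock`, `git annex add`, `git commit`), and it defaults to a dry run that returns the command lines instead of executing them.

```python
import shlex
import subprocess
from typing import List, Optional

Command = List[str]


def before_use(image: str) -> List[Command]:
    """Commands to run before starting the VM, so the image becomes writable."""
    return [["git", "annex", "unlock", image]]


def after_use(image: str, message: str) -> List[Command]:
    """Commands to run after shutting the VM down, to record the new contents."""
    return [
        ["git", "annex", "add", image],
        ["git", "commit", "-m", message],
    ]


def run(commands: List[Command], repo: Optional[str] = None,
        dry_run: bool = True) -> List[str]:
    """Return the shell-quoted command lines; execute them unless dry_run."""
    lines = [shlex.join(cmd) for cmd in commands]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing command
            subprocess.run(cmd, cwd=repo, check=True)
    return lines


if __name__ == "__main__":
    # Hypothetical image name; prints the commands without running them.
    for line in run(before_use("win7.vdi") + after_use("win7.vdi", "update win7 image")):
        print(line)
```

Propagating the committed change to the other machines is a separate step that this sketch leaves out.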

So it's just Git?

But be clear on this point. Git-annex is not pure Git. It uses the interface that git provides but uses a variety of different backends for doing the actual shuttling of data back and forth. Read the "How it works" page for more on this.

The contents of 'annexed' files are not stored in git, only the names of the files and some other metadata remain there.

For more on how it handles the transferring of data, take a look at the section of the site titled "transferring data".

Special remotes

The genius of git-annex's approach is in the "special remotes". They allow the backends to be plugged in, which makes the design modular. You can see a full list of the various "special remotes" here.

  • I'm glad it reminded you of this :D Also, I was pretty sure VirtualBox saved the binary difference between the original disk and the snapshot in a separate file (but that may be from a while ago). Commented Feb 11, 2014 at 14:47
  • @Kiwy - VB does save the differences in files, Docker and LXC too. See this diagram for what I was talking about wrt Docker: docker.io/static/img/about/modifying_updating.jpg Commented Feb 11, 2014 at 14:53
  • As an alternative to git-annex there's another tool called bup (github.com/bup/bup), which can be used to do similar things. It too builds on Git's API: "...Capable of doing fast incremental backups of virtual machine images...." Commented Feb 11, 2014 at 15:02
  • Yes. bup seems to be the way to go; it is designed specifically for my purpose. Commented Feb 11, 2014 at 15:13
  • bup has apparently resolved the issue with garbage collection of old (deleted) backups (github.com/bup/bup/commit/…) Commented May 20, 2016 at 7:35
