I have a friend who is a photographer, and as a result she has a lot of large files. Our current arrangement is that some of her files are delivered to my house, but I don't have the storage capacity to hold all of them.
Because we are talking terabytes of data, a full backup can't be kept at her place. We do currently have backups of the post-processed files.
What I would like to do is keep a rolling, x-month backup of these files at her place. I've got space on a drive there, so I just need a script. The script needs to run at least daily and keep the 250GB of available space on that drive filled with the most recent raw files, so that the disk stays nearly full.
I've been trying tar, as it has a built-in --newer option. However, that just creates a massive tarfile every day: when the same job runs tomorrow, it produces another massive tarfile which may well be identical to yesterday's, and so on. It seems very inefficient.
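What I've been running is roughly this (made-up paths; as far as I know GNU tar accepts relative date strings like "60 days ago" for --newer):

    # Each run writes a fresh archive of everything from the last two months.
    tar --create --newer="60 days ago" --file=/mnt/backup/raws-$(date +%F).tar /path/to/raws
    # Tomorrow's run produces another near-identical multi-gigabyte tarfile.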
My initial thought was rsync, but it doesn't seem to have its own built-in time options. There are ways you can faff about using a find command and piping in the results, but apparently this doesn't preserve the directory structure at the target.
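The sort of pipeline I mean is something like this (again, made-up paths); it picks up the right files, but they all land flat in the top of the target:

    # Every file modified in the last 60 days is copied, but each one lands
    # directly in /mnt/backup/ with its original folder path stripped.
    find /path/to/raws -type f -mtime -60 -exec rsync -a {} /mnt/backup/ \;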
How is this not just “a thing” in Unix? Am I missing something? Ironically, this would be trivial in Windows: robocopy <source> <destination> /mir /maxage:<date>.
To summarise:
- We have a source tree full of large files
- At any time we may get a new folder of large files
- We have a hard drive which is not as big as the tree of files, but is big enough to hold the last 2 months' worth of files
- I want those files copied, as frequently as I choose, to the drive
- The folder structure needs to be retained
- When a file becomes 2 months + 1 day old, delete it
- Net result: I always have the last 2 months' worth of files on the drive, no matter what's added to the source (a rough sketch of what I'm imagining follows below)
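This is roughly what I imagine a hand-rolled daily job would look like: an untested sketch with made-up paths (/path/to/raws and /mnt/rolling-backup), using GNU find and rsync.

    #!/bin/sh
    # Untested sketch: keep a rolling ~60 days of raw files on the backup drive.
    SRC=/path/to/raws          # made-up source tree
    DEST=/mnt/rolling-backup   # made-up mount point of the 250GB drive

    # Copy everything modified in the last 60 days, preserving the directory
    # structure relative to $SRC (GNU find's %P prints the path minus $SRC).
    find "$SRC" -type f -mtime -60 -printf '%P\0' |
        rsync -a --from0 --files-from=- "$SRC"/ "$DEST"/

    # Delete anything on the backup drive older than 60 days...
    find "$DEST" -type f -mtime +60 -delete
    # ...and remove any directories left empty as a result.
    find "$DEST" -mindepth 1 -type d -empty -delete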