Gentoo Forums
HOWTO: The poor man's differential backup
VinzC
Watchman


Joined: 17 Apr 2004
Posts: 5098
Location: Dark side of the mood

Posted: Tue Apr 26, 2011 9:26 am    Post subject: HOWTO: The poor man's differential backup

Hi.

Just wanted to share this. While searching the Internet for backup solutions, especially differential backups, I found lots of scripts and tools all over the place.

The context

I have a big drive that I must back up every day. No complicated solution is involved, just the good old tar command, which does wonders. Until a few days ago I was doing only full backups. Each archive file is named after the date of the backup, allowing me to go back in time a certain number of days. But the storage space on the backup drive is fixed, and as the full backups grew in size I had to shorten the history. I ended up keeping only a few days and I wanted more. Another issue is that the backup process takes so long that it is still running during work hours.

Logically, I had to create full backups less frequently, outside work hours or during weekends. On all the other days, just do either differential or incremental backups.

A little history.

Differential backups are easier to manage than incremental: only restore the latest full backup then the latest differential backup. With incrementals, you need to restore *all* of them in sequence after the last full.

Typically, differentials and incrementals rely on knowing that a file has changed since the last backup. That's the purpose of the archive bit on Windows; only full and incremental backups reset this bit. The catch is (someone prove me wrong there) that there's no such bit in GNU/Linux filesystems. But that's no big deal, really.

So the question is how do I create a differential backup?

There are scripts. There are tools. I want neither :D .

The [easy] solution

Of course, tar cannot create differential backups. But it can restore and skip newer files (tar -xp --keep-newer-files)! That's all we want, isn't it? Say we want to do full backups every week. All we need to do is find all files that have changed since the last full backup. Simply put: we just need to find the files that have changed in the last 7 days. That window may catch a few files that are already in the latest full backup, but the restore process will take care not to overwrite newer files, i.e. files restored from the full backup that are newer than their copies in the differential. Better too many files in the archive than not enough, right?

So, in this scenario, here's the principle:

weekend.sh:
tar -cjvf full.tar.bz2

daily.sh:
find -ctime -7 -type f | tar -czvf differential.gz -T -  ...

Use find to select only files that were changed during the last 7 days, pipe the list through tar -czv -T - and you're done.
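
The pipeline above splits the file list on newlines, so an unusual file name could trip it up. Here is a minimal sketch of a more defensive variant, assuming GNU find and GNU tar. The function name make_differential and its three arguments are made up for illustration; they are not part of the original scripts.

```shell
#!/bin/sh
# Hypothetical helper: archive every regular file under SRC whose status
# changed within the last DAYS days. Assumes GNU find and GNU tar.
make_differential() {
    src=$1      # directory to back up
    archive=$2  # destination archive, e.g. /media/backup/differential.tar.gz
    days=$3     # look-behind window in days, e.g. 7 for weekly fulls

    # -print0 and --null pass NUL-terminated names, so names containing
    # spaces or newlines survive the pipe. -C makes tar resolve the
    # relative names against SRC, so stored paths start with "./".
    ( cd "$src" && find . -type f -ctime "-$days" -print0 ) |
        tar -czf "$archive" -C "$src" --null -T -
}
```

Calling it as make_differential /srv/data /media/backup/differential.tar.gz 7 would mirror daily.sh; /srv/data is only a placeholder for whatever directory gets backed up.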

On the server where I run this script, a full backup takes several hours and several gigabytes, while a differential takes only minutes and a few megabytes. A simple cron job will do: when date +%w equals 6 it is Saturday, time for a full backup; otherwise it's a differential.

Code:
if [ $(date +%w) -eq 6 ]; then
   : # Full backup (":" is a no-op so the empty branch is valid shell)
else
   : # Differential backup, 7 days behind
fi

If I wanted full backups once a month, say the first of each month, I'd have tested

Code:
if [ $(date +%d) -eq 1 ]; then
   : # Full backup (":" is a no-op so the empty branch is valid shell)
else
   : # Differential backup, 31 days behind
fi

which would allow for an even longer period back in time for the same storage space.
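
For the weekly case, the test and the two tar commands could be wired together roughly as follows. This is only a sketch under assumptions: the function names backup_kind and run_backup are invented, the source and destination directories are passed in by the caller, and the archive names follow the date-based scheme described under Tips & tricks.

```shell
#!/bin/sh
# Sketch only: function names and arguments are illustrative assumptions.

# Decide what to run from the day of week printed by `date +%w`
# (0 = Sunday ... 6 = Saturday).
backup_kind() {
    if [ "$1" -eq 6 ]; then
        echo full
    else
        echo differential
    fi
}

# Run today's backup of directory $1 into backup directory $2.
run_backup() {
    src=$1
    dest=$2
    today=$(date +%F)
    case $(backup_kind "$(date +%w)") in
        full)
            tar -cjf "$dest/$today.tar.bz2" -C "$src" .
            ;;
        differential)
            # Files whose status changed in the last 7 days, NUL-separated.
            ( cd "$src" && find . -type f -ctime -7 -print0 ) |
                tar -czf "$dest/$today.d.tar.gz" -C "$src" --null -T -
            ;;
    esac
}
```

A script ending with a line like run_backup /srv/data /media/backup (both paths are placeholders) can then be started from a single daily cron entry.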

Tips & tricks

I named all my backups after the date the script runs. Full backups are named $(date +%F).tar.bz2 and differentials are named $(date +%F).d.tar.gz . A full is bigger (noooo?) so I used bzip2. Differentials are much smaller so gzip is enough.

ls -l /media/backup:
...
-rw-r--r-- 1 root root 43596099590 mar 2 12:31 2011-03-02.tar.bz2
-rw-r--r-- 1 root root 43636684415 mar 3 12:33 2011-03-03.tar.bz2
-rw-r--r-- 1 root root 43636684415 mar 4 12:21 2011-03-04.tar.bz2
-rw-r--r-- 1 root root     4599518 mar 5  3:12 2011-03-05.d.tar.gz
-rw-r--r-- 1 root root 43636684415 mar 5 12:21 2011-03-05.tar.bz2

Files sort in natural order, so you can immediately spot which of these is a full and which is not.
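
Because the names sort by date, a restore can pick its two archives without any bookkeeping. The sketch below assumes GNU tar and the naming scheme above. The function name restore_latest is hypothetical, and it also assumes that a differential dated the same day as a full was taken before that full, as in the listing.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar and archives named YYYY-MM-DD.tar.bz2 (full)
# and YYYY-MM-DD.d.tar.gz (differential). Names below are illustrative.
restore_latest() {
    backups=$1  # directory holding the archives, e.g. /media/backup
    target=$2   # existing directory to restore into
    day='[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'

    # Globs expand in sorted order, so the last match is the newest archive.
    full=
    for f in "$backups"/$day.tar.bz2; do
        [ -e "$f" ] && full=$f
    done
    diff=
    for f in "$backups"/$day.d.tar.gz; do
        [ -e "$f" ] && diff=$f
    done
    [ -n "$full" ] || { echo "no full backup in $backups" >&2; return 1; }

    # GNU tar detects the compression on extraction, so plain -x is enough.
    tar -xpf "$full" -C "$target" || return 1

    # Apply the differential only if it is dated after the full backup.
    if [ -n "$diff" ]; then
        full_day=$(basename "$full" .tar.bz2 | tr -d '-')
        diff_day=$(basename "$diff" .d.tar.gz | tr -d '-')
        if [ "$diff_day" -gt "$full_day" ]; then
            tar -xpf "$diff" -C "$target" --keep-newer-files
        fi
    fi
}
```

So a complete restore stays at two extractions at most, whatever the number of differentials on the backup drive.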

Simple. Neat. Standard.

Enjoy!
_________________
Gentoo addict: tomorrow I quit, I promise!... Just one more emerge...
1739!


Last edited by VinzC on Tue Apr 26, 2011 8:00 pm; edited 3 times in total
x22
Apprentice


Joined: 24 Apr 2006
Posts: 208

Posted: Tue Apr 26, 2011 12:50 pm    Post subject: Re: HOWTO: The poor man's differential backup

VinzC wrote:

Of course, tar cannot create differential backups.

It can, using the -g option (5.2 Using tar to Perform Incremental Dumps and 5.3 Levels of Backups )
VinzC
Watchman


Joined: 17 Apr 2004
Posts: 5098
Location: Dark side of the mood

Posted: Tue Apr 26, 2011 1:30 pm    Post subject: Re: HOWTO: The poor man's differential backup

VinzC wrote:
Of course, tar cannot create differential backups.

x22 wrote:
It can, using the -g option (5.2 Using tar to Perform Incremental Dumps and 5.3 Levels of Backups )

Indeed tar handles incremental backups, differential not. The difference is more practical than technical: with incrementals you need to restore *every* archive in sequence since the last full backup, up to 32 restores with a monthly full backup. With a differential, you need only two restores at most.

Here's an example:
  1. Full backup
  2. Change file A
  3. Backup 1
  4. Change file B
  5. Backup 2
With an incremental, backup 1 would copy file A and backup 2 only file B. With a differential, backup 1 would still copy file A but backup 2 would save both files A and B. So if you want to restore, you need to restore the full, plus *all* of the subsequent incremental archives. You only need the *last* differential though.
Sven Vermeulen
Retired Dev


Joined: 29 Aug 2002
Posts: 1345
Location: Mechelen, Belgium

Posted: Tue Apr 26, 2011 9:28 pm

I use rsync with the --link-dest option. It allows you to do rsync's, but when files are already available at the link-dest location, it uses a hardlink rather than a copy, saving you the space of the files that aren't modified (although even hardlinks have file system impact).
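
For illustration, a snapshot run along those lines might look like the sketch below. The function name snapshot, its arguments and the one-directory-per-snapshot layout are assumptions made for this example, not a description of any particular setup.

```shell
#!/bin/sh
# Hypothetical snapshot helper built on rsync --link-dest.
snapshot() {
    src=$1       # directory to back up
    dest=$2      # existing directory holding one subdirectory per snapshot
    name=$3      # name of the new snapshot, e.g. the output of date +%F
    previous=$4  # name of the previous snapshot, empty for the first run

    if [ -n "$previous" ]; then
        # Unchanged files become hard links into the previous snapshot.
        # A relative --link-dest is resolved against the destination
        # directory, hence ../$previous.
        rsync -a --link-dest="../$previous" "$src/" "$dest/$name/"
    else
        rsync -a "$src/" "$dest/$name/"
    fi
}
```

Every snapshot directory then looks like a full copy, while unchanged files are stored only once.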
_________________
Please add "[solved]" to the initial topic title when it is solved.
x22
Apprentice


Joined: 24 Apr 2006
Posts: 208

Posted: Wed Apr 27, 2011 12:26 pm    Post subject: Re: HOWTO: The poor man's differential backup

VinzC wrote:

Indeed tar handles incremental backups, differential not.


It can be used for differential backups, too. It requires careful handling of the extra snapshot file which tar uses with the -g option:
GNU tar manual wrote:
Notice that ‘/var/log/usr.snar’ will be updated with the new data, so if you plan to create more ‘level 1’ backups, it is necessary to create a working copy of the snapshot file before running tar.


5.3 Level of Backups describes the same strategy as in your original post:
GNU tar manual wrote:

A typical dump strategy would be to perform a full dump once a week, and a level one dump once a day. This means some versions of files will in fact be archived more than once, but this dump strategy makes it possible to restore a file system to within one day of accuracy by only extracting two archives—the last weekly (full) dump and the last daily (level one) dump. The only information lost would be in files changed or created since the last daily backup. (Doing dumps more than once a day is usually not worth the trouble.)
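
A rough sketch of that working-copy trick, assuming GNU tar. The function names and the file names (full.tar.gz, level0.snar, work.snar) are made up for this example.

```shell
#!/bin/sh
# Sketch of differential ("level 1") dumps with GNU tar's -g option.
# Function names and file layout are illustrative assumptions.

full_dump() {
    src=$1   # directory to back up
    dest=$2  # directory receiving archives and the snapshot file
    # Start from scratch: a missing snapshot file makes tar do a level 0 dump.
    rm -f "$dest/level0.snar"
    tar -czf "$dest/full.tar.gz" -g "$dest/level0.snar" -C "$src" .
}

differential_dump() {
    src=$1
    dest=$2
    name=$3  # base name of the new archive
    # Work on a copy so level0.snar keeps describing the full dump; every
    # differential is then relative to the full, not to the previous run.
    cp "$dest/level0.snar" "$dest/work.snar" || return 1
    tar -czf "$dest/$name.tar.gz" -g "$dest/work.snar" -C "$src" .
    status=$?
    rm -f "$dest/work.snar"
    return $status
}
```

Passing the snapshot file itself to every daily run, without the copy, would give incrementals instead.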
John R. Graham
Administrator


Joined: 08 Mar 2005
Posts: 10733
Location: Somewhere over Atlanta, Georgia

Posted: Wed Apr 27, 2011 1:38 pm

There's a neat Perl script that drives the traditional *nix archiving tools (find, tar, cpio, and their brethren) called flexbackup. It provides a management layer on top of those tools that creates full, incremental, or differential backups and supports a strong regular expression based exclusion mechanism.
Flexbackup Home Page wrote:
flexbackup is for you if you have a single or small number of machines, amanda is "too much", and tarring things up by hand isn't nearly enough...
It's in Portage: app-backup/flexbackup. Looks like it does exactly what you're doing plus handles a lot of the administrative tasks. Recommended. :)

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
depontius
Advocate


Joined: 05 May 2004
Posts: 3526

Posted: Wed Apr 27, 2011 3:40 pm    Post subject: Re: HOWTO: The poor man's differential backup

VinzC wrote:

daily.sh:
find -ctime -7 -type f | tar -czvf differential.gz -T -  ...



I'd take another look at this, and think about whether you want to use "-ctime" or "-mtime". Most likely "-ctime" works pretty well because most applications don't generally update in-place - they manipulate the data in a newly-named file, then swap that for the original file. That changes the "file status", tripping the ctime for the file. If some application were to change the data in-place, both the mtime and the ctime would be updated, so "-ctime" catches that too. The reverse does not hold: a chmod, chown or rename updates only the ctime, so "-ctime" also picks up files whose content has not changed.
_________________
.sigs waste space and bandwidth
XQYZ
Apprentice


Joined: 19 Jul 2009
Posts: 231
Location: Europe

Posted: Wed Apr 27, 2011 3:46 pm

Sven Vermeulen wrote:
I use rsync with the --link-dest option. It allows you to do rsync's, but when files are already available at the link-dest location, it uses a hardlink rather than a copy, saving you the space of the files that aren't modified (although even hardlinks have file system impact).


Same. I've actually blown this out of proportion by making it into an Apple Time Machine-like backup solution last year: http://dump.domindthegap.co.uk/backup/ (backup is called via an hourly cron job, bjanitor is just a Python script which cleans out old backups - manually so far). If only I ever found the time to finish it properly. It's still missing a couple of features I'd like (not to mention the horrible Python script - my first with more than 50 lines back in the day).

And yeah, hardlinks have quite an impact: like 20 MB on my home directory :twisted: . But what's 20 MB nowadays, with 2 TB drives going for way under 100 euro/dollar?
SlashBeast
Retired Dev


Joined: 23 May 2006
Posts: 2922

Posted: Wed Apr 27, 2011 5:40 pm

you guys should check rsnapshot and rdiff-backup.
VinzC
Watchman


Joined: 17 Apr 2004
Posts: 5098
Location: Dark side of the mood

Posted: Thu Apr 28, 2011 9:43 am    Post subject: Re: HOWTO: The poor man's differential backup

John R. Graham wrote:
There's a neat Perl script that drives the traditional *nix archiving tools (find, tar, cpio, and their brethren) called flexbackup. It provides a management layer on top of those tools that creates full, incremental, or differential backups and supports a strong regular expression based exclusion mechanism.
Flexbackup Home Page wrote:
flexbackup is for you if you have a single or small number of machines, amanda is "too much", and tarring things up by hand isn't nearly enough...
It's in Portage: app-backup/flexbackup. Looks like it does exactly what you're doing plus handles a lot of the administrative tasks. Recommended. :)

- John

Thank you very much, John. Will look at that.


x22 wrote:
5.3 Level of Backups describes the same strategy as in your original post:

GNU tar manual wrote:
A typical dump strategy would be to perform a full dump once a week, and a level one dump once a day. This means some versions of files will in fact be archived more than once, but this dump strategy makes it possible to restore a file system to within one day of accuracy by only extracting two archives—the last weekly (full) dump and the last daily (level one) dump. The only information lost would be in files changed or created since the last daily backup. (Doing dumps more than once a day is usually not worth the trouble.)

Thanks for clarifying. I hadn't understood it that way. The one thing that bothers me with that solution is that you need to keep a trace file permanently, if I guessed it right. I preferred using no extra file, log or trace (well, of course, except the backup log that gets sent by email). That's where find comes in handy.


SlashBeast wrote:
you guys should check rsnapshot and rdiff-backup.

Of course. But the main thing is I wanted no script, no tool, just tar [and find]. And most of all, a tar archive has the main advantage of being portable: you may copy your archive to any destination without losing security or anything. You may of course tar a directory created by rdiff-backup but it's just one more [time and resource consuming] step.


VinzC wrote:
daily.sh:
find -ctime -7 -type f | tar -czvf differential.gz -T -  ...

depontius wrote:
I'd take another look at this, and think about whether you want to use "-ctime" or "-mtime". Most likely "-ctime" works pretty well because most applications don't generally update in-place - they manipulate the data in a newly-named file, then swap that for the original file. That changes the "file status", tripping the ctime for the file. If some application were to change the data in-place, both the mtime and the ctime would be updated, so "-ctime" catches that too. The reverse does not hold: a chmod, chown or rename updates only the ctime, so "-ctime" also picks up files whose content has not changed.

Thank you very much for the hint! Indeed I didn't spot that. The server on which the script runs is a Samba server. I have just tested the difference between both and it looks like find -mtime returns fewer results than find -ctime. I suppose I can combine both?
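
Combining both tests is a matter of joining them with -o inside find. As far as I can tell, on Linux a write to a file updates both mtime and ctime, while the mtime can be set back afterwards (touch, or tools that restore timestamps), so the union is the safe side. A minimal sketch, with a hypothetical function name changed_files:

```shell
#!/bin/sh
# Hypothetical helper: list regular files under DIR whose content (mtime)
# or status (ctime) changed within the last DAYS days.
changed_files() {
    dir=$1   # directory to scan
    days=$2  # look-behind window in days
    # The escaped parentheses group the two time tests so that -type f
    # applies to both of them.
    find "$dir" -type f \( -mtime "-$days" -o -ctime "-$days" \) -print
}
```

The resulting list can be fed to tar -T - in the same way as before.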