Gentoo Forums
Moving large home directory to new hdd

Illiander
Apprentice

Joined: 22 Feb 2011
Posts: 252

Posted: Fri Nov 13, 2020 6:23 am    Post subject: Moving large home directory to new hdd

I use two hdds for my Gentoo system. One for /home, and one for everything else.

I've been physically moving the /home hdd from tower to tower as hardware dies, processors get upgraded, and so on.

This hdd is 2TB, and mostly full. It's also getting rather old.

What would be the best way to copy it all over to a new, larger hdd, without meaning that I can't use the computer for several days while a cp command runs on my home directory?

I have no real experience with linux backup and restore tools, so beginner-friendly advice would be appreciated.

steve_v
Guru

Joined: 20 Jun 2004
Posts: 388
Location: New Zealand

Posted: Fri Nov 13, 2020 8:47 am    Post subject: Re: Moving large home directory to new hdd

Illiander wrote:
What would be the best way to copy it all over to a new, larger hdd, without meaning that I can't use the computer for several days while a cp command runs on my home directory?

Rsync. Do it once to copy the bulk of the data, then again just before you swap drives to catch any changes. That second run should be very quick, minimising downtime.
Google will provide far better details than I can, but IIRC I use something like 'rsync -aAXHh [source] [destination]' in my backup scripts. The relevant manual page is of course the go-to for options.

Theoretically one should really do the second sync while nothing is using the source (e.g. logged out of the affected account and/or with the partition mounted ro), but in practice I've found that laziness wins, the time-window is small, and it's simpler to just keep the old drive around for a while in the unlikely event you need something off it.
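
The two-pass approach above could be sketched roughly as follows. This is only an illustration: it assumes the new drive is already formatted and mounted at /mnt/newhome (a made-up path), and the function names, the RSYNC override variable, and the --info=progress2 and --delete options are additions that were not part of the command quoted above.

```shell
#!/bin/sh
# Two-pass copy of /home to a new drive with rsync.
# Assumptions (not from the thread): the new drive is mounted at /mnt/newhome;
# RSYNC may be overridden to point at a different binary.
RSYNC="${RSYNC:-rsync}"

# Pass 1: bulk copy while the system stays in use.
# -a archive, -A ACLs, -X extended attributes, -H hard links, -h readable sizes.
bulk_copy() {
    "$RSYNC" -aAXHh --info=progress2 "$1" "$2"
}

# Pass 2: quick catch-up just before swapping drives.
# --delete also removes files from the copy that have since been deleted
# from the source, so double-check the destination path before running it.
final_sync() {
    "$RSYNC" -aAXHh --delete "$1" "$2"
}

# Example (trailing slashes copy the directory contents, not the directory):
#   bulk_copy  /home/ /mnt/newhome/
#   final_sync /home/ /mnt/newhome/
```

The second pass only transfers what changed since the first, which is why it should be short. Ideally run it while logged out of the affected account.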

Personally I'm far too paranoid to have ~2TB of even slightly important data residing only on a single drive, but that is of course entirely your call. If you do already have a backup of that drive, it'll do nicely as step 1 above. Just restore it to the new drive and rsync any updates.
_________________
Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.


Last edited by steve_v on Fri Nov 13, 2020 9:00 am; edited 1 time in total

Illiander
Apprentice

Joined: 22 Feb 2011
Posts: 252

Posted: Fri Nov 13, 2020 8:59 am    Post subject: Re: Moving large home directory to new hdd

steve_v wrote:

Personally I'm far too paranoid to have ~2TB of even slightly important data residing only on a single drive


I had bad experiences of raid last time I tried it, but that was ~10 years ago.

What would you suggest as a raid-equivalent (or raid) to get decent data replication?

Something that is futureproof and easy to set up would be best for me.

steve_v
Guru

Joined: 20 Jun 2004
Posts: 388
Location: New Zealand

Posted: Fri Nov 13, 2020 9:09 am    Post subject: Re: Moving large home directory to new hdd

Illiander wrote:
I had bad experiences of raid last time I tried it, but that was ~10 years ago.
RAID5 by chance? So have I...

Illiander wrote:
What would you suggest as a raid-equivalent (or raid) to get decent data replication?
Ahh, another opportunity to plug ZFS. Excellent. :P
Despite being owned by a particularly obnoxious corporation and released under a GPL-incompatible (though still FOSS) licence, IMO ZFS is hands-down the best redundant storage solution at the moment. BTRFS is getting better, but both its feature set and its tools still suck by comparison.

Alternatively, if you're more after a hot backup than true redundancy, just use rsync, cron, and another disk.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54313
Location: 56N 3W

Posted: Fri Nov 13, 2020 10:02 am

steve_v,

Raid 5 has worked well here. My only bad experience was two drives being kicked out of raid5 15 min apart ...
With raid, do make sure that you have a spare HDD port. That allows the use of mdadm --replace
Running a raid set in degraded mode to resync is asking for trouble. That should be avoided if possible.
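
A rough sketch of that hot-replace, for a member that is still working. The device names are placeholders and the MDADM override variable is an addition for illustration; check the names against /proc/mdstat before trying anything like this.

```shell
#!/bin/sh
# Hot-replace a still-working array member using a spare port.
# Hypothetical names: /dev/md0 (array), /dev/sdb1 (old member), /dev/sde1 (new disk).
MDADM="${MDADM:-mdadm}"

replace_member() {
    array=$1
    old=$2
    new=$3
    # Add the new partition to the array as a spare first.
    "$MDADM" "$array" --add "$new" &&
    # Then rebuild onto it while the old member stays active in the array,
    # so the array never has to run degraded during the copy.
    "$MDADM" "$array" --replace "$old" --with "$new"
}

# Example:
#   replace_member /dev/md0 /dev/sdb1 /dev/sde1
#   cat /proc/mdstat    # watch the recovery progress
```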
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

steve_v
Guru

Joined: 20 Jun 2004
Posts: 388
Location: New Zealand

Posted: Fri Nov 13, 2020 10:28 am

NeddySeagoon wrote:
Raid 5 has worked well here. My only bad experience was two drives being kicked out of raid5 15 min apart

Assuming they just got kicked due to some transient glitch, that's fine. In my case I had 2 drives fail completely a few hours apart, resulting in loss of the array. IMO RAID5 just isn't sufficient redundancy with modern drive sizes and the rebuild time that implies... Either that or crappy drive models from a certain large manufacturer have made me paranoid.

In any case, I'm just not keen on running degraded while the array resyncs, which is exactly where you're at if your one drive worth of parity disappears off the bus.

NeddySeagoon wrote:
With raid, do make sure that you have a spare HDD port. That allows the use of mdadm --replace
Running a raid set in degraded mode to resync is asking for trouble. That should be avoided if possible.

Absolutely, but that's only a benefit if the drive you are replacing is still alive and still has mostly-usable data on it. A dead drive in a RAID5 doesn't provide any redundancy or take any of the rebuild load (and increased failure potential) off the remaining members, spare ports or no.
Even with a true hot-spare, RAID5 still has that zero-redundancy rebuild window if a member drive fails completely... Which is how about 30% of drive failures go IME.



Tangentially, and another shameless plug for ZFS: Traditional RAID5 provides very little protection against bit-rot and in-flight corruption. Even when RAID5 parity and data disagree, there's no reliable way to tell whether it's a data block or a parity block that is corrupt, and no reliable way to correct it unless you can do a "best of 3" election - i.e. double parity/RAID6. The best it can do is "this data might be bad, restore from backup". Hell, there's often no way to know if the data even landed on disk to begin with beyond blindly trusting the drive.
I personally have a bunch of very-slightly-corrupted files from back when I was running a RAID5 array to show for this.

ZFS wins as far as data integrity, but I'll not attempt to explain the mechanism as Oracle does it much better than I ever could.


Last edited by steve_v on Fri Nov 13, 2020 11:08 am; edited 1 time in total

Goverp
Advocate

Joined: 07 Mar 2007
Posts: 2014

Posted: Fri Nov 13, 2020 10:57 am

I moved my home from a 1TB RAID 5 setup (4x320 GB disks, with other stuff on them) to a 2.5 TB RAID10 setup (5x1TB disks). IMHO RAID10 is easier to handle when a disk dies. But in either case, the essential thing is to know that a disk has dropped from the array. I got caught about 3 times, suddenly discovering that a drive had dropped a week or so earlier....
_________________
Greybeard

steve_v
Guru

Joined: 20 Jun 2004
Posts: 388
Location: New Zealand

Posted: Fri Nov 13, 2020 11:10 am

Goverp wrote:
IMHO RAID10 is easier to handle when a disk dies.
Agreed.

Goverp wrote:
the essential thing is to know that a disk has dropped from the array.
You know mdadm can send you mail, right? :P
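
For reference, a minimal sketch of wiring up those mails. It assumes the config lives at /etc/mdadm.conf (the location varies by distribution), that a local mailer is already working, and uses a placeholder address. The helper names and the MDADM override variable are made up for illustration.

```shell
#!/bin/sh
# Email alerts from mdadm's monitor mode.
# Assumptions: config path and address are placeholders; a local mailer works.
MDADM="${MDADM:-mdadm}"

# Append a MAILADDR line to the given config file if none is present yet.
set_mailaddr() {
    conf=$1
    addr=$2
    grep -q '^MAILADDR' "$conf" 2>/dev/null || printf 'MAILADDR %s\n' "$addr" >> "$conf"
}

# Send a test alert for every array found, then exit.
send_test_alert() {
    "$MDADM" --monitor --scan --oneshot --test
}

# Example:
#   set_mailaddr /etc/mdadm.conf you@example.com
#   send_test_alert
#   mdadm --monitor --scan --daemonise   # or start your init system's monitor service
```

If the test mail never arrives, the problem is in the mail setup rather than in mdadm, which is worth finding out before a disk drops.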

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54313
Location: 56N 3W

Posted: Fri Nov 13, 2020 11:11 am

Goverp,

mdadm can email you on all sorts of events. Choose how much spam you want.

Illiander
Apprentice

Joined: 22 Feb 2011
Posts: 252

Posted: Fri Nov 13, 2020 1:53 pm

I don't want to touch ZFS - I don't trust Oracle. I'm quite happy with extX filesystems.

Sounds like I want either a RAID1 or a RAID 10 setup, depending on how many hdds my case can fit?

How easy is it to set that up in software these days? My highest priority here is being able to plug my hdds into a new tower in the future and have everything just work, so I'm not going to touch hardware RAID.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54313
Location: 56N 3W

Posted: Fri Nov 13, 2020 2:04 pm

Illiander,

Consider raid6 too.
raid6 has two parity volumes. In a 4 spindle raid6 you get better redundancy than a 4 spindle raid10.

Raid6 protects against any two drives failing.
Raid10 protects against any one drive failing and some combinations of two drives.
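
To put numbers on "some combinations", here is a small counting sketch for a 4-disk array. It assumes the common RAID10 layout of two mirrored pairs, disks (0,1) and (2,3); the function name is made up for illustration.

```shell
#!/bin/sh
# Count which two-disk failures a 4-disk array survives.
# Assumes RAID10 is laid out as two mirrored pairs: disks (0,1) and (2,3).
count_survivable_pairs() {
    level=$1
    survived=0
    total=0
    for a in 0 1 2; do
        b=$((a + 1))
        while [ "$b" -le 3 ]; do
            total=$((total + 1))
            case $level in
                raid6)
                    # Any two disks may fail.
                    survived=$((survived + 1)) ;;
                raid10)
                    # Fatal only when both failed disks are in the same mirror.
                    if [ $((a / 2)) -ne $((b / 2)) ]; then
                        survived=$((survived + 1))
                    fi ;;
            esac
            b=$((b + 1))
        done
    done
    echo "$level survives $survived of $total two-disk failures"
}

count_survivable_pairs raid6
count_survivable_pairs raid10
```

Under that assumption raid6 survives all 6 possible pairs, while raid10 survives 4 of the 6 and loses the array when both halves of one mirror go.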

Raid is never a backup.

steve_v
Guru

Joined: 20 Jun 2004
Posts: 388
Location: New Zealand

Posted: Fri Nov 13, 2020 5:07 pm

Illiander wrote:
I don't want to touch ZFS - I don't trust Oracle.

Fair enough, I don't trust them either. For me the features are worth the risk (primarily Oracle going full patent-troll), and TBH I expect they'd have a hard time closing up ZFS even if they wanted to.

Illiander wrote:
Sounds like I want either a RAID1 or a RAID 10 setup, depending on how many hdds my case can fit?
RAID1/10 is certainly the simplest layout, and the most performant.

My take:
2 drives - RAID1
4 drives - RAID10
5+ drives - either RAID10 or RAID6, depending on how much raw performance you want and how much space you are willing to sacrifice to redundancy.

Illiander wrote:
How easy is it to set that up in software these days?
As easy as it has ever been. If you're just going for data drives it's about as easy as falling off a log, and making the boot/root partitions RAID as well isn't much more complicated (at least for "legacy" boot, I refuse to go near this UEFI/Secure boot crap).
It's pretty much partition disks, create array, format, mount. You can even skip step one if you're using whole disks.
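
As a rough sketch of those steps for a two-disk RAID1 with ext4: the device names, the array name /dev/md0 and the mount point are placeholders, and the DRY_RUN switch and helper names are additions for illustration so the commands are only printed by default.

```shell
#!/bin/sh
# Sketch: two-disk RAID1 data array with ext4.
# Hypothetical devices and mount point; adjust before use.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_raid1_home() {
    part1=$1
    part2=$2
    mountpoint=$3
    # Create the array from two partitions (or whole disks).
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 "$part1" "$part2" &&
    # Format it and mount it.
    run mkfs.ext4 /dev/md0 &&
    run mount /dev/md0 "$mountpoint"
}

# Prints the commands; set DRY_RUN=0 and run as root to execute them.
make_raid1_home /dev/sdb1 /dev/sdc1 /mnt/newhome
```

Afterwards, something like `mdadm --detail --scan >> /etc/mdadm.conf` records the array so it assembles under the same name on the next boot (the config path may differ on your system).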

NeddySeagoon wrote:

Consider raid6 too.
raid6 has two parity volumes. In a 4 spindle raid6 you get better redundancy than a 4 spindle raid10.

Seconded, absolutely. Especially if more than 4 drives are involved. It's going to be a bit slower, but as well as better redundancy, usable storage space scales a whole lot better as well.
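
The space scaling can be worked out with a few lines of arithmetic. This sketch assumes equal-sized drives and RAID10 keeping two copies of everything; the function name and the 4TB drive size are just examples.

```shell
#!/bin/sh
# Usable capacity for N equal-sized drives, in the same unit as the drive size.
# raid10 here means two copies of everything.
usable_capacity() {
    level=$1
    drives=$2
    size=$3
    case $level in
        raid1)  echo "$size" ;;                       # every drive holds a full copy
        raid10) echo $(( drives * size / 2 )) ;;      # half the raw space
        raid5)  echo $(( (drives - 1) * size )) ;;    # one drive's worth of parity
        raid6)  echo $(( (drives - 2) * size )) ;;    # two drives' worth of parity
        *)      echo "unknown level: $level" >&2; return 1 ;;
    esac
}

for n in 4 6 8; do
    echo "$n x 4TB: raid10=$(usable_capacity raid10 "$n" 4)TB raid6=$(usable_capacity raid6 "$n" 4)TB"
done
```

With 4 drives the two come out equal at 8TB, but at 8 drives raid6 gives 24TB against 16TB for raid10.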

At home I run RAIDZ2 (aka ZFS RAID6) on 8 data spindles and 2 cache SSDs (data & VM zvols) with an SSD md-RAID1 for /boot & /root. The latter primarily because out-of-tree modules for a root filesystem is a headache I don't need.

mrbassie
l33t

Joined: 31 May 2013
Posts: 781
Location: over here

Posted: Fri Nov 13, 2020 5:33 pm

steve_v wrote:
Illiander wrote:
I don't want to touch ZFS - I don't trust Oracle.

Fair enough, I don't trust them either. For me the features are worth the risk (primarily Oracle going full patent-troll), and TBH I expect they'd have a hard time closing up ZFS even if they wanted to.


Interesting.

What could they theoretically do?

Tony0945
Watchman

Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Fri Nov 13, 2020 5:34 pm

I have zero experience with RAID. Before I did anything I would back up that drive to a USB external drive. Now is the month to look for those Black Friday deals. Many merchants are doing them month long. Comparison shop because there is COVID price gouging out there. I'd buy a 4TB or larger for backing up that drive. Even after you get the RAID set up, back up monthly with rsync.
If something fries your computer you have a non-attached backup.

Illiander
Apprentice

Joined: 22 Feb 2011
Posts: 252

Posted: Sat Nov 14, 2020 2:18 am

NeddySeagoon wrote:
Raid is never a backup.


Could you elaborate on this?

I'm in the category of "has never had a hard drive fail", so for personal stuff, I've never had to consider backups seriously.

Tony0945
Watchman

Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Sat Nov 14, 2020 3:26 am

First place I looked: WD Elements 4 TB USB 3.0, $90 USD with free shipping. A search would probably find more. Is all that data worth $90? That's about $22.50 per TB, or a little over two cents per GB.

Hu
Administrator

Joined: 06 Mar 2007
Posts: 21722

Posted: Sat Nov 14, 2020 5:34 am

RAID is considered not to be a backup because, by design, it keeps all its copies of your data completely current. If you have a system upgrade go badly, or a process deletes too many files, RAID will replicate those problems across all your drives as they happen, so you cannot use the RAID to recover from the problem, even if you realize the problem moments after it happens. By contrast, a backup occurs on a schedule and provides specific points in time that you can restore files from. If you have a data loss event, but the lost data is recorded in a backup, and you recognize the loss before the backup is retired, then you can use the backup to retrieve the lost data.

RAID is good for addressing the problem it is designed to address: loss of some portion of the array will not immediately render all the data inaccessible, as long as your losses are within what your RAID level is designed to handle.
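
One common way to get those points in time with plain rsync is dated snapshot directories that hard-link unchanged files against the previous run. The sketch below is only an illustration: the directory layout, the "latest" symlink, the function name and the RSYNC override variable are all assumptions, not something described in this thread.

```shell
#!/bin/sh
# Point-in-time backups with rsync hard-link snapshots.
# Hypothetical layout: each run creates <dest_root>/YYYY-MM-DD, and unchanged
# files are hard-linked against the previous snapshot ("latest"), so they take
# no extra space.
RSYNC="${RSYNC:-rsync}"

snapshot_backup() {
    src=$1
    dest_root=$2
    stamp=${3:-$(date +%F)}
    "$RSYNC" -aAXH --delete --link-dest="$dest_root/latest" "$src" "$dest_root/$stamp/" &&
    # Repoint "latest" at the snapshot just taken.
    ln -sfn "$stamp" "$dest_root/latest"
}

# Example, e.g. from a daily cron job:
#   snapshot_backup /home/ /mnt/backup/home
```

Each dated directory then looks like a full copy of /home as it was on that day, and deleting an old one does not affect the others.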

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54313
Location: 56N 3W

Posted: Sat Nov 14, 2020 10:01 am

Illiander,

No irreplaceable data on your system?
What you back up depends on the data loss you are willing to tolerate and the time to get going again.

If you make an image of your system, you restore it from the image to the date/time of the image.
You need not back up any more than /home if you only want user data.
Then you have to bring up a replacement from scratch.

You only have a backup if you have at least two copies. With only one copy, when your PC fails, you no longer have a backup.
Backups need to be validated too. The wrong time to find out that your backups are useless is when you need them most.
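
One cheap validation, assuming the backup was made with rsync, is a checksum-based dry run that lists anything differing between source and backup. The function name, paths and RSYNC override variable are illustrative, and this is no substitute for actually restoring a few files now and then.

```shell
#!/bin/sh
# Verify a backup by comparing it against the source without changing anything.
# -n dry run, -c compare file contents by checksum, -i itemize each difference.
RSYNC="${RSYNC:-rsync}"

verify_backup() {
    src=$1
    backup=$2
    diffs=$("$RSYNC" -aAXHnci --delete "$src" "$backup") || return 2
    if [ -n "$diffs" ]; then
        printf '%s\n' "$diffs"
        return 1        # backup differs from source
    fi
    echo "backup matches source"
}

# Example:
#   verify_backup /home/ /mnt/backup/home/
```

Files changed since the backup was taken will show up as differences, so this is most meaningful right after a backup run.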

Goverp
Advocate

Joined: 07 Mar 2007
Posts: 2014

Posted: Sat Nov 14, 2020 11:21 am

steve_v wrote:
...You know mdadm can send you mail, right? :P
Yes, but you need (a) the correctly configured mdadm monitor running, (b) a working email system to get the message to you, and (c) to receive and notice the email before anything else goes wrong. RAID array status is one of those things I'd prefer in a system dashboard, if ever I got around to setting one up.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54313
Location: 56N 3W

Posted: Sat Nov 14, 2020 12:23 pm

Goverp,

It's not that easy. mdadm only tells of events it knows about.
Code:
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md3 : active raid5 sdb4[7] sda4[5] sdc4[6] sdd4[4]
      11456987136 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/29 pages [0KB], 65536KB chunk

md2 : active raid5 sdb3[7] sda3[5] sdc3[6] sdd3[4]
      222461952 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
     
md1 : active raid5 sdb2[7] sda2[5] sdc2[6] sdd2[4]
      40802304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
     
md0 : active raid1 sdb1[0] sda1[3] sdc1[2] sdd1[1]
      61376 blocks [4/4] [UUUU]
     
unused devices: <none>
does not mean all is well, just that there are no problems encountered in use yet.

Code:
echo check > /sys/block/md3/md/sync_action
followed by
Code:
# cat /sys/block/md3/md/mismatch_cnt       
0
checks the entire raid for self consistency. 0 is the right answer here.

Code:
echo repair ...
may be useful if you get a non zero result but check for a non zero pending sector count in smartctl too.
You can do it all in a cron job but I run the check before each update, which is every 4 weeks or so.
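
A cron-friendly version of the check could look something like this sketch. The function names are made up, SYSFS_BLOCK exists only so the helpers can be pointed at a test directory, and the `mail` command in the example assumes a working local mailer.

```shell
#!/bin/sh
# Start a consistency check on an md array and inspect the mismatch count.
# SYSFS_BLOCK is only overridable for testing; md3 matches the array shown above.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

start_check() {
    echo check > "$SYSFS_BLOCK/$1/md/sync_action"
}

# Succeeds only when the last check found no mismatches.
mismatches_ok() {
    count=$(cat "$SYSFS_BLOCK/$1/md/mismatch_cnt") || return 2
    echo "$1: mismatch_cnt=$count"
    [ "$count" -eq 0 ]
}

# Example for a periodic cron job (run as root):
#   start_check md3
#   while [ "$(cat /sys/block/md3/md/sync_action)" != idle ]; do sleep 60; done
#   mismatches_ok md3 || echo "md3 needs attention" | mail -s "md3 mismatch" root
```

The check itself can take hours on a large array, so the wait loop matters: reading mismatch_cnt straight after starting the check says nothing about the run in progress.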

steve_v
Guru

Joined: 20 Jun 2004
Posts: 388
Location: New Zealand

Posted: Sat Nov 14, 2020 4:55 pm

mrbassie wrote:
What could they theoretically do?

Dunno... Which is kinda the concern.

Hu wrote:
RAID is considered not to be a backup because, by design, it keeps all its copies of your data completely current.

Another win for ZFS then: not only do you get RAID functionality, you get atomic snapshots and network replication too. :P

But yeah, a real backup is on another machine, in another building.

Goverp wrote:
RAID array status is one of those things I'd prefer in a system dashboard

Easily done with conky, kdialog, notify-send, or any of the multitude of desktop applet solutions... But only useful if you're sitting in front of the thing.
I prefer email, because most of my storage is in headless machines. (a) it's well documented (b) dma, ssmtp or nullmailer if you don't have a full mail setup (c) my phone goes "ding" and I get a desktop "new mail" notification as well. :P

NeddySeagoon wrote:
mdadm only tells of events it knows about.

I find a check as a weekly cron job + mdadm in daemon mode does a pretty good job of notifying about anything that matters.

NeddySeagoon wrote:
check for a non zero pending sector count in smartctl too.

Smartmontools also has a daemon mode and email capability. The biggest challenge is getting it to not notify you about trivialities.
It's nagging me about SSD lifetime on one of my cache drives right now. I'm totally getting around to sorting that. ;)

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54313
Location: 56N 3W

Posted: Sat Nov 14, 2020 5:27 pm

A snapshot on the same drives is not a backup. Every now and again, ZFS behaves as /dev/null
Creating a snapshot then making a copy of the snapshot works.
LVM supports snapshots too, so it's filesystem independent.
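
A sketch of the LVM variant, copying from a frozen snapshot rather than the live filesystem. The volume group vg0, logical volume home, the 10G of snapshot space and both mount paths are placeholders, and the DRY_RUN switch and helper names are additions so the commands are only printed by default.

```shell
#!/bin/sh
# Back up a frozen view of /home via an LVM snapshot.
# Hypothetical names: volume group vg0, logical volume home, 10G snapshot space,
# /mnt/snap as a scratch mount point. With DRY_RUN=1 commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

backup_from_snapshot() {
    dest=$1
    run lvcreate --snapshot --size 10G --name home_snap /dev/vg0/home &&
    run mount -o ro /dev/vg0/home_snap /mnt/snap &&
    run rsync -aAXH --delete /mnt/snap/ "$dest"
    status=$?
    # Always try to clean up so the snapshot does not linger and fill up.
    run umount /mnt/snap
    run lvremove -f /dev/vg0/home_snap
    return $status
}

# Prints the commands; set DRY_RUN=0 and run as root to execute them.
backup_from_snapshot /mnt/backup/home/
```

The copy only counts as a backup once it lands on different drives, which is the point made above.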

steve_v
Guru

Joined: 20 Jun 2004
Posts: 388
Location: New Zealand

Posted: Sat Nov 14, 2020 5:36 pm

NeddySeagoon wrote:
Every now and again, ZFS behaves as /dev/null

I really have no idea what you're talking about there.

NeddySeagoon wrote:
Creating a snapshot then making a copy of the snapshot works.

Absolutely, taking a snapshot and sending it to another pool is how backups with ZFS generally work.

NeddySeagoon wrote:
LVM supports snapshots too, so it's filesystem independent.

It does, and LVM on mdraid is a perfectly reasonable solution.

Having raid+snapshots+replication all in the same layer does have its advantages though, such as the ability to send incremental snapshots to another pool over a network without involving additional tools like rsnapshot.
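
For illustration, incremental replication over a network might look like the sketch below. The dataset, snapshot names, target pool and host are all placeholders, the function name and DRY_RUN switch are additions, and it assumes the previous snapshot already exists on both sides.

```shell
#!/bin/sh
# Incremental ZFS replication to another machine.
# Hypothetical names: dataset tank/home, target backup/home on host backuphost.
# Assumes the previous snapshot already exists on both sides.
# With DRY_RUN=1 (the default here) the pipeline is only printed.
DRY_RUN="${DRY_RUN:-1}"

replicate() {
    dataset=$1
    prev=$2
    new=$3
    target=$4
    host=$5
    if [ "$DRY_RUN" = 1 ]; then
        echo "zfs snapshot $dataset@$new"
        echo "zfs send -i $dataset@$prev $dataset@$new | ssh $host zfs receive $target"
    else
        # Take the new snapshot, then send only the changes since the previous one.
        zfs snapshot "$dataset@$new" &&
        zfs send -i "$dataset@$prev" "$dataset@$new" | ssh "$host" zfs receive "$target"
    fi
}

replicate tank/home 2020-11-13 2020-11-14 backup/home backuphost
```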

sitquietly
Tux's lil' helper

Joined: 23 Oct 2010
Posts: 143
Location: On the Wolf River, Tennessee

Posted: Sat Nov 14, 2020 7:14 pm

Illiander wrote:
I don't want to touch ZFS - I don't trust Oracle. I'm quite happy with extX filesystems.

Sounds like I want either a RAID1 or a RAID 10 setup, depending on how many hdds my case can fit?

How easy is it to set that up in software these days? My highest priority here is being able to plug my hdds into a new tower in the future and have everything just work, so I'm not going to touch hardware RAID.


Oracle has nothing to do with it: the ZFS system created by Sun Microsystems for OpenSolaris was open-sourced by Sun. The open-source ZFS, now called OpenZFS, is the only one that is used by Linux and will be the only ZFS used by FreeBSD as of Release 13. I run ZFS for my "RAID10" with four 4TB drives. I have multiple boot drives with FreeBSD, Gentoo, and Debian and each OS uses the same 8TB ZFS pool. The ability to use the same array with both Linux and FreeBSD is a great advantage of ZFS for me, but it is also very easy to work with and nothing is more trustworthy for keeping my huge data collection safe.

You can create a zfs pool on bare drives with a simple
Code:
zpool create mydatatank mirror <disk1> <disk2> mirror <disk3> <disk4>

and now you can create filesystems as needed, make those filesystems compressed, make a filesystem encrypted, set/change mountpoints.
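
A few examples of what that can look like on the pool created above. The dataset names and mount points are made up, the DRY_RUN switch and helper names are additions, and native encryption assumes a reasonably recent OpenZFS.

```shell
#!/bin/sh
# Datasets on the pool created above (mydatatank); dataset names are illustrative.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

create_datasets() {
    pool=$1
    # Compressed filesystem mounted at /home.
    run zfs create -o compression=lz4 -o mountpoint=/home "$pool/home" &&
    # Encrypted filesystem; zfs prompts for the passphrase when run for real.
    run zfs create -o encryption=on -o keyformat=passphrase "$pool/private" &&
    # Mount points can be changed later.
    run zfs set mountpoint=/srv/private "$pool/private"
}

create_datasets mydatatank
```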

If you can only fit 3 hdds in your case then you could use zfs raidz and, with 4TB drives, get 8TB of storage on 12TB of raw drive space. Or with 4 hdds it is best to use two mirrors (raid10) for optimum read AND write speeds, and less stress on the drives when one fails.

Side note: I like the Seagate Terascale Enterprise drives. I get "renewed" drives from Amazon for as little as $55 / 4TB. They perform great for me, very quiet, about 150 MB/s on the raw drive -- my zfs array runs at near ssd speeds for sequential i/o.

All times are GMT