c00l.wave Apprentice
Joined: 24 Aug 2003 Posts: 268
Posted: Sat Mar 18, 2023 6:16 pm Post subject: best options for reliable long-term storage in 2023?
I'm running out of disk space on my main machine and plan to move "long-term static data" to my "NAS"/home server (also running on Gentoo). So far I've been happily using partition-wise mdraid 1 + LVM + ext4 on both machines for many, many years. However, especially since ZFS - at least at first glance - appears to be stable on Linux nowadays, I wanted to check if maybe there's a better way of safely storing my data for long-term storage. The data I need to relocate is of high personal value, such as photos and videos or other typical "archive" data, so some way of automatic recognition and repair of (partially?) corrupted files would be great - even with a backup I would have to notice that I actually had data corruption before I could try to restore a file.
I guess I need to explain the partition-wise setup... Due to the (very) large size of modern drives I do not want to commit a full 8+ TB to a single RAID or filesystem, so I split it up into something like 2TB partitions that are then set up for mdraid and/or LVM for a few reasons:
- I do not need all that space immediately but may have some use-cases for non-RAID storage at a later point. That's easy to achieve by this method as I can simply remove an unused partition and reformat it.
- HDD read errors I've experienced so far were usually limited to a small physical area of the disk. Of course I would replace the entire HDD once it starts failing that way, but IMO having several "independent" smaller partitions makes it easier to keep the good parts in sync: the drive only degrades partially, which improves the odds of copying everything to a new drive if further corruption is then detected on the other drive - usually the worst case for RAID 1.
- After kernel panics, sudden power loss or other non-fatal events I've ended up with degraded RAIDs a bit too often. Limiting the size of the RAID partitions means there is less data to resync; other RAID partitions may remain in sync (I would still scrub them after such events) or can at least be recovered more quickly.
Regarding both options (current stack or ZFS) I have a few concerns:
- mdraid/LVM/ext4: Over the years I've occasionally come across 0-byte files (which is obvious to spot) or suspiciously small JPEG files that, when opened, only partially load - meaning that at some point I actually had some sort of silent, unnoticed partial file corruption. That's mainly on older files which date back to a time when the data was not yet stored on a RAID, or lived on other file systems (NTFS or ReiserFS), but I'm also not 100% confident that it really did not happen on my current setup. In some cases I'm actually pretty sure those files were always stored on the current setup - though I may have had to copy them between filesystems when I installed new hard drives or repartitioned. Even with semi-regular mdraid scrubbing (my PCs do not run 24/7 so I cannot schedule it) those corruptions may not be detected (mdraid requires the drive to issue a read error in order to repair a block). If I wanted some file-based check I would have to use an additional tool (see the sketch after this list).
- ZFS: The original ZFS always sounded great, among other things it would actually be capable of doing the file-based check & repair I would like to have. Unfortunately, due to license incompatibility, it cannot be integrated directly to the mainline kernel. Since it comes from a different OS (Solaris, originally) it also requires a port to be used on Linux, so even if it started as a verbatim port, ZFS on Linux has probably diverged with optimizations over time (also, weren't there at least two different forks?). I know that, in general, ZFS is typically used on BSD-based NAS systems - but is the Linux version really equally stable and usable? Having had one incident of complete data loss on ReiserFS which also used loose B* structures I'm also a bit reluctant to again switch to a file system that does not use super blocks like ext4 (which I was already able to restore files with once). In case that a ZFS file system fails in a way that prevents normal traversal, what options are there for recovery (apart from just abandoning the corrupted filesystem and instead restoring an external backup)? If I were to switch to ZFS I would still like to keep it on smaller partitions (without mdraid, of course), like the partition-wise mdraid + LVM in my current setup, for the reasons I explained above. However, I'm unsure if that's practical with ZFS - usually people seem to just add the entire disks, not just (adjacent) partitions? Does ZFS correctly store copies over multiple disks or may I end up with both copies being present on the same disk, just different partitions?
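For reference, this is roughly what I mean by the above - a manually triggered scrub, plus the kind of ad-hoc checksum manifest such an additional tool would amount to (device name and paths are just examples):
Code:
# trigger a check of one md array whenever the box happens to be up
echo check > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt

# the "additional tool": an ad-hoc file-level checksum manifest on top of ext4
cd /srv/archive && find . -type f -print0 | xargs -0 sha256sum > /root/archive.sha256
cd /srv/archive && sha256sum --quiet -c /root/archive.sha256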
Is there any other option I missed? btrfs still doesn't sound like it would be a good fit for such archive use-cases; ZFS seems like the better option to me in terms of reliability?
What would you recommend? Is ZFS as reliable as it seems, even on Linux? _________________ nohup nice -n -20 cp /dev/urandom /dev/null & |
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1248 Location: Richmond Hill, Canada
Posted: Sat Mar 18, 2023 7:04 pm Post subject:
c00l.wave,
Please review the online article Battle testing ZFS, Btrfs and mdadm+dm-integrity. It is a bit old but very informative on the topic you are asking about.
I found the "test the setup" concept the author mentions very important. Since you are looking at long-term archiving, you may want to do as the article suggests and even automate the testing process and run it periodically.
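For example (rough idea only - use a scratch pool on throwaway devices, never real data), the automated test from the article boils down to deliberately damaging one mirror member and checking that the stack notices and repairs it:
Code:
# damage a chunk in the middle of one member of a disposable test pool
dd if=/dev/urandom of=/dev/vdb1 bs=1M count=16 seek=512
# then scrub and confirm the errors were found and repaired from the other copy
zpool scrub testpool
zpool status -v testpool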
C5ace Guru
Joined: 23 Dec 2013 Posts: 484 Location: Brisbane, Australia
Posted: Sun Mar 19, 2023 6:03 am Post subject:
I store irreplaceable photos, etc. on CDs and DVDs. 2 copies. Then weld them into vacuum bags. The oldest is from 1990 and has no errors. _________________ Observation after 30 years working with computers:
All software has known and unknown bugs and vulnerabilities. Especially software written in complex, unstable and object oriented languages such as perl, python, C++, C#, Rust and the likes. |
steve_v Guru
Joined: 20 Jun 2004 Posts: 409 Location: New Zealand
Posted: Sun Mar 19, 2023 6:59 am Post subject: Re: best options for reliable long-term storage in 2023?
c00l.wave wrote: | In case that a ZFS file system fails in a way that prevents normal traversal, what options are there for recovery (apart from just abandoning the corrupted filesystem and instead restoring an external backup)? |
ZFS is designed (COW, atomic transactions) to always maintain filesystem consistency (though individual transactions may be lost under certain circumstances), so it really only has 2 states:
* A pool is online, any file-level damage is reported and corrected where possible.
* A pool is trash, restore from backup.
There is no "fsck" as such, but debugging tools are available in case of catastrophic pool loss... The use of which is deep in the weeds and will likely entail nice words with the OpenZFS devs.
ZFS is primarily an enterprise filesystem, and there is an implicit assumption that any really important pool will be replicated, ideally off-site. Features are included (send/recv) to facilitate this.
That is of course good practice anyway, regardless of filesystem. "RAID (and ZFS) is not a backup", "what if the building burns down" and all that related jazz.
As such, while recovery of an un-importable pool is possible, making this easy is not a priority. In such a (rare) scenario, the standard response is "destroy and restore from backup".
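For completeness, if you ever do land in the weeds, the first moves usually look something like this ('tank' is a placeholder, and -F discards the most recent transactions, so treat it as a last resort):
Code:
# read-only import first - nothing gets written, so nothing gets made worse
zpool import -o readonly=on tank
# rewind to an older transaction group, discarding the newest writes
zpool import -F tank
# inspect the on-disk state of an exported/broken pool without importing it
zdb -e -d tank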
c00l.wave wrote: | Does ZFS correctly store copies over multiple disks or may I end up with both copies being present on the same disk, just different partitions? |
As far as I am aware, as with most other RAID-like solutions a bottom-level "disk" is a "disk", and there is no distinction as to where it physically resides. You can use whole-disks, partitions, or even files as "disks" in a vdev, but using multiple partitions on the same physical device in a redundant vdev configuration risks compromising redundancy if a device fails.
You could of course arrange your layout in such a way that multiple partitions on the same disk are never part of the same redundant vdev, though TBH I really don't see the advantage in doing that considering the added complexity over just using whole-disks.
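To make that concrete, a layout that never puts two halves of a mirror on the same spindle would look roughly like this (device names purely illustrative):
Code:
# two mirror vdevs, each pairing a partition on disk A with its twin on disk B
zpool create tank \
  mirror /dev/disk/by-id/ata-DISK_A-part5 /dev/disk/by-id/ata-DISK_B-part5 \
  mirror /dev/disk/by-id/ata-DISK_A-part6 /dev/disk/by-id/ata-DISK_B-part6

# what NOT to do if you want redundancy: both halves on the same physical disk
# zpool create tank mirror /dev/disk/by-id/ata-DISK_A-part5 /dev/disk/by-id/ata-DISK_A-part6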
c00l.wave wrote: | Is ZFS as reliable as it seems, even on Linux? |
IME, yes. My "archive" pool has been online since 2013-12-07, through multiple OS upgrades and disk replacements, and has saved me from device failures and silent bitrot on multiple occasions. Zero issues to report, many subtle hardware-level errors corrected.
You might want to avoid native encryption right now though, it's still pretty new and there are scattered reports of bugs.
Otherwise, just follow ZFS best practices, as documented in many places on the 'net.
C5ace wrote: | I store irreplaceable photos, etc. on CDs and DVDs. 2 copies. Then weld them into vacuum bags. |
IME the longevity of (re)writable optical media has a whole lot to do with how it is stored. I've seen disks become unreadable in as little as 5 years, and some that are fine at 20. Heat, humidity, and especially sunlight seem to do a real number on them.
C5ace wrote: | The oldest is from 1990. |
Considering the CD-R standard itself dates from '88, the first burner apparently released early '90, and recalling the still eyewatering prices of such hardware even by the late '90s, that makes you a very early adopter. Not doubting mind, but still, those prices.
Or are you talking about getting your files on stamped CDs? How'd you go about that?
*Ed. OCD requires me to fix random double quote I somehow failed to notice. _________________ Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.
Last edited by steve_v on Sun Mar 19, 2023 3:32 pm; edited 1 time in total |
C5ace Guru
Joined: 23 Dec 2013 Posts: 484 Location: Brisbane, Australia
Posted: Sun Mar 19, 2023 8:54 am Post subject:
The first CD writer I had was an external NEC with an 80-pin SCSI card and CD-ROM caddies. At the time it cost around $1,500. Write speed was 2x. The drives were actually designed for use in storage arrays, placed in racks of up to 15 drives and connected to a full-scale SCSI controller.
Use top-quality write-once CDs. One master and one copy. When done, keep them for 24 hours in a dry, cool environment, then vacuum-seal them with a silica gel pack in sturdy plastic bags and store them in a dark, cool place. _________________ Observation after 30 years working with computers:
All software has known and unknown bugs and vulnerabilities. Especially software written in complex, unstable and object oriented languages such as perl, python, C++, C#, Rust and the likes. |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54578 Location: 56N 3W
Posted: Sun Mar 19, 2023 11:53 am Post subject:
c00l.wave,
Nothing beats clay tablets yet :)
The real problem is that media goes obsolete.
Reel to reel magnetic tape
8" floppy
Video Disc
5 1/4" floppy (various densities)
3 1/2" floppy (various densities)
CD-RW
DVD-RW
BD
.... the list will go on.
You need to migrate to new media
a) while the old media is still readable
b) while you still have hardware that can read the old media.
The Interface problem ..
ST-506
NEC 765 (floppy interfaces of all sorts of data rates)
IDE (PATA)
SCSI (Parallel varieties) and oddballs.
SATA will be on the obsolete list soon
You need several copies. At least one offsite, to protect against disasters too. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
c00l.wave Apprentice
Joined: 24 Aug 2003 Posts: 268
Posted: Sun Mar 19, 2023 2:37 pm Post subject:
Thanks for your replies so far!
I'm aware that no single "online" (as in: installed into a PC) storage solution can replace backups, and indeed I have a (currently only irregularly updated) off-site backup in the form of a large external USB HDD. But that backup has the same issues: it could be affected by undetected bitrot itself, and to know that I have to restore files from it I would first need to notice that files on my primary storage have become corrupted - and notice it before I update the backup and overwrite previously good copies with corrupted data. WORM media like DVDs seem antiquated these days but could still be a viable backup for a selection of important data; however, they too need to be checked and renewed at regular intervals, which would be much more tedious with such small media (it would already not be feasible to back up all my photos on DVDs alone). It would be a really nice bonus if the storage solution could automatically notify me about files that actually require restoration from a backup - in my current setup such damage could easily be caused by an HDD not reliably reporting read errors to mdadm; I've seen that on Seagate drives before (on some servers, not my personal storage).
Backups in general are a whole topic of their own; at this point I'm primarily concerned with choosing an "online" storage solution that should not rely on backup restoration unless some very unlikely, really catastrophic failure occurs (like all 2+ online copies becoming unrecoverable at the same time, which should be preventable through regular scrubbing and monitoring). Should such a failure actually occur, I would like to have the option to attempt partial recovery of files that were not part of the off-site backup or may actually be in a worse state on the backup media. Ideally it should still be possible to open a failed side of the mirrored partitions in read-only mode, like it is with mdraid (as opposed to early SSDs that simply denied all access once they determined that some small section of data on them had become unreadable). What I saw with ReiserFS 3 a long time ago doesn't instill much confidence in "loosely linked" data structures compared to ext4, but hopefully other filesystems have improved resilience since then?
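With mdraid that read-only escape hatch is essentially a one-liner, even with only one readable half left (device names are just examples):
Code:
# start a degraded mirror from its single surviving member, read-only, and mount it
mdadm --assemble --run --readonly /dev/md5 /dev/sdb5
mount -o ro /dev/md5 /mnt/rescue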
In general, restoration from a backup should only be the last resort and not something a storage solution actually relies upon for standard operation. Instead, it should be resilient enough to successfully recover from smaller storage defects that are common to all media, at least as long as a readable mirror still exists. _________________ nohup nice -n -20 cp /dev/urandom /dev/null & |
steve_v Guru
Joined: 20 Jun 2004 Posts: 409 Location: New Zealand
Posted: Sun Mar 19, 2023 4:17 pm Post subject:
c00l.wave wrote: | It would be a really nice bonus if the storage solution could automatically notify me about files that actually require restoration from a backup |
End-to-end checksums (verified on scrub, or any read attempt) and automatic recovery from media errors aren't a "bonus", they're a key feature of ZFS, and one of the big advantages over traditional RAID.
For non-redundant pools (i.e. a single disk), ZED can notify you with a list of damaged files. With sufficient redundancy, errors are automatically repaired and the affected files rewritten.
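Wiring that up is trivial - a periodic scrub plus ZED's mail hook, roughly like this (address and pool name are placeholders):
Code:
# /etc/zfs/zed.d/zed.rc - have ZED mail you when events/errors are detected
ZED_EMAIL_ADDR="you@example.org"
ZED_NOTIFY_VERBOSE=1

# scrub on whatever schedule the machine's uptime allows
zpool scrub tank
# lists any files ZFS could not repair
zpool status -v tank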
AFAIK BTRFS has a similar checksum feature if you really don't want to deal with out-of-tree modules, though it is a lot younger (read: less battle-tested) than ZFS and I can't comment on its reliability personally.
Filesystem-level checksums won't save you from fat-fingers or fires, but they pretty much solve the bitrot (and the HDD is a lying bastard) problem.
My solution for "warm" off-site backup (e.g. an external drive and/or or remote machine) is the same as for live data pools - ZFS replication (over SSH in the latter case) and scheduled scrubs to verify integrity. Not as "cold" as removable WORM media to be sure, but then that media would need to be periodically verified anyway... I'm far too lazy. _________________ Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy. |
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1248 Location: Richmond Hill, Canada
Posted: Sun Mar 19, 2023 5:14 pm Post subject:
c00l.wave,
What is your definition of a "storage solution"? Is it a ready-made product (or procedure/process) that you can plug and play, where the product has all the desired features you mention, like "automatically notify me about files that actually require restoration from a backup"?
Or would a "strategy", rather than a "solution", better suit your needs?
A "strategy" describes the tools involved and the steps that need to be executed in order to achieve your desired goal. In this case the "backup" is one of those steps, to be used when a specific event happens.
Having said that, it should be clear that which tools are used may not be so important; what matters more is that the desired outcome for each event is defined. Just like a security threat model should be defined before implementing security.
In my mind a "backup" is a copy of something at a point in time that will never get modified, so it can be trusted when used by an automated process triggered by a specific event, or by manual execution as part of a procedure. So the "backup" should be thought of as part of the "strategy", not as a separate topic.
As part of the "strategy" you define how your "backup" is kept available and how it is accessed when needed, whether by automation or manually.
As part of the "strategy" you also define how your "store" is monitored, and how the monitoring tool informs the event-processing tool so it can trigger notifications or automated procedures to perform the defined actions.
Finally you define the "store" based on your cost/performance and recovery objectives, in order to choose the tool that manages your "store".
For example, letting imagination run wild and assuming cost is no concern: I would want my data stored in RAM, with a perfect system taking a snapshot every second, a checksum on each block of RAM, and a monitoring system that automatically restores corrupted blocks from a snapshot without me interfering. The snapshots would periodically be written to a remote location, with a robotic library rotating the storage media.
Once a "strategy" is defined, the choice of tools becomes clear. And there is no concern about what the right storage medium is, because the "strategy" should already have addressed the ageing problem and how to manage outdated technology.
The point is to think about what needs to be done when something goes wrong, in order to recover to the point you want. It is not about what the best tool/media for long-term storage is.
c00l.wave Apprentice
Joined: 24 Aug 2003 Posts: 268
Posted: Mon Mar 20, 2023 6:15 pm Post subject:
I was indeed a bit unclear about what I'm looking for... normally I would say I'm just searching for the right "file system". However, what I currently use is a technology stack of LVM with PVs being a mix of block-level software RAID1 on GPT partitions and plain GPT partitions without RAID coverage, and "inside LVM" I'm using ext4 as the actual file system on the LVs. If I were using ZFS, all of that stack would just be ZFS - I have no good idea how to summarize ZFS alone, as it isn't just a file system but also includes RAID and logical volume management as a single all-in-one "product". When trying to find a term that covers both those "technologies" I couldn't think of anything better than "storage solution".
The hardware is already set and should not change from what I currently have. On the "NAS" PC (just a regular tower offering some file access via Samba) that's one smaller old HDD containing the system (which I have a backup of) and less important data, plus two recent 8TB HDDs which already contain a partition-wise (mdraid/)LVM/ext4 stack. The partitions are 1TB each (except for the remainder); only 2x4 of those partitions are currently set up for RAID 1, the other partitions are left for non-redundant storage while I don't need more redundant space. All resulting block-devices are attached to LVM as PVs. That setup could be migrated to something else by either migrating one partition at a time without loss of redundancy or (only if necessary) by intentionally degrading the RAID if I would have to migrate a whole disk at once.
Regarding ZFS, it seems the general recommendation is still to just hand it entire disks, although it should also be possible to create mirror vdevs from pairs of partitions like I did so far. Removing those vdevs, and thus shrinking a zpool, also seems to be supported by more recent OpenZFS releases (use case: I had allocated too many partitions to ZFS and later notice I need 2x 2TB of non-redundant storage more urgently than 2TB of ZFS' equivalent of "RAID1"), although it has ugly side-effects like a permanent, non-removable "hole" being reported in the zpool, and it makes the pool incompatible with older versions/tools. What I'm a bit confused about is that it is still said a zpool should have a single layout and should never be shrunk or grown unless absolutely necessary - that's a huge step back from the flexibility I'm used to with LVM in my current setup, where, for example, I can simply evacuate 2 PVs of non-redundant partitions before removing them, create an mdraid on them and add that md device back as a PV to the same VG (and the opposite works just as easily). I can gradually change the layout without downtime; the only thing I'm unfortunately missing in that setup is "bitrot protection" on the redundant PVs.
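To spell out that kind of reshuffle (device names are only examples) - evacuating two plain PVs, turning them into a mirror and handing it straight back to the same VG is just a few commands:
Code:
# move all extents off the two plain partitions and drop them from the VG
pvmove /dev/sda7 && pvmove /dev/sdb7
vgreduce vg0 /dev/sda7 /dev/sdb7
pvremove /dev/sda7 /dev/sdb7

# build a RAID1 out of them and give it back to the same VG as a new PV
mdadm --create /dev/md7 --level=1 --raid-devices=2 /dev/sda7 /dev/sdb7
pvcreate /dev/md7
vgextend vg0 /dev/md7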
I'm still unsure if I would want to commit both large HDDs completely as a RAID to ZFS and even with the possibility to add (and maybe remove) parts of a zpool similar to how I operate my current stack I'm undecided if that's really something I would be happy with in the long term. _________________ nohup nice -n -20 cp /dev/urandom /dev/null & |
steve_v Guru
Joined: 20 Jun 2004 Posts: 409 Location: New Zealand
Posted: Mon Mar 20, 2023 7:14 pm Post subject:
There's always snapraid; I haven't used it myself, but I have heard good things. Apparently it does checksums and parity-based redundancy at the file level (i.e. bitrot protection) on top of pretty much any filesystem or drive layout you like, and it sounds like it offers the kind of flexibility you are after. _________________ Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.
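From the documentation it boils down to a config listing the data disks plus one or more parity disks, and periodic sync/scrub runs - something like this (paths illustrative, check the snapraid manual rather than trusting me):
Code:
# /etc/snapraid.conf
parity /mnt/parity1/snapraid.parity
content /var/snapraid/snapraid.content
content /mnt/disk1/snapraid.content
data d1 /mnt/disk1
data d2 /mnt/disk2

# update parity after files change, then periodically verify a slice of old data
snapraid sync
snapraid scrub -p 10
snapraid status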
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1248 Location: Richmond Hill, Canada
Posted: Mon Mar 20, 2023 7:32 pm Post subject:
So let's take the terminology I used in my previous post, "store", as an example and translate it to this discussion about file system storage.
Take what you are currently using, mdraid+LVM, or what ZFS offers. Both provide a pool concept: in LVM the pool is called a volume group (VG), whereas in ZFS it is known as a zpool.
The pool concept provides an abstraction of storage that is not tied to the actual underlying technology used in the pool; it gives the user of the pool the impression of an open-ended amount of storage that can be allocated according to the pool's policy.
So, for example, in your current store implementation the pool is carved up into logical volumes (LVs); in ZFS the equivalent carving-up is done with datasets inside the zpool (the pool itself being built from virtual devices, vdevs).
As you want to limit the size of each file system so that it can be fixed faster if it ever needs fixing, you would use an LV to carve out (as in partition) the desired size and create a file system on top of that LV. In ZFS, creating the file system and carving out the desired size is done in one step, so from a management and maintenance point of view ZFS is simpler.
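Roughly, the two workflows compare like this (volume group, pool and mount point names are just examples):
Code:
# LVM: carve out a fixed-size LV, then create and mount a file system on it
lvcreate -L 1T -n archive vg0
mkfs.ext4 /dev/vg0/archive
mount /dev/vg0/archive /srv/archive

# ZFS: one step - the dataset is created, mounted, and capped with a quota
zfs create -o quota=1T -o mountpoint=/srv/archive tank/archive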
The advice you received about using whole disks for ZFS applies equally to your current mdraid+LVM setup.
mdraid+LVM vs. ZFS, at the conceptual level of the "store", is not that different. From my point of view it is much better to use something you are familiar with than to try something you are unsure of.
My days of using ZFS were more than 10 years ago, and on Solaris, so I cannot say for sure how it behaves on Linux. But I think it is possible to play around with ZFS on top of LVM's LVs - I mean, maybe you can try to create a zpool on top of a bunch of LVs.
My key point is that, current storage tech aside, the underlying concepts are the same. ZFS has integrity checking, and so does LVM (dm-integrity). ZFS can snapshot, and so can LVM (dm-snapshot). I think both rely on external monitoring (SMART) for the physical disks. My bet is on LVM(+RAID), since it is in the kernel tree and has existed much longer than ZFS on Linux.
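For example (untested by me, names made up), recent lvm2 can put dm-integrity underneath a raid1 LV, which gives checksum-and-repair behaviour much closer to ZFS:
Code:
# raid1 LV with per-block checksums via dm-integrity (needs a recent lvm2)
lvcreate --type raid1 -m 1 --raidintegrity y -L 1T -n archive vg0
mkfs.ext4 /dev/vg0/archive

# scrub-like full check of the mirror, then look at the counters
lvchange --syncaction check vg0/archive
lvs -o +raid_sync_action,raid_mismatch_count vg0/archive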