best options for reliable long-term storage in 2023?

c00l.wave · Apprentice Joined: 24 Aug 2003 Posts: 268

I'm running out of disk space on my main machine and plan to move "long-term static data" to my "NAS"/home server (also running on Gentoo). So far I've been happily using partition-wise mdraid 1 + LVM + ext4 on both machines for many, many years. However, especially since ZFS - at least at first glance - appears to be stable on Linux nowadays, I wanted to check if maybe there's a better way of safely storing my data for long-term storage. The data I need to relocate is of high personal value, such as photos and videos or other typical "archive" data, so some way of automatic recognition and repair of (partially?) corrupted files would be great - even with a backup I would have to notice that I actually had data corruption before I could try to restore a file.

I guess I need to explain the partition-wise setup... Due to the (very) large size of modern drives I do not want to commit a full 8+ TB to a single RAID or filesystem, so I split it up into something like 2TB partitions that are then set up for mdraid and/or LVM for a few reasons:

I do not need all that space immediately but may have some use-cases for non-RAID storage at a later point. That's easy to achieve by this method as I can simply remove an unused partition and reformat it.
HDD read errors I've experienced so far were usually limited to just a small physical area of the disks - of course I would replace the entire HDD if it starts failing that way but IMO adding some sort of "independent" smaller partitions could make it easier to keep the good parts synced (the drive only degrades partially which could increase the chances to copy to a new drive if further corruption gets detected on the other drive - usually that's the ultimate error case for RAID 1).
In case of some kernel panics, sudden power loss or other non-fatal events I've ended up with degraded RAIDs a bit too often. Limiting the size of RAID partitions means I may have less data to resync; other RAID partitions may remain synced (I would still scrub them after such events) or at least can be recovered quicker.

Regarding both options (current stack or ZFS) I have a few concerns:

mdraid/LVM/ext4: Over the years I've occasionally come across 0 byte files (which is obvious) or suspiciously small JPEG files that, when opened, only partially load - meaning that at some point I actually had some sort of silent, unnoticed partial file corruption. That's mainly on older files which date back to a time when the data was not stored on a RAID or on other file systems (NTFS or ReiserFS) but I'm also not 100% confident that it really did not happen while already on my current setup. In some cases I'm actually pretty sure those files were always stored on the current setup - but I may have had to copy the files between filesystems when I installed new hard drives or repartitioned them. Even when doing semi-regular mdraid scrubbing (my PCs do not run 24/7 so I cannot schedule it) those corruptions may not be detected (mdraid requires the drive to issue a read error in order to repair a block). If I wanted some file-based check I would have to use some additional tool.
ZFS: The original ZFS always sounded great, among other things it would actually be capable of doing the file-based check & repair I would like to have. Unfortunately, due to license incompatibility, it cannot be integrated directly to the mainline kernel. Since it comes from a different OS (Solaris, originally) it also requires a port to be used on Linux, so even if it started as a verbatim port, ZFS on Linux has probably diverged with optimizations over time (also, weren't there at least two different forks?). I know that, in general, ZFS is typically used on BSD-based NAS systems - but is the Linux version really equally stable and usable? Having had one incident of complete data loss on ReiserFS which also used loose B* structures I'm also a bit reluctant to again switch to a file system that does not use super blocks like ext4 (which I was already able to restore files with once). In case that a ZFS file system fails in a way that prevents normal traversal, what options are there for recovery (apart from just abandoning the corrupted filesystem and instead restoring an external backup)? If I were to switch to ZFS I would still like to keep it on smaller partitions (without mdraid, of course), like the partition-wise mdraid + LVM in my current setup, for the reasons I explained above. However, I'm unsure if that's practical with ZFS - usually people seem to just add the entire disks, not just (adjacent) partitions? Does ZFS correctly store copies over multiple disks or may I end up with both copies being present on the same disk, just different partitions?

Is there any other option I missed? btrfs still doesn't sound like it would fit be a good fit for such archive use-cases; ZFS seems like the better option to me in terms of reliability?

What would you recommend? Is ZFS as reliable as it seems, even on Linux?
_________________
nohup nice -n -20 cp /dev/urandom /dev/null &

pingtoo · Posted: Sat Mar 18, 2023 7:04 pm Post subject:

c00l.wave,

Please review a online document Battle testing ZFS, Btrfs and mdadm+dm-integrity. It is old but very informative about the topic you are looking for.

I found the concept "test the setup" author mention is very important and you are looking for way of long term archive you may want to do as the article suggested and even automate the testing process and perform test periodically.

C5ace · Posted: Sun Mar 19, 2023 6:03 am Post subject:

I store irreplaceable photos, etc. on CDs and DVDs. 2 copies. Then weld them into vacuum bags. The oldest is from 1990 and has no errors.
_________________
Observation after 30 years working with computers:
All software has known and unknown bugs and vulnerabilities. Especially software written in complex, unstable and object oriented languages such as perl, python, C++, C#, Rust and the likes.

steve_v · Guru Joined: 20 Jun 2004 Posts: 409 Location: New Zealand

C5ace · Posted: Sun Mar 19, 2023 8:54 am Post subject:

The first CD Writer I had was a external NEC with 80 pin SCSI Card and CD-Rom Caddy's. The price was at the time around $ 1,500. Write speed was 2 times. The drives where actually designed for use as storage arrays when placed in up to 15 drive racks and connected to a full scale SCSI controller.

Use top quality Write Once CD's. One Master and copy. When done keep them for 24hr in a dry and cool environment. Then Vacuum seal them with a dry silica pack in sturdy plastic bags and store them in a dark cool place.
_________________
Observation after 30 years working with computers:
All software has known and unknown bugs and vulnerabilities. Especially software written in complex, unstable and object oriented languages such as perl, python, C++, C#, Rust and the likes.

NeddySeagoon · Posted: Sun Mar 19, 2023 11:53 am Post subject:

c00l.wave,

Nothing beats clay tablets yet :)

The real problem is that media goes obsolete.

Reel to reel magnetic tape
8" floppy
Video Disc
5 1/4" floppy (various densities)
3 1/2" floppy (various densities)
CD-RW
DVD-RW
BD

.... the list will go on.

You need to migrate to new media
a) while the old media is still readable
b) while you still have hardware that can read the old media.

The Interface problem ..
ST-506
NEC 7220 (floppy interfaces of all sorts of data rates)
IDE (PATA)
SCSI (Parallel varieties) and oddballs.
SATA will be on the obsolete list soon

You need several copies. At least one offsite, to protect against disasters too.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

c00l.wave · Apprentice Joined: 24 Aug 2003 Posts: 268

Thanks for your replies so far!

I'm aware that no single "online" (as in: installed into a PC) storage solution can replace backups and indeed I have an (currently only irregularly updated) off-site backup in form of a large external USB HDD. But that backup again has the same issues: It could be affected by undetected bitrot itself and to know that I would have to restore files from there I would first need to notice that I actually got corrupted files on my primary storage. That has to happen before I update the backup and overwrite previously good files with corrupted data. WORM media like DVDs seem antiquated these days but could still be a viable backup solution for a selection of important data but they also need to be checked and renewed at regular intervals which would be much more tedious for small media (it would already not be feasible to back up all my photos on just DVDs). It would be a really nice bonus if the storage solution could automatically notify me about files that actually require restoration from a backup - in my current setup that could easily be caused by some HDD not reliably reporting read errors to mdadm; I've seen that on Seagate drives before (on some servers, not my personal storage).

Backups in general are a whole topic on its own; at this point I'm primarily concerned about choosing an "online" storage solution that should not rely on backup restoration unless some very unlikely and really catastrophic failure occurs (like all 2+ online copies becoming unrecoverable at the same time, which should be preventable through regular scrubbing and monitoring). In case such a failure should actually occur I would like to have the option to attempt partial recovery for files that have not been part of the off-site backup or may actually be in a worse state on the backup media. Ideally a failed side of the mirrored partitions should still be possible to be opened in read-only mode like it is with mdraid (as compared to early SSDs that simply denied all access once they determined some small section of data on them had become unreadable). What I've seen with ReiserFS 3 a long time ago doesn't instill much confidence in "loosely linked" data structures as compared to ext4 but hopefully other filesystems have improved resilience since then?

In general, restoration from a backup should only be the last resort and not something a storage solution actually relies upon for standard operation. Instead, it should be resilient enough to successfully recover from smaller storage defects that are common to all media, at least as long as a readable mirror still exists.
_________________
nohup nice -n -20 cp /dev/urandom /dev/null &

steve_v · Guru Joined: 20 Jun 2004 Posts: 409 Location: New Zealand

pingtoo · Posted: Sun Mar 19, 2023 5:14 pm Post subject:

c00l.wave,

What is your definition of "storage solution"? Is it a ready made product (or procedure/process) that you can plug and play? with the product have all the desired features you mention like "automatically notify me about files that actually require restoration from a backup"

Instead a "solution" may be a "strategy" better suite your need?

A "strategy" that will describe tools involve and steps that need to executed in order to achieve your desire goal. In this case the "backup" should be used as one of the step that will be used in the case when a specific event happen.

Having said above than it should be clear may be what tools used is no so important, but what desired outcome for each event be defined are more important. Just the like a security thread model should be defined before implement a security.

In my mind "backup" is a copy of something at point in time, that will never got modified therefore it can be trusted to be use in the automated process to trigger when a specific event or manual execution in part of procedure. So "backup" should be thinking as part of the "strategy" not be thinking as different topic.

As part of the "strategy" you will define how your "backup" should be available and how to access "backup" when needed by automation or by manually.

As part if the "strategy" you will define how your "store" be monitored, and how the monitoring tool could inform the event process tool trigger notification or automated procedure(s) to perform actions defined

Finally you will define "store" based you cost/performance and recover objective in order to choose the tool to manage your "store"

For example using a wild imagination assume cost is no concern, I would want my data stored in RAM with a perfect system perform snapshot every second and with checksum on each block of RAM and have monitor system automatically restore corrupted block from snapshot without me interfering. And the snapshot will periodically write to a remote location with a robotic store to rotate store media.

Once having a "strategy" defined the tools choice become clear, And there is no concern on what is right store media because as part of "strategy" should have define the ageing problem and how do managing outdated technology.

The point is to thinking about when something gone wrong what need to be done in order to recover to the point you want. It is not about what is the best tool/media for long term store.

c00l.wave · Apprentice Joined: 24 Aug 2003 Posts: 268

I was indeed a bit unclear about what I'm looking for... normally I would say I'm just searching for the right "file system". However, what I currently use is a technology stack of LVM with PVs being a mix of block-level software RAID1 on GPT partitions and plain GPT partitions without RAID coverage and "inside LVM" I'm using ext4 as an actual file system on the LVs. If I would be using ZFS then all of that stack would just be ZFS - I have no good idea how to summarize ZFS alone as it isn't just a file system but also includes RAID and logical volume management as a single all-in-one "product". When trying to find a term that covers both those "technologies" I couldn't think of anything better than just a "storage solution".

The hardware is already set and should not change from what I currently have. On the "NAS" PC (just a regular tower offering some file access via Samba) that's one smaller old HDD containing the system (which I have a backup of) and less important data, plus two recent 8TB HDDs which already contain a partition-wise (mdraid/)LVM/ext4 stack. The partitions are 1TB each (except for the remainder); only 2x4 of those partitions are currently set up for RAID 1, the other partitions are left for non-redundant storage while I don't need more redundant space. All resulting block-devices are attached to LVM as PVs. That setup could be migrated to something else by either migrating one partition at a time without loss of redundancy or (only if necessary) by intentionally degrading the RAID if I would have to migrate a whole disk at once.

Regarding ZFS it seems that the general recommendation still is to just hand it over entire disks, although it seems like it should also be possible to create mirror groups of pairs of partitions like I did so far. Removing those groups and thus shrinking a zpool also seems to be supported now by more recent OpenZFS releases (use case: I had allocated too many partitions to ZFS and later notice I need 2x 2TB of non-redundant storage more urgently than 2TB of ZFS' equivalent of "RAID1") although it has ugly side-effects like a permanent, non-removable "hole" being reported in the zpool and making that pool incompatible to older versions/tools. What I'm a bit confused about is that it is still said that a zpool should only have a single layout and should never be shrunk nor grown unless absolutely necessary - that's a huge step back from the flexibility I'm used to from LVM in my current setup where, for example, I can simply evacuate 2 PVs of non-redundant partitions before removing them, creating a mdraid on them and add that md device back as a PV to the same LVM (and the opposite works just as easy). I can gradually change the layout without down-time; the only thing I'm unfortunately missing in that setup is "bitrot protection" on the redundant PVs.

I'm still unsure if I would want to commit both large HDDs completely as a RAID to ZFS and even with the possibility to add (and maybe remove) parts of a zpool similar to how I operate my current stack I'm undecided if that's really something I would be happy with in the long term.
_________________
nohup nice -n -20 cp /dev/urandom /dev/null &

steve_v · Guru Joined: 20 Jun 2004 Posts: 409 Location: New Zealand

There's always snapraid, I haven't used it myself, but I have heard good things. Apparently it does checksums and parity-based redundancy at the file level (i.e. bitrot protection) on top of pretty much any filesystem or drive layout you like, and it sounds like it offers the kind of flexibility you are after.
_________________
Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.

pingtoo · Posted: Mon Mar 20, 2023 7:32 pm Post subject:

So let take the terminology I use in my previous post "store" as example, translate to this discussion about file system storage.

Take what you currently using MDRAID+LVM or what ZFS offer. Both offer a pool concept. the pool concept in LVM is call Volume Group(vg) whereas in ZFS it is known as Zpool.

The pool concept provide a abstract of storage that not specific to the actual underlying technology used in the pool, but give the user of the pool a abstract that there are undefine limit of store that can be allocate by the pool's policy.

So for example your current store implementation, could be partitioned by Logical volume(lv). In ZFS, can be created as Virtual Device(vdev).

As you desire to limit the size of file system storage so in the event the file system need fix it can be faster. You should use LV to carve out (as in partition) the desired size and create file system on top of the lv. Whereas in ZFS, the create filesystem and carve out desired size is done in one step. so fro management and maintenance point of view ZFS is simpler

The advice you receive about use whole disk for ZFS is equally for your current MDRAID+LVM

MDRAID+LVM vs ZFS in term of store at conception level is not much of different. From my point of view, it is much better to use something you are familiar than try something unsure of.

My days of using ZFS is more than 10 years ago and it on Solaris so I cannot say for sure on Linux. but I think it is possible to play around with ZFS by lvm's lv, I mean may be you can try to create a zpool on top of bunch lv.

My key point is that current storage tech aside, the underlying conception is same. ZFS have integrity checkl, so is LVM (dm-integrity). ZFS can snapshot. So is LVM (dm-snapshot). I think both rely on external monitoring for the physical disk (SMART). My bet is on LVM(+RAID) since it is in kernel tree and exist much longer compare to linux-zfs.