eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9883 Location: almost Mile High in the USA
Posted: Wed Oct 19, 2022 2:51 pm Post subject: LUKS on RAID weird behavior... any experiences?
Due to the layering, I would have expected my ext4 over LUKS (cryptsetup) over MDRAID5 stack to "just work"... but something strange is happening.
One of my disks is coughing up tons of SATA errors, including bad sectors. However, MDRAID has not yet kicked it from the array... but somehow I got massive corruption on the filesystem?? Theoretically, if a sector can't be read, the read errors out and md recalculates the data from the remaining disks...
but somehow I sometimes get a blank (zeroed) sector. Which LUKS then decrypts... and returns garbage to ext4 instead of a blank sector?
Is this possible? It seems far-fetched, given the principle of "don't return garbage without saying so": each layer should know when it is handling garbage and treat it with a grain of salt... but I'm not sure how else this corruption is happening.
Perhaps it's not a good idea to run LUKS over MDRAID yet? I'm sure a lot of people are doing this, but has anyone had disks fail in this setup yet?
_________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
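The "blank sector becomes garbage" effect can be sketched in a few lines. This is a toy illustration only: LUKS actually uses AES-XTS via dm-crypt, and the hash-based stream cipher and key below are purely hypothetical stand-ins. The principle is the same, though: any cipher maps an all-zero ciphertext sector to noise-like plaintext.

```python
# Toy sketch (NOT LUKS's real cipher): a drive that silently returns a
# zeroed sector hands the crypto layer all-zero ciphertext, which
# "decrypts" to pseudorandom noise for the filesystem above.
import hashlib

SECTOR = 512

def keystream(key: bytes, sector_no: int, length: int) -> bytes:
    """Derive a per-sector keystream by hashing key + sector number + counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + sector_no.to_bytes(8, "little") + counter.to_bytes(8, "little")
        ).digest()
        counter += 1
    return out[:length]

def xor_crypt(key: bytes, sector_no: int, data: bytes) -> bytes:
    """XOR stream cipher: the same operation encrypts and decrypts."""
    ks = keystream(key, sector_no, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

key = b"example-key"        # hypothetical key, for illustration only
blank = bytes(SECTOR)       # a genuinely blank plaintext sector

# Round trip works: encrypting then decrypting returns the blank sector.
assert xor_crypt(key, 7, xor_crypt(key, 7, blank)) == blank

# But if the drive silently returns zeroed *ciphertext*, decryption
# yields raw keystream: 512 bytes of noise handed up to ext4.
garbage = xor_crypt(key, 7, bytes(SECTOR))
print(garbage == blank)   # False
```

So the crypto layer is behaving correctly here; it simply has no way to know the zeros were not the real ciphertext.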
araxon Tux's lil' helper
Joined: 25 May 2011 Posts: 85
Posted: Fri Oct 21, 2022 11:03 am Post subject:
That sounds like a faulty drive or cable. I have used ext4 on top of LUKS on top of mdadm RAID1 on multiple machines for 10 years now and never had a single problem.
If the drive returns zeros and tells the system that this is the correct requested data, then it makes sense that LUKS "decrypts" the zeros into garbage and then lets ext4 try to make sense of it. I see no problem with what the higher layers do; the problem is the drive.
What drive is it? What do the errors look like in the log?
eccerr0r Watchman
Posted: Fri Oct 21, 2022 2:26 pm Post subject:
"Never had a single problem" meaning you never had a hard drive fail either, and thus never had to go through the motions of swapping a disk?
BTW, this is a 4-disk MDRAID5 (in which XORs of faulty data would also generate some weirdness), and I ended up with a lot of filesystem corruption: when emerge crashed and the filesystem was flagged for fsck, I had hundreds of thousands of errors to fix during the subsequent fsck.
Since this is a backup server it's no big loss, but I may need to re-image this machine...
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54799 Location: 56N 3W
Posted: Fri Oct 21, 2022 5:26 pm Post subject:
eccerr0r,
I have a drive that reads correctly but writes rubbish.
I found that out by accident one day. It was flagged for a fsck, but I imaged it first, as fsck is well known for making a bad situation worse.
Then I compared the image with the original, so I had two identical reads from the drive.
Now fsck was allowed to do its stuff. It was happy, but the partition still wouldn't mount.
Fsck changed more on the next run.
Giving the drive up for dead (I still had my image), I wrote a few blocks of random data, repeating the same data at several locations on the drive.
Reads were consistent, but I didn't get the data that I wrote.
Have you got one of those? _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
eccerr0r Watchman
Posted: Fri Oct 21, 2022 6:20 pm Post subject:
Interesting, I might have to consider that possibility. Usually it's a bad cable, but the link tends to be CRC/parity checked, and writes that fail CRC/parity get rejected. But if the on-drive buffer RAM is no good... that may very well be the issue...
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3896 Location: Rasi, Finland
Posted: Sat Oct 22, 2022 5:59 am Post subject:
SMR drive?
I've heard that one should not use SMR drives on some RAID arrays.
I don't know which combination of SMR hard drive model, hardware or software RAID, and RAID level is bad (since I mostly use SSDs and lower-capacity HDDs), but I've heard it can cause problems.
_________________ ..: Zucca :..
My gentoo installs: | init=/sbin/openrc-init -systemd -logind -elogind seatd |
Quote: | I am NaN! I am a man! |
eccerr0r Watchman
Posted: Sat Oct 22, 2022 6:54 pm Post subject:
Unless there were 500GB 3.5" SMR disks, not that I know of any...
Anyway, there were some pending sectors, but I just did a repair on the array, and now...
Code: | 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
|
Hmm... go figure...
Anyway, the only suspect that makes sense at this point is that the drive generates random data. That would be the only thing that would pass garbage data up the stack, I would think, unless there's a bug in the code that passes bad "poisoned" data along, possibly due to a race condition. That would be a software bug, and no hardware change, such as swapping to different (e.g. non-SMR) disks, could fix it.
I've been using non-LUKS RAIDs for years, through many drive replacements, and this is the worst corruption I've seen so far. While there's still a possibility of other hardware issues (RAM checks good; the CPU is *assumed* good, which is a problem), the errors seem to stick to one drive... I could be fighting two different issues at this point, but it's still too suspicious: I got the corruption soon after the disk's bad-sector report, so the two seem closely related.
araxon Tux's lil' helper
Posted: Mon Oct 24, 2022 7:03 pm Post subject:
eccerr0r wrote: | "Never had a single problem" meaning you never had a hard drive fail either and thus never had to go through the motions of swapping a disk? |
No, that's not what I meant. I have been using Gentoo since about 2005, nearly exclusively, on dozens of servers over the years, mostly with MD RAID 1 or 5. I have had to replace many failed drives, and mdadm always saved the day. And yes, even in combination with LUKS I had a few failed drives. Guess I was just lucky, and my drives just died instead of spewing random data.
Zucca Moderator
Posted: Tue Oct 25, 2022 7:57 pm Post subject:
eccerr0r wrote: | Anyway the only suspects that make sense at this point is the drive generates random data. That would be the only thing that would pass garbage data up the stack I would think | Does mdraid5 actually use parity data when reading? I mean, if the drive does not report that it's faulty, then what?
I can't find the source at the moment, but mdraid "isn't good" at checking the integrity of the data it serves. If a drive has failed it will act for sure, but what if a failing drive says "I'm ok"?
EDIT: https://www.youtube.com/watch?v=l55GfAwa8RI&t=340s
And some discussions on stack exchange: https://unix.stackexchange.com/questions/105337/bit-rot-detection-and-correction-with-mdadm
(Bit rot is the worst. I like btrfs and its ability to do almost every action online, but every action on btrfs is slow. That's why I choose mdraid+lvm+xfs_or_ext4 most of the time.)
Anyway, I have a suspicion that your drive thinks it's ok.
NeddySeagoon Administrator
Posted: Tue Oct 25, 2022 8:10 pm Post subject:
Zucca,
Quote: | If a drive has failed it will act for sure, but what if a drive tells "I'm ok"? |
If a drive fails, it gets kicked out of the array and you no longer have any parity data to check.
It's actually quite rare that it's that black and white.
When an unreadable block is encountered, mdadm uses the other drives to get at the data.
I'm not sure if it tries to fix it at that time or not. The drive with the failed read is not always kicked out of the array; in part, that's determined by how long the error handler takes.
szatox Advocate
Joined: 27 Aug 2013 Posts: 3489
Posted: Tue Oct 25, 2022 9:51 pm Post subject:
Zucca wrote: | Does mdraid5 actually use parity data when reading? I mean if the drive does not report it's faulty, then what?
I can't find the source at the moment, but mdraid "isn't good" at checking the integrity of the data it serves. If a drive has failed it will act for sure, but what if a drive tells "I'm ok"?
|
I have tested mdraid6 for parity checking on read, and it didn't do that. If the data chunks are readable, it assumes they hold correct data, even though double parity would allow the disks to outvote the corrupt stripe. There might be some option to enable it, but it's definitely disabled by default (AFAIR for performance reasons).
I haven't tested raid5, but I don't think it would be enabled by default there either... In the case of data corruption it could only report the fact anyway; there is not enough redundancy to recover without an out-of-band hint pointing at the failed block.
eccerr0r Watchman
Posted: Tue Oct 25, 2022 10:48 pm Post subject:
Indeed, for performance reasons, MDRAID5 only reads the primary data blocks. The parity blocks are read only during degraded mode, recovery, or integrity checks.
Which makes it even weirder: if my RAID couldn't read a sector, it should have computed the block from the other disks... so one of the other disks is coughing up random data...
??!
Or is it just the one disk coughing up random data, with a whole series of blocks affected...
hmm...
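The RAID5 math behind this can be sketched with XOR (an illustration of the redundancy scheme, not md's actual code): a *reported* read failure on one chunk is recoverable from the other chunks plus parity, but a disk that silently returns wrong bytes is never caught on the normal read path, because parity is not consulted there.

```python
# Sketch of RAID5 redundancy: parity = XOR of the data chunks in a stripe.
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR across equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

stripe = [b"AAAA", b"BBBB", b"CCCC"]    # data chunks on 3 of 4 disks
parity = xor_blocks(stripe)             # written to the 4th disk

# Disk holding chunk 1 REPORTS a read error: rebuild from the rest + parity.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
assert rebuilt == b"BBBB"

# But silent corruption is simply served as-is; the normal read path never
# touches parity, so nothing here can notice the damage.
silently_corrupt = b"BXBB"
served = silently_corrupt
print(served == b"BBBB")   # False, and md is none the wiser
```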
Zucca Moderator
Posted: Fri Oct 28, 2022 4:51 pm Post subject:
Interestingly (at least back in 2016), Linux md will assume the data blocks are correct, even on raid6, and then rewrite (recalculate) the parity blocks during a scrub.
See this answer and the comments below it.
The md raid cache will help some, at least.
I'd like to conduct some tests in a VM... but until I have time...
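The repair-on-scrub policy described above can be sketched like this (again an illustration, not md's code): the data chunks are trusted and parity is recomputed from them, so a silently corrupted data chunk makes the scrub "fix" the parity to match the bad data.

```python
# Sketch of a trust-the-data scrub: mismatches are resolved by rewriting
# parity from the data chunks, even if a data chunk is the corrupt one.
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR across equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [bytearray(b"AAAA"), bytearray(b"BBBB")]
parity = xor_blocks(data)               # stripe starts out consistent

data[0][0] ^= 0xFF                      # one disk silently flips a bit

if xor_blocks(data) != parity:          # scrub detects the mismatch...
    parity = xor_blocks(data)           # ...and rewrites parity from the data

assert parity == xor_blocks(data)       # the stripe is "consistent" again,
print(bytes(data[0]))                   # but the user data is still wrong
```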
eccerr0r Watchman
Posted: Fri Oct 28, 2022 5:34 pm Post subject:
Now, I don't know about SSDs, but all HDDs (and floppy drives too!) since antiquity use at least CRC, if not full ECC, to error-check/correct the data written.
However, if the hard drive's electronics/microcontroller is broken and lies about whether the data is correct, then we're screwed no matter what.
This should be quite rare, one would hope; hard drive manufacturers test this firmware thoroughly. But perhaps I got unlucky with a bad drive.
Incidentally, after jiggling the SATA data/power cables and scrubbing the array, the errors have quieted down, and so far the RAID has been behaving correctly and giving back the data as written...
Zucca Moderator
Posted: Sun Oct 30, 2022 8:53 am Post subject:
eccerr0r wrote: | Now I don't know about SSDs but all HDDs (and floppy drives too!) since antiquity at least use CRC if not have ECC to error check/correct data written. | I have the impression ECC needs those magical 520-byte sectors.
eccerr0r wrote: | However if the hard drive electronics/microcontroller is broken and lies about data being correct or not, then we're screwed no matter what. | Checksums to save the day? Right?
eccerr0r wrote: | Incidentally, after jiggling the SATA data/power cables and scrubbing the array, the errors have quieted up and so far the RAID has been behaving correctly and been getting the data back as written... | Hm. A signaling issue? But surely the kernel should have noticed that?
eccerr0r Watchman
Posted: Sun Oct 30, 2022 3:53 pm Post subject:
Not sure what's "magical" about them, but there are a LOT of bits on the platters that aren't, and don't need to be, accessible by the OS, including ECC/CRC bits and tracking information. Also not sure what's magic about 520: more bytes, say 600 per 512-byte sector, would provide more redundancy to repair bad reads. But ultimately this is overhead; why not take a 1TB disk and do intra-track RAID1 mirroring (i.e. on a 300-sector track, RAID1 sectors 1-150 onto sectors 151-300) and get a 500GB disk?
In any case, any hardware that fails to detect errors to spec (hard drive manufacturers do specify the expected error rate, and it's not zero)... simply put, it's faulty. Whether you trust the drive to get it right or the computer to get it right is immaterial; both really need to get it right.
And yes, I don't get it. I expect the OS to know what was poisoned by the cable, since SATA (and even UDMA/PATA) is CRC-checked... so MDRAID should have detected the poison and read the remaining disks to reconstruct... Still very weird.
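The reason the drive and the SATA link are expected to catch this can be shown with a checksum. CRC-32 here is just a stand-in for the drive's real on-platter ECC/CRC and the link-level CRC; the point is that a bit flip in the payload is detected with overwhelming probability, so corruption *should* be reported rather than passed up silently.

```python
# A CRC over the sector payload detects bit flips; silent corruption means
# some layer ignored (or never computed) a check like this.
import binascii

sector = bytes(range(256)) * 2            # a 512-byte payload
stored = binascii.crc32(sector)           # checksum kept alongside the data

corrupted = bytearray(sector)
corrupted[100] ^= 0x01                    # a single flipped bit in transit
print(binascii.crc32(bytes(corrupted)) == stored)   # False: flip detected
```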
NeddySeagoon Administrator
Posted: Mon Oct 31, 2022 8:09 pm Post subject:
eccerr0r,
.... unless it reads correctly but writes rubbish, correctly.
eccerr0r Watchman
Posted: Mon Oct 31, 2022 9:52 pm Post subject:
Yeah, that would fall under the bad-hard-drive category and would be SDC (silent data corruption). But since an error was detected at the SATA/UDMA/cable level, the error should have been handled all the way up the chain until the consumer knew it was bad...
There is a saying in computer hardware: sometimes it's better to report nothing than to report garbage... and that apparently is not being honored.