RAID block consistency check: which block is correct?

halfgaar · l33t Joined: 22 Feb 2004 Posts: 781 Location: Netherlands

Hi,

The Gentoo wiki page on software raid says this:

John R. Graham · Posted: Sun Jan 03, 2010 1:40 pm

RAID 1 doesn't compare two good blocks, nor does any RAID level. RAID 1 normally uses the block from the first disk and, in the event of a hardware disk read error, takes the data from the second. (Note that read load balancing makes this a little more complicated, but it's still a good conceptual model.) Since read errors can sometimes be corrected merely by re-writing, that's what RAID 1 tries next, using data from the good block on the mirrored drive. If the re-write fails, then the disk is marked bad and "...kicked out of the active array." Clear?
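To make the conceptual model above concrete, here is a minimal Python sketch of it. It is a toy simulation, not the md driver's code: the `Disk` class, `ReadError`, and `mirror_read` are hypothetical names invented for illustration, and read balancing is ignored.

```python
class ReadError(Exception):
    """Raised by a simulated disk when a block cannot be read or written."""


class Disk:
    """Toy disk: a list of blocks plus a set of block numbers that fail."""

    def __init__(self, blocks, unreadable=(), rewritable=True):
        self.blocks = list(blocks)
        self.unreadable = set(unreadable)  # block numbers that fail on read
        self.rewritable = rewritable       # does re-writing clear the error?
        self.failed = False                # True once kicked out of the array

    def read(self, n):
        if n in self.unreadable:
            raise ReadError(n)
        return self.blocks[n]

    def write(self, n, data):
        if n in self.unreadable and not self.rewritable:
            raise ReadError(n)
        self.blocks[n] = data
        self.unreadable.discard(n)


def mirror_read(primary, secondary, n):
    """Read block n, touching the mirror only after a read error."""
    try:
        return primary.read(n)        # no comparison with the mirror copy
    except ReadError:
        data = secondary.read(n)      # take the data from the other disk
        try:
            primary.write(n, data)    # try to repair the block by re-writing
        except ReadError:
            primary.failed = True     # re-write failed: drop the disk
        return data
```

Note that in this model two disks holding different but readable data are never noticed on a normal read: `mirror_read` simply returns the primary's copy.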

- John

halfgaar

I'm not talking about normal operations, I'm talking about when you do

aidanjt · Posted: Sun Jan 03, 2010 3:49 pm

RAID1 provides no guarantees about data integrity. If there's a block mismatch, the array is flagged dirty, and MD will do what John mentioned to attempt to clean the array. MD isn't concerned with whether the data is correct (as in, what you actually wrote), only with whether the array is in sync.

halfgaar

Well, I have an array with 128 mismatched blocks which is not marked as broken. So what does that mean?
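For reference, the md driver exposes this counter through sysfs, in the `mismatch_cnt` file next to `sync_action` under `/sys/block/<device>/md/`. A minimal sketch for reading both follows; the function names are hypothetical, and the device name `md0` in the usage note is an assumption.

```python
from pathlib import Path


def read_mismatch_count(md_dir):
    """Return the integer stored in <md_dir>/mismatch_cnt."""
    return int(Path(md_dir, "mismatch_cnt").read_text().strip())


def read_sync_action(md_dir):
    """Return the current sync action, e.g. 'idle' or 'check'."""
    return Path(md_dir, "sync_action").read_text().strip()
```

On a live system this would be called as `read_mismatch_count("/sys/block/md0/md")`. As far as I know, the kernel's md documentation describes this counter in units of sectors rather than filesystem blocks; if that is right, a value of 128 would correspond to 128 × 512 bytes = 64 KiB of data, not necessarily 128 independent bad blocks.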

eccerr0r · Posted: Mon Jan 04, 2010 7:10 pm

If you have mismatched blocks in RAID1, it likely means one of the following:

1- You shut down uncleanly (crash/reboot). This is the most likely culprit: mdraid never got a chance to update both disks.
2- You mounted one disk alone and updated it without updating the other disk (oops, user error).
3- There is a bug in the mdraid software.
4- A hard drive returned wrong data (unlikely, given the drive's ECC checking).

It's up to you to figure out which disk holds the correct copy of each mismatched block and copy it to the other disk. If you just want md to choose one or the other, then just sync them, which means you may be copying corrupt data to the other disk.
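Locating the differing blocks is the first step of that manual inspection. Below is a rough Python sketch that compares two files block by block and reports which block indices differ. It assumes you are comparing plain image copies of the two members that start at the same data offset; real md members also carry metadata whose position depends on the superblock version, which this sketch does not handle. The function name and the 4096-byte block size are illustrative choices.

```python
def find_mismatched_blocks(path_a, path_b, block_size=4096):
    """Return the indices of blocks that differ between two files."""
    mismatches = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        index = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break                     # both files exhausted
            if block_a != block_b:
                mismatches.append(index)  # also catches a length difference
            index += 1
    return mismatches
```

This only tells you where the copies disagree. Deciding which copy is the right one still needs outside knowledge, such as file checksums or a backup.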

halfgaar

But the thing is, it's not marked as failed, even though I forced a check. Apparently, the driver doesn't consider it a problem. The driver guards against all of your hypotheses and would give a notice when you try to activate the array.

eccerr0r · Posted: Mon Jan 04, 2010 10:32 pm