dobbs Tux's lil' helper
Joined: 20 Aug 2005 Posts: 105 Location: Wenatchee, WA
Posted: Fri Mar 30, 2012 9:41 pm Post subject: random read errors with mdraid??? [SOLVED] |
Alright, I need a second opinion from a kernel guru.
I dd'ed my windows partition (115GiB) into a file, and then this happened:
Code: | dobbs@bender ~ $ sudo !-1
sudo cmp -l /dev/sda4 /mnt/storage/tempstore/windows.part
51796599 40 0
16039693943 274 234
29991661943 201 241
66805234167 164 124
69818277623 115 155
73482455671 202 242
94468409719 377 337
95529264119 260 220
96286320375 17 57
103653245047 6 46
103809902583 173 133
105303325815 40 0
106056683383 163 123
107211539063 112 152
109386836727 215 255
109386836855 104 144
117111876599 210 250
120390354551 312 352
121743028727 365 325
dobbs@bender ~ $ sudo cmp -l /dev/sda4 /mnt/storage/tempstore/windows.part
Password:
390982263 144 104
9640181367 54 14
9640181623 262 222
29991661943 201 241
31463156343 256 216
37555086327 346 306
56837503223 51 11
69818277623 115 155
73482455671 202 242
80509345527 175 135
80509345655 162 122
94666073719 343 303
101261748087 151 111
103393197431 344 304
103454269047 251 211
103454269175 56 16
103653245047 6 46
105992555639 150 110
107211539063 112 152
109386836727 215 255
109386836855 104 144
109549263351 56 16
109549263479 363 323
110002149239 52 12
114666473079 167 127
114671000439 171 131
117111876599 210 250
117340243959 376 336
117340244215 276 236
120390354551 312 352
dobbs@bender ~ $ |
What's worrisome is that some of the errors repeat (write errors?), but some don't (read errors?). Worse, none of the operations printed any error messages to dmesg, /var/log/messages, or stderr. The dd operation reported success and no errors. The cmp operations did not report any read errors. Shouldn't the block layer find checksum mismatches in this case?
The destination, /mnt/storage/, is a reiser3 partition on a RAID 5 md array. smartctl doesn't show any hardware errors on the underlying devices, and /proc/mdstat shows a good array. fsck says the filesystem is fine, though single byte errors wouldn't make much sense in that case. Kernel is gentoo-sources-3.2.1-r2.
I'm not worried about /dev/sda -- it has yet to exhibit any other symptoms, it's a young drive, I'm not writing anything to it, and it's the less complex of the two setups. That leaves the RAID-5 array. It's an old array I set up years ago, but the underlying devices don't report any errors.
So is mdraid just not reliable? It looks like I'm getting both read AND write errors from that layer. The lack of error detection seems absurd. And I just realized every one of those errors is a difference of octal 40...
Did I just hit a bug in the raid456 module or what?!
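A quick way to confirm the single-flipped-bit pattern (a sketch of mine, not from the original post; the value pairs are copied from the cmp output above -- cmp -l prints the differing byte values in octal):

```shell
# XOR each pair of octal byte values from cmp -l to see which bits differ
for pair in "40 0" "274 234" "201 241" "115 155" "377 337"; do
  set -- $pair
  xor=$(( 8#$1 ^ 8#$2 ))          # 8#N interprets N as octal in bash
  echo "$1 vs $2 -> xor = $xor"
done
# every pair differs only in the 0x20 bit (octal 40, decimal 32)
```

Running the same XOR over every pair in the output above gives 32 each time, i.e. a single flipped bit in every mismatch.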
Last edited by dobbs on Fri Apr 06, 2012 6:43 am; edited 1 time in total |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54815 Location: 56N 3W
Posted: Sat Mar 31, 2012 10:35 am Post subject: |
dobbs,
You can't usefully dd anything from a mounted filesystem because you will have open files. If that's what you did, throw away the image and start again.
With read errors on a single drive in a raid5 array, you won't notice. Any n-1 from n drives works.
If you suspect the raid array do Code: | echo "check" > /sys/block/mdX/md/sync_action | where X is the md node you want to check.
A real totally failed read error will put Code: | [231200.568383] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[231200.568389] ata6.00: irq_stat 0x40000001
[231200.568402] ata6.00: cmd 25/00:08:e8:99:04/00:00:c0:00:00/e0 tag 0 dma 4096 in
[231200.568405] res 51/40:08:e8:99:04/00:00:c0:00:00/e0 Emask 0x9 (media error)
[231200.575646] ata6.00: configured for UDMA/133
[231200.575666] ata6: EH complete | or something like it in dmesg as the kernel resets the interface. If the drive has several goes at the read, you may get something like Code: | SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 140 140 051 Pre-fail Always - 18654
3 Spin_Up_Time 0x0027 253 253 021 Pre-fail Always - 1166
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 104
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 6409
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 103
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 44
193 Load_Cycle_Count 0x0032 102 102 000 Old_age Always - 295050
194 Temperature_Celsius 0x0022 126 110 000 Old_age Always - 24
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 263
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 63
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 188 166 000 Old_age Offline - 3355
| The meaning of the RAW numbers varies from vendor to vendor. Check yours. The important numbers here are Code: | 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 263 | so the drive has not reallocated any sectors yet, but it's considering reallocating 263. The above dmesg and smartctl -a output are real, from a dead drive I'm about to get RMAed. I'm having ddrescue work hard on it first.
Write errors would cause an immediate Reallocated_Event, unless the drive had no spare sectors left, in which case you would get an I/O error. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
dobbs Tux's lil' helper
Joined: 20 Aug 2005 Posts: 105 Location: Wenatchee, WA
Posted: Sat Mar 31, 2012 5:56 pm Post subject: |
NeddySeagoon wrote: | You can't usefully dd anything from a mounted filesystem because you will have open files. If that's what you did, throw away the image and start again. |
Right. I should have explicitly stated that both /dev/sda4 and the windows.part file were never mounted during this debacle. Sorry about that. It's what I meant when I said I wasn't writing anything to /dev/sda (which is blatantly false anyway; I'm just not writing to /dev/sda4).
NeddySeagoon wrote: | With read errors on a single drive in a raid5 array, you won't notice. Any n-1 from n drives works. |
Which is why I'm confused and frightened. The system obviously isn't detecting any "errors"; bit 6 (octal 40) just happens to get flipped occasionally. Given that the partition and the file should be inert, this is an impossible[1] situation. Specifically, this is a situation I hoped to avoid by constructing the RAID 5 array, and now it looks (to me) like the raid layer is introducing these errors.
1. This event exceeds my improbability threshold.
As for smartctl, one drive has one reallocated sector, but it's had it for over a year (I've been keeping an eye on that for a while). Zero pending reallocations across all drives. I don't believe the underlying drives are the source of corruption.
What brand was your drive there?
Addendum: The array check found zero mismatches. I will re-copy the partition yet again to reproduce the problem. |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54815 Location: 56N 3W
Posted: Sat Mar 31, 2012 8:37 pm Post subject: |
dobbs,
My drive is a WD20EARS. That's a green 2TB drive. I have five in raid5 and two have died over the last few weeks.
The first one was obvious - mega iowaits. When I replaced that, the resync failed because another drive (the one I showed above) has 6 bad blocks.
Bit flipping sounds like dud RAM. Data read from the HDD into its RAM is CRC protected. Across the raid set, it's 'parity protected'.
If your resync did not produce any errors, your data is self-consistent in the raid set. That does not mean it's correct, just that all the members of the raid agree on what it is. Those two things taken together rule out any bit flipping on the drives.
If your drives are SATA, the data interface is serial; that only one bit gets flipped during data transmission over a serial link is well beyond my incredibility threshold. That only leaves the motherboard and its component parts.
Time to boot into memtest86+ and run a few cycles.
Errors found in memtest86 do not always point to RAM. It's only likely to be RAM if you get the same error at the same address every time. |
dobbs Tux's lil' helper
Joined: 20 Aug 2005 Posts: 105 Location: Wenatchee, WA
Posted: Sat Mar 31, 2012 9:55 pm Post subject: |
Sorry to hear about losing the drives, Neddy. I've been wary of drive reliability since we passed the 500GB mark. I think that's when "perpendicular recording" became common. Possibly just me being paranoid, though. I do need to replace these three drives for various reasons: they're only 320GB, two of them are PATA, more than 45,000 hours operating... Like I said, this array is old. :) Unfortunately, I don't know what to purchase anymore.
Quote: | Bit flipping sounds like dud RAM. Data read from the HDD into its RAM is CRC protected. Across the raid set, it's 'parity protected'.
If your resync did not produce any errors, your data is self-consistent in the raid set. That does not mean it's correct, just that all the members of the raid agree on what it is. Those two things taken together rule out any bit flipping. |
Yeah, that's why I was considering an mdraid software bug. I was grasping at straws. A possible RAM issue didn't occur to me... I would expect other system stability issues. I'm guessing the faulty region of RAM lies outside the kernel memory, and the data buffer runs into it due to the heavy load. Does that make sense, or am I way off?
I did eliminate mdraid as the culprit, though. Freed up another drive and duplicated the partition:
Code: | dobbs@bender ~ $ sudo fdisk -l /dev/sd[ab] | grep -E "sda4|sdb1"
/dev/sda4 * 238774095 477173440 119199673 7 HPFS/NTFS/exFAT
/dev/sdb1 2048 238401393 119199673 7 HPFS/NTFS/exFAT
dobbs@bender ~ $ sudo dd if=/dev/sda4 of=/dev/sdb1 bs=32M
3637+1 records in
3637+1 records out
122060465152 bytes (122 GB) copied, 2242.62 s, 54.4 MB/s
dobbs@bender ~ $ sudo cmp -l /dev/sda4 /dev/sdb1
Password:
253594999 377 337
302277623 47 7
388563063 40 0
457392375 252 212
617962103 165 125
710643831 156 116
781120759 253 213
823862263 243 203
866853367 154 114
1238579191 141 101
1238581623 40 0
1312984567 242 202
1313322999 270 230
1482857335 170 130
1977688311 40 0
2081347575 376 336
2120394615 40 0
2161162231 43 3
2212050039 173 133
2263106423 42 2
2501622135 277 237
2534076919 355 315
2565879927 375 335
2747989111 40 0
2837622903 41 1
3005169271 40 0
3063135095 370 330
3083515127 163 123
...and lots more
|
sda and sdb are both SATA; my RAID 5 array spans sd[def]. Same issue, same bit, getting worse... I don't know the significance, but the byte offset mod 128 is always 119.
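That mod-128 observation is easy to check mechanically. A sketch (the sample offsets are the first three from the cmp output above; piping the full cmp -l output through the same awk one-liner would show whether the pattern holds everywhere):

```shell
# collect the distinct values of (byte offset mod 128) from cmp -l offsets
mods=$(printf '253594999\n302277623\n388563063\n' \
  | awk '{ print $1 % 128 }' | sort -u)
echo "$mods"
# a single value of 119 means every sampled offset lands on the same
# position within each 128-byte stride
```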
I'm trying to read one of these errors with hdparm, but neither the left nor the right byte value reported by cmp appeared at the indicated byte offset. It's possible my math is wrong, but I've checked it three times now.
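One possible source of off-by-one trouble here: GNU cmp -l numbers bytes starting at 1, not 0. A sketch of the offset-to-LBA arithmetic (the variable names are mine; assumes 512-byte logical sectors and the sda4 start sector from the fdisk output above):

```shell
# translate a cmp -l byte number inside /dev/sda4 into an absolute LBA,
# e.g. for hdparm --read-sector
part_start=238774095              # start sector of sda4 (fdisk output above)
cmp_byte=253594999                # byte number reported by cmp -l (1-based!)
offset=$(( cmp_byte - 1 ))        # convert to a 0-based byte offset
lba=$(( part_start + offset / 512 ))
byte_in_sector=$(( offset % 512 ))
echo "LBA $lba, byte $byte_in_sector"
```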
I will memtest the system while I leave town for the weekend. Thanks for the insight, Neddy!
Update: After 18 completed passes, memtest (memtest86+ 4.2) showed zero errors. I'm back to not knowing where the issue lies. Regardless, I do have more RAM on order. We'll see if replacing the RAM solves it. |
dobbs Tux's lil' helper
Joined: 20 Aug 2005 Posts: 105 Location: Wenatchee, WA
Posted: Fri Apr 06, 2012 6:43 am Post subject: |
Yep. Replacing the RAM resolved the issue. Marking solved. |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54815 Location: 56N 3W
Posted: Fri Apr 06, 2012 5:28 pm Post subject: |
dobbs,
I bet putting your old RAM back in would work too. That's called 'wiping the contacts'. It reduces the contact resistance between the plugged-in parts and is usually good for 12 to 18 months.
Oh, I lost 3 DVDs at most, as I have two one-block errors and a four-block error, all in the area where my DVD rips are stored.
The raid5 is back and WD replaced two nine-month-old drives under warranty. |
kimmie Guru
Joined: 08 Sep 2004 Posts: 531 Location: Australia
Posted: Sat Apr 07, 2012 1:40 pm Post subject: |
Neddy,
That load cycle count in your smartctl output looks a little high. Do you know about the nasty head-unloading behaviour of the WD20EARS under Linux, and how to cure it with WDIDLE.exe? I have some of these drives in RAID5 too... they needed to be spanked before they kept their heads in the right place.
Anyway if you can't find this utility and you need it drop me a PM. |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54815 Location: 56N 3W
Posted: Sat Apr 07, 2012 4:24 pm Post subject: |
kimmie,
I'm aware of the head-unloading every eight seconds issue now. I wasn't when I set up the raid.
I understand that WDIDLE.exe needs to be run under Windows, and Windows (or even getting those drives near a box with a GUI) is out of the question.
I'm using Code: | hdparm -S 252 /dev/... | which sets the idle timeout to an hour, but I don't think it's the same thing.
hdparm has an option to set the idle3 timeout, but it's not widely tested, so I have not used it. |
kimmie Guru
Joined: 08 Sep 2004 Posts: 531 Location: Australia
Posted: Sat Apr 07, 2012 8:45 pm Post subject: |
Just needs DOS... I had to make a FreeDOS boot floppy and boot that. I'm guessing you could convince FreeDOS to redirect console to serial if you cared enough. |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54815 Location: 56N 3W
Posted: Sat Apr 07, 2012 8:50 pm Post subject: |
kimmie,
The drives are in a HP Microserver. There is no floppy interface and no PATA interface.
It's USB or (e)SATA.
Hmm - I wonder if I could remaster a SystemRescueCD image to put on a USB pen drive, so WDIDLE.exe (and FreeDOS) was one of its image tools.
I can at least test that the floppy boots on another box before I make the ISO. |
dobbs Tux's lil' helper
Joined: 20 Aug 2005 Posts: 105 Location: Wenatchee, WA
Posted: Thu Apr 12, 2012 10:08 pm Post subject: |
NeddySeagoon wrote: | I bet putting your old RAM back in would work too. That's called 'wiping the contacts'. It reduces the contact resistance between the plugged-in parts and is usually good for 12 to 18 months. |
I got around to trying that. While the problem isn't as severe, it's still there:
Code: | ubuntu@ubuntu:/mnt$ sudo cmp -l storage/tempstore/windows.part /dev/sdc4
55485640375 370 330
58497711927 120 160
93697501719 116 156
ubuntu@ubuntu:/mnt$ |
Still in the sixth bit, but the offsets mod 128 are now 55 and 23 instead of always 119. Offset mod 256 is 23 for all three, but the sample set is too small. Different kernel (LiveUSB in this case), memory capacity and physical arrangement, so I'm not going to explore that.
The obvious explanation for fewer discrepancies is that the system has twice the RAM, so the bad bit(s?) isn't used as frequently. Also, the whole RAM subsystem is operating slightly slower. The "bad" RAM can run at 5ns latency (CAS 4 at 800MHz), while the new RAM needs at least 5.5ns latency (CAS 6 at 1067MHz). My motherboard actually runs the RAM at 800MHz and CAS 6 (7.5ns) when both sets are installed, so they're not really operating at their peak. Might help, might not; that's all conjecture to me.
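For reference, the latency figures quoted above work out as follows (a quick sketch of mine, using the same cycles-over-rated-clock convention as in the paragraph; the helper name is made up):

```shell
# CAS latency in ns, computed as cycles / rated transfer clock (MHz) * 1000
cl_ns() { awk -v cl="$1" -v mhz="$2" 'BEGIN { printf "%.2f", cl * 1000 / mhz }'; }
echo "CL4 @ 800 MHz  = $(cl_ns 4 800) ns"
echo "CL6 @ 1067 MHz = $(cl_ns 6 1067) ns"
echo "CL6 @ 800 MHz  = $(cl_ns 6 800) ns"
```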
On a sadder note, the original boot disk died abruptly shortly after configuring the boot array. I don't know how or why it died; the SMART status was always clean while I investigated the RAM problem. Now the system won't POST with the drive connected (tried different SATA cables, ports, basic debug procedure). Unfortunately, I was absent when it happened. Coincidentally, it's a WD3200KS with a manufacture date of "01 APR 2006", and it died the night of 01 APR 2012. I kinda want to call Western Digital and ask them if it's just a prank... |