Gentoo Forums
Software RAID-5 problem
davecorder
n00b


Joined: 03 Sep 2004
Posts: 10

Posted: Thu Dec 23, 2004 1:21 pm    Post subject: Software RAID-5 problem

The problem: I've got a personal file server set up with a RAID-5 configuration. It's been running very smoothly for the last few months, serving files to Mac and PC clients via Samba. Late yesterday, the machine froze while I was copying files off it over the network. When I rebooted, the OS hung on "Mounting local filesystems..." and ever since then I haven't been able to get the array to work with Gentoo (it does, however, appear to work just fine with Knoppix).

The hardware:
AMD Athlon XP 1500+
384 MB PC133 SDRAM
MSI K7T-Turbo2 MB
GeForce2 MX (NV11 DDR)
RealTek 8139C-based 10/100 Ethernet
Two Promise SATA150 TX4 4-port SATA host controllers
Two Hauppauge PVR-250 TV Tuners
IBM 20 GB ATA/100 (boot drive /dev/hda)
Generic 16X DVD-ROM (/dev/hdc)
8 160GB Maxtor PATA drives with HighPoint RocketHead PATA/SATA adapters (/dev/sda through /dev/sdh)

The software:
Gentoo Linux (installed from 2004.2, up to date as of about 2 weeks ago)
Kernel is 2.6.9-gentoo-r9
Using sata_promise module (from libata) for the SATA controllers
RAID array is RAID-5, built from the 8 Maxtor drives (/dev/md0)
Using XFS for the filesystem on the array (1.1 TB total formatted space)
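
As a rough check on the 1.1 TB figure: RAID-5 gives up one member's worth of space to parity, so the usable size is (n - 1) times the smallest member. The sketch below assumes decimal gigabytes as printed on the drive labels, and the function name is made up for illustration. For eight 160 GB drives it works out to 1120 GB, in line with the number quoted above (before filesystem overhead).

```python
def raid5_usable_gb(member_sizes_gb):
    """Usable RAID-5 capacity: (n - 1) members' worth, limited by the smallest member."""
    if len(member_sizes_gb) < 3:
        raise ValueError("RAID-5 needs at least three members")
    return (len(member_sizes_gb) - 1) * min(member_sizes_gb)


# Eight 160 GB members, as in the hardware list above.
usable_gb = raid5_usable_gb([160] * 8)
print(f"{usable_gb} GB usable (~{usable_gb / 1000:.2f} TB decimal, "
      f"~{usable_gb * 1e9 / 2**40:.2f} TiB binary)")
```

Because the smallest member sets the limit, swapping one 160 GB drive for a larger one would not change the result.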

Like I said, this system was working fine up until yesterday's crash. I'm not sure if the crash caused the RAID failure, or if a RAID problem caused the crash. In either case, I can't find any info on the crash in any log file.

After the crash, I noticed something odd: if I soft-reboot the machine with the reset button on the front after it hangs, the BIOS on the SATA cards does not detect any drives (I let it fill one row across the screen with its progress bar before killing power). BUT: if I power down the machine and then boot it, the drives are detected just fine.

I was able to boot the machine with a Knoppix 3.6 CD. I was then able to copy the /etc/raidtab file from my boot partition and start the RAID while in Knoppix. The array was out of sync at that point, and it took about 5 or 6 hours to resync. While it was rebuilding, I mounted it and cleared up some inconsistencies in the XFS journal (I tried to run xfs_check, but the CD apparently hit some bad sectors while reading xfs_db). I was able to copy some random files off the array while it was recovering, so it seems that the RAID-5 has served its purpose and protected my data. /proc/mdstat said all the drives in the array were good (no failures).
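
For anyone who wants to script the /proc/mdstat check, here is a minimal sketch. The sample text is a hypothetical excerpt in the usual 2.6-era layout, not output from this machine, and the function names are invented for illustration. The `[8/8] [UUUUUUUU]` part is what matters: slots shown as `U` are up and an underscore marks a missing or failed member.

```python
import re

# Hypothetical /proc/mdstat excerpt for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid5]
md0 : active raid5 sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0]
      1094017792 blocks level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]

unused devices: <none>
"""

STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def array_status(mdstat_text):
    """Return {array_name: (expected, active, flags)} for each md array found."""
    status = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        found = STATUS_RE.search(line)
        if found and current:
            status[current] = (int(found.group(1)), int(found.group(2)), found.group(3))
    return status


def degraded(status):
    """Names of arrays where at least one slot is marked '_'."""
    return [name for name, (_, _, flags) in status.items() if "_" in flags]


print(array_status(SAMPLE_MDSTAT))
print(degraded(array_status(SAMPLE_MDSTAT)))
```

On a live system the text would come from `open("/proc/mdstat").read()`. Keep in mind this only reports what the md driver has noticed so far; a drive can be on its way out while every slot still shows `U`.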

Now comes the fun part. I rebooted back into Gentoo and the system stuck right where it had before: on "Mounting local filesystems."

This comes very shortly after the "Starting RAID devices" step in the boot sequence.

One thing that occurred to me is that perhaps it is actually just waiting for the RAID rebuild process to finish before mounting the filesystem. But that shouldn't be the case, since that process happens entirely in the background and the array is still usable while it's being rebuilt. So that's not what is going on.
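
To illustrate why a RAID-5 array stays readable while one member is missing or being resynced: the parity block of each stripe is the XOR of its data blocks, so any single missing block can be recomputed from the rest. This is a toy sketch with 4-byte blocks and a fixed parity position. The real md driver works on much larger chunks and rotates parity across the members.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across 8 members: 7 data blocks plus 1 parity block.
data = [bytes([i] * 4) for i in range(1, 8)]
parity = xor_blocks(data)

# Pretend one member is unreadable and rebuild its block from the survivors plus parity.
lost = 3
survivors = [blk for i, blk in enumerate(data) if i != lost]
rebuilt = xor_blocks(survivors + [parity])

assert rebuilt == data[lost]
print("rebuilt block matches the lost one:", rebuilt == data[lost])
```

Reads that touch the missing member cost extra I/O because every surviving block in the stripe has to be read, which is why a degraded or resyncing array is slower but still usable.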

After a bit of tweaking (disabling autoloading of the sata_promise module and moving /etc/raidtab so no RAID devices are started on boot), I was able to get Gentoo back up and running without the array. I started the array manually. So far so good. Then I mounted the file system. About 5 seconds later, the system froze again. I don't know if it would have frozen if I had just left the array active and not mounted the filesystem.

So, rebuilding it under the Knoppix CD didn't help.

At the moment, I'm thinking I've got some sort of drive failure, even though I'm not seeing any error messages in the log files and /proc/mdstat reports that all the drives are good. I'm currently in the process of running Maxtor's PowerMax diagnostic utility on the drives (which, despite what the readme says, does detect the drives connected to my third-party SATA controller), so hopefully that'll reveal something.

The version of libata in my kernel does not yet have SMART support (I plan to patch to libata-dev to get that ASAP), so I can't use that at the moment to determine if a drive is going bad.

On the plus side, if it is a drive failure, I have a cold spare ready to be inserted (it's 200 GB rather than 160 GB, but that's not a big deal).

Any thoughts as to what I should be looking at if Maxtor's diagnostic software reports that all drives are good?

TIA

Dave
fvant
Guru


Joined: 08 Jun 2003
Posts: 328
Location: Leiden, The Netherlands

Posted: Thu Dec 23, 2004 4:13 pm    Post subject:

If things run smoothly with Knoppix but not with your homemade kernel, I'd have to conclude the problem lies with your kernel and drivers.

If you compare dmesg and lsmod output between Knoppix and your kernel, what are the differences?
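
One way to do the lsmod half of that comparison is to save the output on each system and diff the module names. The sketch below uses made-up captures (the module sizes and the two lists are hypothetical) and an invented helper name; it only looks at the first column of `lsmod` output.

```python
def loaded_modules(lsmod_text):
    """Set of module names from `lsmod` output (first column, header skipped)."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


# Hypothetical captures; on the real systems these would come from `lsmod > file`.
knoppix = """Module                  Size  Used by
sata_promise            9860  0
libata                 38404  1 sata_promise
xfs                   520000  1
"""
gentoo = """Module                  Size  Used by
sata_promise            9860  0
libata                 38404  1 sata_promise
"""

only_knoppix = loaded_modules(knoppix) - loaded_modules(gentoo)
only_gentoo = loaded_modules(gentoo) - loaded_modules(knoppix)
print("only on Knoppix:", sorted(only_knoppix))
print("only on Gentoo:", sorted(only_gentoo))
```

A name that shows up on only one side is not automatically a missing driver, since anything compiled into the kernel never appears in lsmod. The dmesg output is the place to confirm whether the driver actually initialised.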
davecorder
n00b


Joined: 03 Sep 2004
Posts: 10

Posted: Thu Dec 23, 2004 4:32 pm    Post subject:

fvant: I was just about to reach that conclusion myself, but I continued to test each drive with Maxtor's diagnostic software.

As it turns out, I have a failing drive. It didn't show up as failed to the OS until after several reboots and much mucking around with cables, drives and diagnostic software. But now Maxtor's software consistently reports it as defective and cannot effectively repair it (though it tries). Even my Knoppix 3.7 CD now shows the drive as failed.

Off to get a replacement 160 GB if I can, otherwise I'll toss in the 200 GB and call it good.

Next step: get SMART monitoring working, preferably with an email or even SMS alert to me when a drive starts failing.
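
A rough sketch of what that alerting could look like once SMART works through libata, assuming smartmontools is installed and the script runs as root. It relies on the overall-health line that `smartctl -H` prints for ATA drives. The function names and any addresses passed in are placeholders, and nothing here has been run against this hardware.

```python
import subprocess
from email.message import EmailMessage


def smart_health_ok(smartctl_output):
    """True if `smartctl -H` output reports PASSED for the overall-health check."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
    return False  # no verdict found: treat as unhealthy so it gets looked at


def build_alert(device, smartctl_output, sender, recipient):
    """Build (but do not send) an email carrying the raw smartctl output."""
    msg = EmailMessage()
    msg["Subject"] = f"SMART health check failed on {device}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(smartctl_output)
    return msg


def check_device(device):
    """Run smartctl on one device and return (healthy, raw_output)."""
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True)
    return smart_health_ok(result.stdout), result.stdout
```

A cron job could loop over /dev/sda through /dev/sdh, call `check_device`, and hand any alert to `smtplib.SMTP("localhost").send_message(msg)`, assuming a local mail server is listening. The smartd daemon that ships with smartmontools can also mail warnings by itself (the `-m` directive in smartd.conf), which may be simpler than a custom script.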
All times are GMT