raid device invalid after boot... ?

FuzzyOne · n00b Joined: 08 Mar 2004 Posts: 21

2 SATA drives as RAID1 on ASUSP4C800 (Promise), gentoo 2004.0, 2.6.3-gentoo-r1 kernel with RAID support compiled in. manual creation of raid is ok:

handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 245111706kB, raid superblock at 245111616kB
disk 1: /dev/sdc1, 245111706kB, raid superblock at 245111616kB

cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc1[1] sdb1[0]
245111616 blocks [2/2] [UU]
[>....................] resync = 1.4% (3450672/245111616) finish=410.4min speed=9811K/sec

manual mounting and disk access is ok. but after i reboot the raid is invalid:

autodetecting RAID... trying md0: invalid
/dev/md0 is not a RAID0 or LINEAR

(which is weird because it's configured as RAID1)

and when i try to access it:
/dev/md0: Invalid argument
mount: /dev/md0: can't read superblock

but i can recreate the raid manually (mkraid) just fine without any data loss.

/etc/raidtab:
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
chunk-size 32
persistent-superblock 1
device /dev/sdb1
raid-disk 0
device /dev/sdc1
raid-disk 1

what am i missing?

toofastforyahuh · Apprentice Joined: 18 May 2004 Posts: 172

I have almost the same problem on ASUS SK8V, 2.6.6 vanilla and 2.6.5-gentoo-r1 on amd64. I also have RAID compiled in and can build and manually mount the RAID1 just fine, but bootup fails in the init process.

Specifically the init script brings me to the error (type password or control D) prompt when raidstart fails.

raidstart does not like /dev/md0 during init. raidstart -a produces "/dev/md0: Invalid argument". I can do a raidstop /dev/md0 OK, but I cannot subsequently to a raidstart -a.

If I mv /etc/raidtab /etc/raidtab.old I can eventually get through the boot process. At this point if I login and mv /etc/raidtab.old /etc/raidtab and try raidstart -a it works and I can mount it manually!

What's wrong with the init process? Thanks!

toofastforyahuh · Apprentice Joined: 18 May 2004 Posts: 172

I should also note that this is a RAID 1 array with 2 SATA drives on the Promise controller. The array is *not* defined in BIOS because this is software RAID. Also mkraid needed the --really-force option to work, but I had no other problems during the procedure. I followed the TLDP software RAID howto and also this howto:
[url]http://www.siliconvalleyccie.com/linux-adv/raid.htm
[/url]

I even used -c -c during the ext3 formatting to make sure the drives worked OK.

The drives appear as /dev/sdc1 and /dev/sdd1. It's strange because this appears to work manually but the init scripts die on it.

I should note that the log says things like:
md: raidstart(pid 7965) used deprecated START_ARRAY ioctl. This will not be supported beyond 2.6

Basically the init scripts call raidstart /dev/md0 during boot but for some reason it fails then. with:
/dev/md0: Invalid argument

However, if I disable RAID during boot and then restore my /etc/raidtab and /etc/fstab afterward and raidstart manually it works perfectly. What gives?

senter · n00b Joined: 23 May 2004 Posts: 1 Location: Sweden

have you set the partition type to fd, Linux raid autodetect, on both partitions ?

toofastforyahuh · Apprentice Joined: 18 May 2004 Posts: 172

Donny · n00b Joined: 06 Jun 2004 Posts: 59

I have exact the same "errors" as FuzzyOne and toofastforyahuh on an ASUS K8V SE De luxe.
The raid works fine (installed it following the documents from tldp)
But after boot it fails on the /dev/md0

When I comment #raiddev /dev/md0 out in /etc/raidtab it boots without a problem exept no raid

So after starting the raid manual again all is working fine.

Does anyone no what I miss or doing wrong?
genkernel -> 2.6.3-gentoo-r2
2004.0
raidtab is the same as FuzzyOne
error (type password or control D) prompt when raidstart fails on boot

toofastforyahuh · Apprentice Joined: 18 May 2004 Posts: 172

I still have this problem.
Did anything about software raid change in 2.6.x? Do the raidtools need updating?
Is it OK to have /dev/md0 instead of /dev/md/0?

I'm really at a loss here.

Donny · n00b Joined: 06 Jun 2004 Posts: 59

I used mdadm and not raidtools, but got same error.
The feeling I have is that it has something to do with /dev/md0 but what I have really no idea I am lost.

Donny · n00b Joined: 06 Jun 2004 Posts: 59

Has anyone an idea in what direction to search to solve this problem?

toofastforyahuh · Apprentice Joined: 18 May 2004 Posts: 172

I think I fixed it!

First let me preface this with I find the gentoo init process confusing compared to ye olde /etc/rc.d/rc3.d, and when combined with the black magic of devfs/sysfs/every_other_fs I just get confused to no end. I have no idea when devices are available and when/where they are called and when/where they get symlinked, etc. And therein lies the problem.

It appears the scripts were trying to initialize the RAID before my SCSI devices were even set up!

From dmesg:

Donny · n00b Joined: 06 Jun 2004 Posts: 59

Glad you fixed it toofastforyahuh

I fixed mine too by loading the modules "raid0 and md " on boot.
hopes it helps someone.

toofastforyahuh · Apprentice Joined: 18 May 2004 Posts: 172

Somehow not long after my last post in this thread I was able to get the /dev/sdb1 /dev/sdc1 symlinks to work. I honestly don't remember how. Probably some more udev nonsense.

Then for months my software RAID worked great--or so I thought. Except for whatever reason the second drive was not being added by the kernel at boot. I still don't know how that happened either, since both drives were added fine when the RAID was first set up.

The solution in this case was to add the missing drive again with radihotadd, and now the RAID appears to work correctly with both drives even after a reboot. (They are brand new drives and no, it was not a drive failure.)

It's amazing how complicated this seemingly simple task of setting up md is. Just emerge the raidtools (or mdadm), set up the /etc/raidtab, load the kernel modules, make the RAID, and it should just work.....but there's always some gremlin throwing a wrench into the works.

blais · n00b Joined: 30 Jul 2003 Posts: 57

hi

i have the very same same problem using a P4P800 with two 120GB IDE drives and RAID0. I won't bother repeating my logfiles, they're the same as toofastforyahuh

this is then probably not a SCSI issue.

i don't have a solution for it. when i boot from the raid (/dev/md0 as /) it stops with an error and i have / mounted in read-only mode and the maintenance prompt at the console.

could it be that I have to let the drives "resync" before mounting?

blais · n00b Joined: 30 Jul 2003 Posts: 57

oops i mean RAID-1 in my message above.
I used to have these in RAID-0 and it worked fine.
Now the problem started when I switched to RAID-1 (and yes, i did recreate the fs)

MagicITX · n00b Joined: 08 Feb 2005 Posts: 6

Can anyone help with this? My setup is different but the problem is the same. I have:

/dev/md1 RAID-1 with 2 drives
/dev/md2 RAID-1 with 2 drives
/dev/md0 RAID-0 with /dev/md1 and /dev/md2

md1 and md2 start fine but init fails at md0.

From the recovery shell I see that dmesg ends with "md: md0 stopped". If I run 'mdadm -As /dev/md0' it tells me 'mdadm: no devices found for /dev/md0'.

I can recreate the array with:

mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2

The message I get here is a little weird. It says:

mdadm: /dev/md1 appears to be part of a raid array:
level=0 devices=2 ctime=Sat Mar 19 05:56:33 2005
mdadm: /dev/md2 appears to be part of a raid array:
level=0 devices=2 ctime=Sat Mar 19 05:56:33 2005
Continue creating array?

I answer "y"es and it says 'mdadm: array /dev/md0 started.' What makes this weird is the md1 and md2 arrays are level 1, not level 0 as reported.

At this point if I cat /proc/mdstat I get:

Personalities : [raid0][raid1]
md0 : active raid0 md2[1] md1[0]
586066944 blocks 64k chunks
md2 : active raid1 sdc1[0] sdd1[1]
293033536 blocks [2/2] [UU]
md1 : active raid1 sda1[0] sdb1[1]
293033536 blocks [2/2] [UU]

My /etc/mdadm.conf file has:

DEVICE /dev/sda1
DEVICE /dev/sdb1
DEVICE /dev/sdc1
DEVICE /dev/sdd1
ARRAY /dev/md1 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md2 devices=/dev/sdc1,/dev/sdd1
ARRAY /dev/md0 devices=/dev/md1,/dev/md2

So like the other posts in this thread, I have a RAID array that won't start during init or afterward but can be recreated from the recovery shell.

Any ideas?
_________________
Maximizing the mini-itx - magicitx.com

MagicITX · n00b Joined: 08 Feb 2005 Posts: 6

Some other posts suggested this could be due to udev. That isn't the problem in my case. I get the same error with devfs.
_________________
Maximizing the mini-itx - magicitx.com

Phk · Posted: Sun Mar 20, 2005 7:23 pm Post subject:

MagicITX · n00b Joined: 08 Feb 2005 Posts: 6

Thanks for the feedback. My problem was a little different but I've been able to solve it.

The problem was in /etc/mdadm.conf. When mdadm starts up an array it looks through mdadm.conf for the devices to use. Previously my file contained this:

Phk · Posted: Sun Mar 20, 2005 9:53 pm Post subject:

Yeah, it makes sense

Glad you worked it out!! I'm still in a mess....

If you want to know\help, visit my issues page..... ----> HERE

I'm posting the new problem in 30 minutes or so.

See us!
_________________
"# cat /dev/urandom >> /tmp/life"