Gentoo Forums
Failed RAID1 device, can't login
njcwotx
Guru

Joined: 25 Feb 2005
Posts: 587
Location: Texas

Posted: Fri Aug 22, 2008 5:58 pm    Post subject: Failed RAID1 device, can't login

I posted earlier about mounting a RAID1 with the live CD. I managed to get the mirror to start up with

Code:
mdadm --assemble /dev/hda3 /dev/hdb3


However, hda3 appears to have failed while hdb3 is listed as good. The problem is that we can't get the array to mount, and we get lots of superblock errors on hda3. I don't have much experience with software mirroring and could use some help.

If I use the live CD, I can actually see files when I mount the drive, but I can't chroot into it because it says the drive is degraded.
_________________
Drinking from the fountain of knowledge.
Sometimes sipping.
Sometimes gulping.
Always thirsting.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54831
Location: 56N 3W

Posted: Fri Aug 22, 2008 6:25 pm

njcwotx,

When you assemble the raid set, you should get a /dev/mdX, which is what you mount.
You may mount one part of a raid1 set as its underlying partition as long as you make it read only.

A read/write mount will get the two parts out of sync, so next time you try to assemble the raid set, it will start in degraded mode.
I'm not sure which part wins, the old original raid part or the altered part.

You should never operate on the underlying partitions of a kernel raid set, even if you can.
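
As a rough sketch of the advice above: assemble the members into an md device and mount that device, not hda3 or hdb3. The array name /dev/md0 and the mount point /mnt/gentoo are assumptions for illustration, and the `run` helper is hypothetical. It only prints each command unless RUN=1 is set, so nothing touches the disks by accident.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

MD=/dev/md0        # assumed array device node
MNT=/mnt/gentoo    # assumed mount point

# Assemble the array from its members, naming the md device first.
run mdadm --assemble "$MD" /dev/hda3 /dev/hdb3
# Mount the assembled array (not a member partition) read-only while investigating.
run mount -o ro "$MD" "$MNT"
```

Run it once as-is to review the commands, then again with RUN=1 as root if they look right for your system.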
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

njcwotx
Posted: Fri Aug 22, 2008 6:44 pm

OK, here is what I have so far. While some of it is straightforward, it's hard for me to see the big picture here.

Code:
mdadm -A --update=resync --run /dev/hda3 /dev/hdb3



Code:
livecd / # cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 hdb3[1]
      77587200 blocks [2/1] [_U]

unused devices: <none>


Code:
livecd / # mdadm --detail --scan /dev/md127
/dev/md127:
        Version : 00.90.03
  Creation Time : Fri May 28 11:56:18 2004
     Raid Level : raid1
     Array Size : 77587200 (73.99 GiB 79.45 GB)
  Used Dev Size : 77587200 (73.99 GiB 79.45 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 127
    Persistence : Superblock is persistent

    Update Time : Fri Aug 22 18:40:05 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 24df6bde:40d81b6a:b55d906c:1196c4e1
         Events : 0.192

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       3       67        1      active sync   /dev/hdb3


I tried to mount /dev/md127 and got this:

Code:
md: md127 stopped.
md: bind<hdb3>
md: md127: raid array is not clean -- starting background reconstruction
raid1: raid set md127 active with 1 out of 2 mirrors
md: md127 stopped.
md: unbind<hdb3>
md: export_rdev(hdb3)
md: md127 stopped.
md: bind<hdb3>
md: md127: raid array is not clean -- starting background reconstruction
raid1: raid set md127 active with 1 out of 2 mirrors
XFS: bad magic number
XFS: SB validate failed

NeddySeagoon
Posted: Fri Aug 22, 2008 6:55 pm

njcwotx,

Let it complete the reconstruction. When that's done, it should be running on both drives again.
While the two halves are not synchronised, the raid is in degraded mode. You can still use it that way and the kernel will sort it out.

It may take several hours to rebuild as one drive has to be copied to the other and the bandwidth used for this process is deliberately limited, or you would not be able to use the volume while it was running.

njcwotx
Posted: Fri Aug 22, 2008 7:04 pm

OK cool, is there any way I can check its rebuild progress?

NeddySeagoon
Posted: Fri Aug 22, 2008 7:08 pm

njcwotx,

I think it appears in /proc/mdstat

njcwotx
Posted: Fri Aug 22, 2008 7:13 pm

I posted the output of mdstat and it shows no progress. Take a look at it and tell me what you think.

mdadm --detail shows it as "removed".

From the output, which drive has failed? When I run mdadm --detail /dev/hda3 it gives info; if I run mdadm --detail /dev/hdb3 it says it's not an md device...

I am not sure it's actually rebuilding. I ran mdadm --monitor /dev/md127 and saw no activity.

PS: let's assume that hda has physically failed. When we unplug it, the mirror won't activate. If we install a blank disk, can we rebuild it normally? I am not sure how this will work.

NeddySeagoon
Posted: Fri Aug 22, 2008 7:35 pm

njcwotx,

Your dmesg says (or said)
Code:
md: md127: raid array is not clean -- starting background reconstruction

I'm still on raidtools. I will need to update to mdadm one day, raidtools is long gone from portage.
From reading the mdadm man page, it has a command (near the bottom) that returns how far reconstruction has got.

Mad Merlin
Veteran

Joined: 09 May 2005
Posts: 1155

Posted: Fri Aug 22, 2008 8:59 pm

Code:
cat /proc/mdstat
will indeed show the progress of the reconstruction, if it's taking place.
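
A minimal sketch for pulling just the rebuild percentage out of mdstat-formatted text. The function name md_progress is made up for illustration, and it assumes the progress line contains "recovery =" or "resync =" followed by a percentage, as in typical mdstat output. It prints nothing when no rebuild is in progress.

```shell
#!/bin/sh
# Print "<array> <percent>" for each md array that reports a recovery or
# resync line in mdstat-formatted text read from stdin.
md_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }              # remember the current array name
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) {                  # the field that ends in a percent sign
                    sub(/%$/, "", $i)
                    print array, $i
                }
        }
    '
}

# Usage on a live system; silent when the file is missing or nothing is rebuilding.
if [ -r /proc/mdstat ]; then
    md_progress < /proc/mdstat
fi
```

Wrapping the call in `watch` or a sleep loop gives a running view of the rebuild.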
_________________
Game! - Where the stick is mightier than the sword!

njcwotx
Posted: Fri Aug 22, 2008 9:42 pm

In that case, I don't think it's reconstructing. Currently, I am cloning the mirrors into VMware ESX Server so I can keep working on this if I kill it. We have backups, but the original developers who made the application on it are permanently unavailable :) so I would prefer to get this mirror back.

Some more questions:
1. From the output, which mirror has failed? It looks like /dev/hda3 in the output for /dev/md127, but when I use mdadm --scan /dev/hda3 I get info on a mirror, and with mdadm --scan /dev/hdb3 I'm told it's not part of a mirror set.

2. This is an old install of Gentoo, done by some developers who are long gone. A replacement app has been on my development group's wish list, but a new one has not materialized yet. The original tools were raidtools and I don't have the raidstart command. Could this be why the mirror won't mount?

3. I have tried to remove the physical hda drive but I get a kernel panic. Maybe I'm looking at the wrong drive...

The server is remote to me. I will be going out tomorrow, so in approximately 18 hours I might be physically at it, and sometimes that is easier. Thanks for the input.

NeddySeagoon
Posted: Sat Aug 23, 2008 11:18 am

njcwotx,

You posted
Code:
           UUID : 24df6bde:40d81b6a:b55d906c:1196c4e1
         Events : 0.192

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       3       67        1      active sync   /dev/hdb3

which shows that hdb3 is active.

Provided fdisk shows the partition types as fd, the kernel should start the raid set on boot. Your raid superblocks are persistent.
dmesg will show something like
Code:
[    2.559974] md: considering sdb1 ...
[    2.561614] md:  adding sdb1 ...
[    2.563265] md:  adding sda1 ...
[    2.564896] md: created md0
[    2.566479] md: bind<sda1>
[    2.568023] md: bind<sdb1>
[    2.569521] md: running: <sdb1><sda1>
[    2.571078] raid1: raid set md0 active with 2 out of 2 mirrors
[    2.572665] md: ... autorun DONE.
which is my raid1 /boot being started. I guess you will have an error message there.

raidtools and mdadm should be interchangeable, so missing raidtools is probably not the issue.
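
The slot reading earlier in this post (slot 0 removed, hdb3 active) can also be taken from /proc/mdstat: in the status field, each U marks a working member and each _ marks a missing one, so [_U] means slot 0 is gone and slot 1 is running. Below is a small sketch that lists degraded arrays from mdstat-formatted text. md_degraded is a hypothetical helper name, not part of mdadm or raidtools.

```shell
#!/bin/sh
# Print "<array> <status>" for every array whose member status field
# (for example [_U]) contains at least one missing member.
md_degraded() {
    awk '
        /^md[0-9]+ :/ { array = $1; next }        # remember the current array name
        / blocks / && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ {
            print array, $NF                      # last field is the U/_ status
        }
    '
}

# Usage on a live system; prints nothing when every array is complete.
if [ -r /proc/mdstat ]; then
    md_degraded < /proc/mdstat
fi
```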

njcwotx
Posted: Mon Aug 25, 2008 2:32 am

Stranger now...

OK, I finally got the system to boot, and now when I log in I see 2 separate raid sets!

The hda3 set is md0, which is actually the correct version, and hdb3 now shows up as raid set md127. md127 is the name that came up when I tried to mount the raid from the boot CD, and now it seems to think it's supposed to stay that way. The mdadm tool is not available on the original system, so how do I go about telling it, in raidtools world, that I want hdb3 to forget about md127? The raidtab is still the original. When I boot from the CD and run mdadm --detail /dev/hdb3, it says its preferred minor is 127...

PS: I checked whether mkraid was there, and it's not there either.

Is it just easier to wipe the hdb drive and let the mirror set fix itself? I really have pucker factor about wiping one side of the mirror!

=====the stuff I see===========
Where the heck does it get md127 from? I can't find any /etc config file with that in it. It must be stored somehow in the partition table? I can't find any executable commands to modify this. Maybe something can be done from the boot disk side?
Code:
cat /proc/mdstat
Personalities : [raid1] [multipath]
read_ahead 1024 sectors
md0 : active raid1 ide/host0/bus0/target0/lun0/part3[0]
      77587200 blocks [2/1] [U_]

md127 : active raid1 ide/host0/bus0/target1/lun0/part3[1]
      77587200 blocks [2/1] [_U]

unused devices: <none>


Code:
cat /etc/raidtab
raiddev /dev/md0
raid-level 1
persistent-superblock 1
nr-raid-disks 2
chunk-size 32
device /dev/hda3
raid-disk 0
device /dev/hdb3
raid-disk 1


Code:
dmesg |grep md
Kernel command line: root=/dev/md0
md: raid1 personality registered as nr 3
md: multipath personality registered as nr 7
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: considering ide/host0/bus0/target1/lun0/part3 ...
md:  adding ide/host0/bus0/target1/lun0/part3 ...
md: ide/host0/bus0/target0/lun0/part3 has same UUID as ide/host0/bus0/target1/lun0/part3, but superblocks differ ...
md: created md127
md: bind<ide/host0/bus0/target1/lun0/part3,1>
md: running: <ide/host0/bus0/target1/lun0/part3>
md: ide/host0/bus0/target1/lun0/part3's event counter: 000000d9
md: RAID level 1 does not need chunksize! Continuing anyway.
md127: max total readahead window set to 124k
md127: 1 data-disks, max readahead per data-disk: 124k
raid1: md127, not all disks are operational -- trying to recover array
raid1: raid set md127 active with 1 out of 2 mirrors
md: recovery thread got woken up ...
md127: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: updating md127 RAID superblock on device
md: ide/host0/bus0/target1/lun0/part3 [events: 000000da]<6>(write) ide/host0/bus0/target1/lun0/part3's sb offset: 77587200
md: considering ide/host0/bus0/target0/lun0/part3 ...
md:  adding ide/host0/bus0/target0/lun0/part3 ...
md: created md0
md: bind<ide/host0/bus0/target0/lun0/part3,1>
md: running: <ide/host0/bus0/target0/lun0/part3>
md: ide/host0/bus0/target0/lun0/part3's event counter: 00000072
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: md0, not all disks are operational -- trying to recover array
raid1: raid set md0 active with 1 out of 2 mirrors
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md127: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: updating md0 RAID superblock on device
md: ide/host0/bus0/target0/lun0/part3 [events: 00000073]<6>(write) ide/host0/bus0/target0/lun0/part3's sb offset: 77587200
md: ... autorun DONE.
md: swapper(pid 1) used obsolete MD ioctl, upgrade your software to use new ictls.
reiserfs: checking transaction log (device md(9,0)) ...
for (md(9,0))
md(9,0):Using r5 hash to sort names
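
The dmesg output above shows where the second array comes from: both partitions carry the same UUID, but their superblocks differ, so the kernel does not join them and starts each half as its own degraded array. The event counters it prints are hexadecimal (000000d9 for target1, which appears to be hdb3, and 00000072 for target0, which appears to be hda3). A minimal sketch for comparing such counters follows; the function names are made up for illustration. A higher count only means that superblock was updated more often, which is not necessarily the copy whose data you want to keep.

```shell
#!/bin/sh
# Convert a hexadecimal md event counter, as printed in the dmesg above, to decimal.
events_dec() {
    printf '%d\n' "0x$1"
}

# Print the name of the member with the higher event count, or "equal".
# Arguments: name1 hex1 name2 hex2
newer_member() {
    a=$(events_dec "$2")
    b=$(events_dec "$4")
    if [ "$a" -gt "$b" ]; then
        echo "$1"
    elif [ "$b" -gt "$a" ]; then
        echo "$3"
    else
        echo "equal"
    fi
}

newer_member hdb3 000000d9 hda3 00000072   # prints: hdb3
```

Here 0xd9 is 217 and 0x72 is 114, so hdb3 has seen more superblock updates, presumably from the repeated assemble attempts under the live CD.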

njcwotx
Posted: Mon Aug 25, 2008 3:28 am

OK, I emerged mdadm and can now use those tools on the box... now to clear the md127 stuff, but how? Off to RTFM land.

njcwotx
Posted: Mon Aug 25, 2008 3:47 am

Hot Diggity!!!!!!!!!!!!!!!!!!!!!!!!!!!!

I had to emerge mdadm, and after a lot of man page reading and prayer I managed to get rid of md127. Here are a few of the commands I pulled from history. Thanks for the help/confidence, guys.
Code:
mdadm --stop /dev/md127
mdadm /dev/md0 -f /dev/hdb3
mdadm --manage -r /dev/md127
mdadm /dev/md0 --add /dev/hdb


Code:
 cat /proc/mdstat
Personalities : [raid1] [multipath]
read_ahead 1024 sectors
md0 : active raid1 ide/host0/bus0/target1/lun0/disc[2] ide/host0/bus0/target0/lun0/part3[0]
      77587200 blocks [2/1] [U_]
      [>....................]  recovery =  0.3% (273728/77587200) finish=32.9min speed=39104K/sec
unused devices: <none>
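
The finish estimate in that recovery line can be sanity-checked by hand: the blocks still to copy divided by the current speed. mdstat counts 1 KiB blocks here (77587200 blocks is the 73.99 GiB that mdadm --detail reported earlier in the thread), so blocks and K/sec share a unit. eta_minutes is a hypothetical helper name used only for this sketch.

```shell
#!/bin/sh
# Estimate the whole minutes left in a rebuild.
# Arguments: blocks_done blocks_total speed_in_K_per_sec
eta_minutes() {
    echo $(( ($2 - $1) / $3 / 60 ))
}

eta_minutes 273728 77587200 39104   # prints: 32
```

Integer division truncates, so this gives 32 against the kernel's finish=32.9min. The speed changes as the rebuild runs, so treat either number as a rough guide.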

All times are GMT