njcwotx (Guru)
Posted: Fri Aug 22, 2008 5:58 pm    Post subject: Failed RAID1 device, can't login
I posted earlier about mounting a RAID1 with the live CD. I managed to get the mirror to start up with
Code: | mdadm --assemble /dev/hda3 /dev/hdb3 |
However, it appears that the hda3 partition has failed and hdb3 is listed as good. The only problem is we can't seem to get it to mount, and we get lots of superblock errors on hda3. I don't have much experience with software mirroring and could use some help.
If I use the live CD, I can actually see files when I mount the drive, but I can't chroot to it as it says the drive is degraded.
NeddySeagoon (Administrator)
Posted: Fri Aug 22, 2008 6:25 pm
njcwotx,
When you assemble the raid set, you should get a /dev/mdX, which is what you mount.
You may mount one part of a raid1 set as its underlying partition as long as you make it read only.
A read/write mount will get the two parts out of sync, so next time you try to assemble the raid set, it will start in degraded mode.
I'm not sure which part wins, the old original raid part or the altered part.
You should never operate on the underlying partitions of a kernel raid set, even if you can.
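For example, something along these lines (a sketch; the md device name and mount points are my assumptions, adjust to your setup):
Code: | # assemble both halves into one md device and mount that, not the partitions
mdadm --assemble /dev/md0 /dev/hda3 /dev/hdb3
mount /dev/md0 /mnt/gentoo

# or, to inspect one half on its own, mount it read only so the halves stay in sync
mount -o ro /dev/hdb3 /mnt/somewhere
|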
njcwotx (Guru)
Posted: Fri Aug 22, 2008 6:44 pm
OK, here is what I have so far. While some of it is straightforward, it's hard for me to tell the big picture here.
Code: | mdadm -A --update=resync --run /dev/hda3 /dev/hdb3 |
Code: | livecd / # cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 hdb3[1]
77587200 blocks [2/1] [_U]
unused devices: <none>
|
Code: | livecd / # mdadm --detail --scan /dev/md127
/dev/md127:
Version : 00.90.03
Creation Time : Fri May 28 11:56:18 2004
Raid Level : raid1
Array Size : 77587200 (73.99 GiB 79.45 GB)
Used Dev Size : 77587200 (73.99 GiB 79.45 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 127
Persistence : Superblock is persistent
Update Time : Fri Aug 22 18:40:05 2008
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 24df6bde:40d81b6a:b55d906c:1196c4e1
Events : 0.192
Number Major Minor RaidDevice State
0 0 0 0 removed
1 3 67 1 active sync /dev/hdb3
|
I tried to mount /dev/md127 and got this:
Code: | md: md127 stopped.
md: bind<hdb3>
md: md127: raid array is not clean -- starting background reconstruction
raid1: raid set md127 active with 1 out of 2 mirrors
md: md127 stopped.
md: unbind<hdb3>
md: export_rdev(hdb3)
md: md127 stopped.
md: bind<hdb3>
md: md127: raid array is not clean -- starting background reconstruction
raid1: raid set md127 active with 1 out of 2 mirrors
XFS: bad magic number
XFS: SB validate failed
|
NeddySeagoon (Administrator)
Posted: Fri Aug 22, 2008 6:55 pm
njcwotx,
Let it complete the reconstruction. When that's done, it should be running on both drives again.
While the two halves are not synchronised, the raid is in degraded mode. You can still use it that way and the kernel will sort it out.
It may take several hours to rebuild, as one drive has to be copied to the other and the bandwidth used for this process is deliberately limited, or you would not be able to use the volume while it was running.
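The throttle is tunable through /proc if you want to trade normal I/O for rebuild speed (the echoed value below is only an example):
Code: | # resync speed limits, in KB/s per device
cat /proc/sys/dev/raid/speed_limit_min
cat /proc/sys/dev/raid/speed_limit_max
# raising the minimum makes the rebuild more aggressive
echo 20000 > /proc/sys/dev/raid/speed_limit_min
|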
njcwotx (Guru)
Posted: Fri Aug 22, 2008 7:04 pm
OK cool, is there any way I can check its rebuild progress?
NeddySeagoon (Administrator)
Posted: Fri Aug 22, 2008 7:08 pm
njcwotx,
I think it appears in /proc/mdstat.
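Something like this should do (watch is just a convenience; any repeated cat works):
Code: | # poll the resync; while rebuilding you should see a line like
#   [=>...................]  recovery =  5.3% (...) finish=30.1min
watch cat /proc/mdstat
|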
njcwotx (Guru)
Posted: Fri Aug 22, 2008 7:13 pm
I posted the output of mdstat and it shows no progress. Take a look at it and tell me what you think.
mdadm --detail shows the first device as "removed".
From the output, which drive is failed? When I run mdadm --detail /dev/hda3 it gives info; if I run mdadm --detail /dev/hdb3 it says it's not an md device...
I am not sure it's actually rebuilding; I run mdadm --monitor /dev/md127 and see no action.
PS, let's assume that hda is physically failed: when we pull the plug it won't activate the mirror. If we install a blank disk, can we rebuild it normally? I am not sure how this will work.
NeddySeagoon (Administrator)
Posted: Fri Aug 22, 2008 7:35 pm
njcwotx,
Your dmesg says (or said) Code: | md: md127: raid array is not clean -- starting background reconstruction |
I'm still on raidtools. I will need to update to mdadm one day; raidtools is long gone from portage.
From reading the mdadm man page, it has a command (near the bottom) that reports how far reconstruction has got.
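At a guess, something like this (I have not run it myself; the Rebuild Status line only appears while a resync is in progress):
Code: | mdadm --detail /dev/md127
#   ...
#   Rebuild Status : 12% complete
#   ...
|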
Mad Merlin (Veteran)
njcwotx (Guru)
Posted: Fri Aug 22, 2008 9:42 pm
In that case, I don't think it's reconstructing. Currently, I am cloning the mirrors into a VMware ESX server so that I can keep working on this if I kill it. We have backups, but the original developers who made the application on it are permanently unavailable, so I would prefer to get this mirror back.
Some more questions:
1. OK, so from the output, which mirror is failed? It looks like /dev/hda3 from the output for /dev/md127, but when I use mdadm --scan /dev/hda3 I get info on a mirror, and with mdadm --scan /dev/hdb3 I get info that it's not part of a mirror set.
2. This is an old install of Gentoo, done by some developers who are long gone. A replacement app has been on the wish list of my development group, but a new one has not materialized yet. The original tools were raidtools and I don't have the raidstart command. Could this be why the mirror won't mount?
3. I have tried to remove the physical hda drive but I get a kernel panic; maybe I'm looking at the wrong drive...
The server is remote to me. I will be going out tomorrow; in approx 18 hrs I should be physically at it, and sometimes that is easier. Thanks for the input.
NeddySeagoon (Administrator)
Posted: Sat Aug 23, 2008 11:18 am
njcwotx,
You posted Code: | UUID : 24df6bde:40d81b6a:b55d906c:1196c4e1
Events : 0.192
Number Major Minor RaidDevice State
0 0 0 0 removed
1 3 67 1 active sync /dev/hdb3 |
which shows that hdb3 is active.
Provided fdisk shows the partition types as fd, the kernel should start the raid set on boot. Your raid superblocks are persistent.
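Worth a quick check, e.g. (device names taken from your posts):
Code: | # the Id column for hda3/hdb3 should read fd -- "Linux raid autodetect"
fdisk -l /dev/hda
fdisk -l /dev/hdb
|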
dmesg will show something like Code: | [ 2.559974] md: considering sdb1 ...
[ 2.561614] md: adding sdb1 ...
[ 2.563265] md: adding sda1 ...
[ 2.564896] md: created md0
[ 2.566479] md: bind<sda1>
[ 2.568023] md: bind<sdb1>
[ 2.569521] md: running: <sdb1><sda1>
[ 2.571078] raid1: raid set md0 active with 2 out of 2 mirrors
[ 2.572665] md: ... autorun DONE. | which is my raid1 /boot being started. I guess you will have an error message there.
raidtools and mdadm should be interchangeable, so missing raidtools is probably not the issue.
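If you do end up switching to mdadm, one handy habit (an aside, not required for kernel autodetection) is to record the arrays in its config file:
Code: | # append the running arrays, identified by UUID, for later mdadm -A runs
mdadm --detail --scan >> /etc/mdadm.conf
|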
njcwotx (Guru)
Posted: Mon Aug 25, 2008 2:32 am
Stranger now....
OK, I got the system to boot up finally; now when I log in, I see 2 separate raid sets!
The hda3 set is md0, which is actually the correct version, and hdb3 now shows up as raid set md127. md127 is the label that came up when I tried to mount the raid from the boot CD; now it seems to think it's supposed to stay that way. The mdadm tool is not installed on the original system, so how do I go about telling it, in raidtools world, that I want to make the hdb3 partition forget about md127? The raidtab is still the original... when I boot to the CD and do mdadm --examine /dev/hdb3, it says its preferred minor is 127...
PS, I tried to see if mkraid was there and it's not there either.
Is it just easier to wipe the hdb drive and let the mirror set fix itself? I really have pucker factor wiping one side of the mirror!
===== the stuff I see =====
Where the heck does it get the md127 from? I can't find any /etc config file with it in it; it must be stored somehow in the partition table? There are no executable commands I can find to modify this. Maybe something can be done from the boot disk side?
Code: | cat /proc/mdstat
Personalities : [raid1] [multipath]
read_ahead 1024 sectors
md0 : active raid1 ide/host0/bus0/target0/lun0/part3[0]
77587200 blocks [2/1] [U_]
md127 : active raid1 ide/host0/bus0/target1/lun0/part3[1]
77587200 blocks [2/1] [_U]
unused devices: <none>
|
Code: | cat /etc/raidtab
raiddev /dev/md0
raid-level 1
persistent-superblock 1
nr-raid-disks 2
chunk-size 32
device /dev/hda3
raid-disk 0
device /dev/hdb3
raid-disk 1
|
Code: | dmesg |grep md
Kernel command line: root=/dev/md0
md: raid1 personality registered as nr 3
md: multipath personality registered as nr 7
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: considering ide/host0/bus0/target1/lun0/part3 ...
md: adding ide/host0/bus0/target1/lun0/part3 ...
md: ide/host0/bus0/target0/lun0/part3 has same UUID as ide/host0/bus0/target1/lun0/part3, but superblocks differ ...
md: created md127
md: bind<ide/host0/bus0/target1/lun0/part3,1>
md: running: <ide/host0/bus0/target1/lun0/part3>
md: ide/host0/bus0/target1/lun0/part3's event counter: 000000d9
md: RAID level 1 does not need chunksize! Continuing anyway.
md127: max total readahead window set to 124k
md127: 1 data-disks, max readahead per data-disk: 124k
raid1: md127, not all disks are operational -- trying to recover array
raid1: raid set md127 active with 1 out of 2 mirrors
md: recovery thread got woken up ...
md127: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: updating md127 RAID superblock on device
md: ide/host0/bus0/target1/lun0/part3 [events: 000000da]<6>(write) ide/host0/bus0/target1/lun0/part3's sb offset: 77587200
md: considering ide/host0/bus0/target0/lun0/part3 ...
md: adding ide/host0/bus0/target0/lun0/part3 ...
md: created md0
md: bind<ide/host0/bus0/target0/lun0/part3,1>
md: running: <ide/host0/bus0/target0/lun0/part3>
md: ide/host0/bus0/target0/lun0/part3's event counter: 00000072
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: md0, not all disks are operational -- trying to recover array
raid1: raid set md0 active with 1 out of 2 mirrors
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md127: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: updating md0 RAID superblock on device
md: ide/host0/bus0/target0/lun0/part3 [events: 00000073]<6>(write) ide/host0/bus0/target0/lun0/part3's sb offset: 77587200
md: ... autorun DONE.
md: swapper(pid 1) used obsolete MD ioctl, upgrade your software to use new ictls.
reiserfs: checking transaction log (device md(9,0)) ...
for (md(9,0))
md(9,0):Using r5 hash to sort names
|
njcwotx (Guru)
Posted: Mon Aug 25, 2008 3:28 am
OK, I emerged mdadm and can now use those tools on the box... now to clear the md127 stuff, but how... off to rtfm land.
njcwotx (Guru)
Posted: Mon Aug 25, 2008 3:47 am
Hot Diggity!!!
I had to emerge mdadm, and after a lot of manpage reading and prayer I managed to get rid of md127. Here are a few of my commands, pulled from history. Thanks for the help/confidence, guys.
Code: | 548 mdadm --stop /dev/md127
554 mdadm /dev/md0 -f /dev/hdb3
556 mdadm --manage -r /dev/md127
561 mdadm /dev/md0 --add /dev/hdb |
Code: | cat /proc/mdstat
Personalities : [raid1] [multipath]
read_ahead 1024 sectors
md0 : active raid1 ide/host0/bus0/target1/lun0/disc[2] ide/host0/bus0/target0/lun0/part3[0]
77587200 blocks [2/1] [U_]
[>....................] recovery = 0.3% (273728/77587200) finish=32.9min speed=39104K/sec
unused devices: <none>
|