fLares n00b
Joined: 05 May 2005 Posts: 15
Posted: Sat Nov 17, 2007 10:50 am Post subject: RAID 6 lost 3/7 drives, but 2 drives have no HW errors. rec? |
I used to have a RAID 6 (created with mdadm) consisting of 7 HDDs (5+2).
One disk (#7) died and I removed it.
I figured that with (5+1) disks left I still had one disk of redundancy, so I continued working with the array until I could buy a new HDD. But then disaster struck and took out another drive (#6); /proc/mdstat told me there were now only 5 drives left, so I had no redundancy at all.
I checked the disk that gave up last (#6) with SMART etc. and it came out OK, so I figured it was maybe a software hiccup and added it back into the RAID as a new drive (mdadm --add), and it started syncing. At about 2% the PC froze; after the reboot, only 3 drives were in the RAID (#1, #2, #3).
I checked the missing drives (#4, #5) with mdadm -E and they were OK, superblocks and all. So I figured they had been dropped from the RAID because of the earlier crash and that I had to add them back manually for some reason.
The big mistake was using --add for the first device (#5) I tried to get back into the RAID: looking at the RAID info afterwards, it had been added as a "spare".
Then strange things happened with the PC again, so I checked for hardware errors and found that 2 IDE controllers were no longer behaving well, probably the cause of all the trouble.
To get the data back, I copied every HDD that was at one time part of the RAID to an image file (with dd if=/dev/hdx of=/mnt/backup/hdx), so that from now on I only have to use the on-board controller. I now have 6 files which are images of the 6 formerly installed HDDs: 4 of them pretty much untouched (#1, #2, #3, #4), one that was added as a new drive and started syncing while the RAID was still active (#6), and one that was added as a spare while the RAID was inactive (#5).
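(A side note on the imaging: since the controllers were flaky, a dd variant that pads unreadable sectors with zeros instead of aborting would probably have been safer. A sketch using the same device names as above, not the exact command I ran:)
Code:
# Sketch: image a member disk; conv=noerror,sync pads read errors
# with zeros instead of aborting, so a flaky controller cannot
# truncate the image mid-copy (bs=64k just speeds the copy up).
dd if=/dev/hdx of=/mnt/backup/hdx bs=64k conv=noerror,sync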
Code:
mdadm --assemble -f /dev/md0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5 /dev/loop6
Trying to simply assemble the RAID from these images fails with an I/O error; assembling read-only yields no valid filesystem.
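One variant I have not tried yet, assuming my kernel's md module supports the start_ro parameter: tell md to start newly assembled arrays in auto-read-only mode, so that a possibly wrong assembly cannot write anything to the images:
Code:
# Untested sketch: start new arrays auto-read-only, so the assembly
# attempt itself cannot modify the images, then force the assembly.
echo 1 > /sys/module/md_mod/parameters/start_ro
mdadm --assemble --force /dev/md0 /dev/loop1 /dev/loop2 /dev/loop3 \
      /dev/loop4 /dev/loop5 /dev/loop6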
I figure the data should still be there: one drive was only about 2% into the resync (the rest of it should still hold the old data from when it was last in the RAID), and the other was merely added as a spare (it should not have been written to at all, unless adding a drive as a spare destroys its contents). And I basically need only one of the two to have 5 valid disks and be able to start the RAID.
Now how can I recover at least some data?
Somehow telling the RAID to put the "spare" back in the slot it occupied before dropping out?
Recovering some data from the HDD that was in the RAID, dropped out, and was re-added as a new HDD until it was about 2% into the resync process?
Hope someone can help me; I had quite a few personal files on the RAID that are lost now.
I will try even desperate measures, since I now have about 2 terabytes of unused HDD space on the disks holding the old RAID data and I need that space soon... I can't afford professional data recovery, so any suggestions are welcome at this point.
Many thanks
Aurora Glacialis
List of the outputs of mdadm -E for all (backup) drives. (Important note: the numbers I used above are not the actual numbers of the drives, so when I said "#6 was added", it does not mean that this drive is now "/dev/loop6" or "RaidDevice 6".)
Code:
/dev/loop1:
Magic : a92b4efc
Version : 00.90.02
UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
Creation Time : Tue Apr 25 22:19:59 2006
Raid Level : raid6
Device Size : 156288256 (149.05 GiB 160.04 GB)
Array Size : 781441280 (745.24 GiB 800.20 GB)
Raid Devices : 7
Total Devices : 6
Preferred Minor : 0
Update Time : Wed Jul 18 23:09:39 2007
State : active
Active Devices : 5
Working Devices : 6
Failed Devices : 2
Spare Devices : 1
Checksum : f7ebc571 - correct
Events : 0.18494993
Number Major Minor RaidDevice State
this 4 33 1 4 active sync
0 0 22 1 0 active sync
1 1 34 65 1 active sync
2 2 56 1 2 active sync
3 3 0 0 3 faulty removed
4 4 33 1 4 active sync
5 5 0 0 5 faulty removed
6 6 57 1 6 active sync
7 7 34 1 7 spare
/dev/loop2:
Magic : a92b4efc
Version : 00.90.02
UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
Creation Time : Tue Apr 25 22:19:59 2006
Raid Level : raid6
Device Size : 156288256 (149.05 GiB 160.04 GB)
Array Size : 781441280 (745.24 GiB 800.20 GB)
Raid Devices : 7
Total Devices : 6
Preferred Minor : 0
Update Time : Wed Jul 18 23:09:39 2007
State : active
Active Devices : 5
Working Devices : 6
Failed Devices : 2
Spare Devices : 1
Checksum : f7ebc58d - correct
Events : 0.18494993
Number Major Minor RaidDevice State
this 6 57 1 6 active sync
0 0 22 1 0 active sync
1 1 34 65 1 active sync
2 2 56 1 2 active sync
3 3 0 0 3 faulty removed
4 4 33 1 4 active sync
5 5 0 0 5 faulty removed
6 6 57 1 6 active sync
7 7 34 1 7 spare
/dev/loop3:
Magic : a92b4efc
Version : 00.90.02
UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
Creation Time : Tue Apr 25 22:19:59 2006
Raid Level : raid6
Device Size : 156288256 (149.05 GiB 160.04 GB)
Array Size : 781441280 (745.24 GiB 800.20 GB)
Raid Devices : 7
Total Devices : 6
Preferred Minor : 0
Update Time : Wed Jul 18 23:09:39 2007
State : active
Active Devices : 5
Working Devices : 6
Failed Devices : 2
Spare Devices : 1
Checksum : f7ebc55e - correct
Events : 0.18494993
Number Major Minor RaidDevice State
this 0 22 1 0 active sync
0 0 22 1 0 active sync
1 1 34 65 1 active sync
2 2 56 1 2 active sync
3 3 0 0 3 faulty removed
4 4 33 1 4 active sync
5 5 0 0 5 faulty removed
6 6 57 1 6 active sync
7 7 34 1 7 spare
/dev/loop4:
Magic : a92b4efc
Version : 00.90.02
UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
Creation Time : Tue Apr 25 22:19:59 2006
Raid Level : raid6
Device Size : 156288256 (149.05 GiB 160.04 GB)
Array Size : 781441280 (745.24 GiB 800.20 GB)
Raid Devices : 7
Total Devices : 6
Preferred Minor : 0
Update Time : Wed Jul 18 23:09:39 2007
State : active
Active Devices : 5
Working Devices : 6
Failed Devices : 2
Spare Devices : 1
Checksum : f7ebc584 - correct
Events : 0.18494993
Number Major Minor RaidDevice State
this 2 56 1 2 active sync
0 0 22 1 0 active sync
1 1 34 65 1 active sync
2 2 56 1 2 active sync
3 3 0 0 3 faulty removed
4 4 33 1 4 active sync
5 5 0 0 5 faulty removed
6 6 57 1 6 active sync
7 7 34 1 7 spare
/dev/loop5:
Magic : a92b4efc
Version : 00.90.02
UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
Creation Time : Tue Apr 25 22:19:59 2006
Raid Level : raid6
Device Size : 156288256 (149.05 GiB 160.04 GB)
Array Size : 781441280 (745.24 GiB 800.20 GB)
Raid Devices : 7
Total Devices : 6
Preferred Minor : 0
Update Time : Wed Jul 18 23:09:39 2007
State : active
Active Devices : 5
Working Devices : 6
Failed Devices : 2
Spare Devices : 1
Checksum : f7ebc5ac - correct
Events : 0.18494993
Number Major Minor RaidDevice State
this 1 34 65 1 active sync
0 0 22 1 0 active sync
1 1 34 65 1 active sync
2 2 56 1 2 active sync
3 3 0 0 3 faulty removed
4 4 33 1 4 active sync
5 5 0 0 5 faulty removed
6 6 57 1 6 active sync
7 7 34 1 7 spare
/dev/loop6:
Magic : a92b4efc
Version : 00.90.02
UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
Creation Time : Tue Apr 25 22:19:59 2006
Raid Level : raid6
Device Size : 156288256 (149.05 GiB 160.04 GB)
Array Size : 781441280 (745.24 GiB 800.20 GB)
Raid Devices : 7
Total Devices : 6
Preferred Minor : 0
Update Time : Wed Jul 18 23:09:39 2007
State : active
Active Devices : 5
Working Devices : 6
Failed Devices : 2
Spare Devices : 1
Checksum : f7ebc572 - correct
Events : 0.18494993
Number Major Minor RaidDevice State
this 7 34 1 7 spare
0 0 22 1 0 active sync
1 1 34 65 1 active sync
2 2 56 1 2 active sync
3 3 0 0 3 faulty removed
4 4 33 1 4 active sync
5 5 0 0 5 faulty removed
6 6 57 1 6 active sync
7 7 34 1 7 spare
Code:
Personalities : [linear] [raid0] [raid1] [raid5] [raid4] [raid6] [multipath] [faulty]
md0 : inactive loop3[0] loop6[7](S) loop2[6] loop1[4] loop4[2] loop5[1]
937729536 blocks
unused devices: <none>
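(Before each new assembly attempt I stop this half-assembled, inactive array first; otherwise it keeps the loop devices busy:)
Code:
# Release the loop devices held by the inactive array before retrying.
mdadm --stop /dev/md0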
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54815 Location: 56N 3W
Posted: Sat Nov 17, 2007 12:52 pm Post subject: |
fLares,
The bad news is that you may not get any data back, but to help you work with the disk images we need to know:
1) how the drives in the raid set were partitioned, if at all.
2) how the images were made. (The exact commands)
3) the command you used to attach the images to /dev/loopX
There are a lot of pitfalls in those steps.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
fLares n00b
Joined: 05 May 2005 Posts: 15
Posted: Mon Nov 26, 2007 11:21 pm Post subject: |
Hi.
1) The drives were originally partitioned to have just 1 large partition each. They were all the same size, 160 GB. So I had /dev/hda1, /dev/hdc1, etc.
2) The images were made with dd if=/dev/hda1 of=/backup/hda, etc.
3) I used losetup /dev/loop1 /backup/hda, etc. (a read-only variant is sketched below)
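(If it matters, the images could also be attached read-only, so that no experiment can write to them, assuming my losetup supports -r:)
Code:
# Sketch: attach an image read-only (-r) so experiments cannot
# damage it; repeat for each of the 6 image files.
losetup -r /dev/loop1 /backup/hda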
I still have the original hard disks though, just in case. (I can't mount them all at once, since I have lost 2 IDE controller cards to hardware failures. Most likely it's the mainboard that causes the problems, though, so the controllers may still work in a different system.)
Many Thanks
Aurora
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54815 Location: 56N 3W
Posted: Tue Nov 27, 2007 7:52 pm Post subject: |
fLares,
I know what to try, but I don't know where the information you need to change is located.
You need to hack the metadata to make one of the failed (or spare) images appear good, even if it's not.
To find out how to attempt that, you need to read the mdadm source, or possibly the kernel RAID code.
Let's suppose for a moment that this metadata were held within the raided section of your raid set. Your raid is broken and cannot be read (as raid), yet mdadm can still tell you the status of your raid, so it follows that the metadata must live in an area on each drive outside the raided area. And since mdadm can read it there, you can get at it to change it.
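For what it's worth, the usual last-resort trick in this situation, which I have NOT tested on your data, is to rewrite that metadata by re-creating the array in place with --assume-clean; that touches only the superblocks, not the data blocks. A sketch, with the device order taken from the RaidDevice column of your mdadm -E output (double-check the order, chunk size and metadata version before running anything):
Code:
# DANGEROUS last resort, sketch only: re-create the array metadata in
# place. --assume-clean prevents a resync, so data blocks stay untouched.
# Device order must match the original RaidDevice numbers (per mdadm -E:
# 0=loop3, 1=loop5, 2=loop4, 3=missing, 4=loop1, 5=missing, 6=loop2).
# --chunk=64 and --metadata=0.90 are guesses based on the old defaults;
# verify them against the original array first.
mdadm --create /dev/md0 --metadata=0.90 --level=6 --raid-devices=7 \
      --chunk=64 --assume-clean \
      /dev/loop3 /dev/loop5 /dev/loop4 missing /dev/loop1 missing /dev/loop2
If that assembles, mount the filesystem read-only first and check that it is intact before trusting or writing anything.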
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
fLares n00b
Joined: 05 May 2005 Posts: 15
Posted: Tue Jan 22, 2008 2:38 pm Post subject: |
I found a tool named mddump. Supposedly it can change the superblocks of an md RAID drive. I tried it: it can read the superblocks, but it can't write them back once changed (the checksum doesn't work out). Can this tool be used to help me? Or any other suggestions? Digging into the kernel source code is out of the question, since I don't understand it well enough myself, and a friend who could would need quite some time to get into it, which he is only willing to do if it were extremely important, which is not necessarily the case here. The lost data is not critical; it's mostly old letters, images, some mp3s and copies of websites I once made that are offline (and now lost).
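In case someone wants to poke at the same level: as far as I can tell, the version 0.90 superblock sits in the last 64 KiB-aligned 64 KiB block of each member device and is 4 KiB long, so it can at least be dumped for inspection. A sketch, using my loop devices from above:
Code:
# Sketch: dump the v0.90 md superblock of one member for inspection.
# It lives in the last 64 KiB-aligned 64 KiB block of the device and
# is 4 KiB long.
DEV=/dev/loop6                          # e.g. the ex-spare image
SIZE=$(blockdev --getsize64 $DEV)
OFFSET=$(( SIZE / 65536 * 65536 - 65536 ))
dd if=$DEV of=sb-loop6.bin bs=4096 skip=$(( OFFSET / 4096 )) count=1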
However, I need that extra disk space soon for backups, so I give it about two weeks until I need to have a solution for this... better to lose the old data than to risk the current data by not performing backups.
Greetings
fLares