Gentoo Forums
RAID 6 lost 3/7 drives, but 2 drives have no HW errors. rec?
fLares
n00b


Joined: 05 May 2005
Posts: 15

Posted: Sat Nov 17, 2007 10:50 am    Post subject: RAID 6 lost 3/7 drives, but 2 drives have no HW errors. rec?

I used to have a RAID 6 (created with mdadm) consisting of 7 HDDs (5+2).
One disk (#7) died and I removed it.
I figured that 5+1 disks were still left, so one disk of redundancy remained, and I kept working with the array until I could buy a new HDD. But then disaster struck and took out another drive (#6); /proc/mdstat told me there were now only 5 drives, so I had no reserve left.

I checked the disk that gave up last (#6) with SMART etc. and it came out OK, so I figured it was maybe a software hiccup and added it back to the RAID as a new drive (mdadm --add), and it started syncing. At about 2% the PC froze; upon reboot only 3 drives were in the RAID (#1, #2, #3).
I checked the missing drives (#4, #5) with mdadm -E and they were OK, superblocks and all. So I figured they were not in the RAID because of the earlier crash and that I had to add them back manually.

Now the big mistake was to use --add for the first device (#5) I tried to get back into the RAID; looking at the RAID info, it was added as "spare".
Then strange things happened again with the PC and I checked for hardware errors. I found that 2 IDE controllers were no longer behaving well, probably causing all the trouble.

Now, to get the data back, I copied every HD that was at one time part of the RAID to an image file (with dd if=/dev/hdx of=/mnt/backup/hdx), so that I would only have to use the on-board controller. So now I have 6 files which are images of the 6 HDDs that were formerly installed: 4 of them were pretty much untouched (#1, #2, #3, #4), one was added as a new drive and started syncing while the RAID was still active (#6), and one was added as a spare while the RAID was inactive (#5).

Code:
mdadm --assemble -f /dev/md0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5 /dev/loop6


Trying to simply assemble the RAID from these fails with an I/O error; assembling read-only gives no valid filesystem.

I figure that the data should still be there, since one drive was only about 2% overwritten (the rest should still contain the old data from when it was in the RAID) and one drive was just added as a spare (it was not written to at all, unless adding it as a spare deletes the contents). And I basically need only one of them to have 5 valid disks to start the RAID.
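The 5-of-7 figure can be checked against the values in the mdadm -E output listed further down in this post (Raid Devices 7, Device Size 156288256, Array Size 781441280). What follows is only a minimal sketch of that arithmetic, assuming both sizes are reported in KiB; the function names are made up for illustration.

```shell
#!/bin/sh
# RAID 6 keeps two parity blocks per stripe, so usable capacity is
# (members - 2) * per-member size, and at least (members - 2) members
# must be present for the data to be readable.

raid6_min_members() {
    # $1 = number of raid devices the array was created with
    echo $(( $1 - 2 ))
}

raid6_array_kib() {
    # $1 = number of raid devices, $2 = per-member Device Size in KiB
    echo $(( ($1 - 2) * $2 ))
}

raid6_min_members 7            # members needed to start the array
raid6_array_kib 7 156288256    # compare with "Array Size" in mdadm -E
```

The second call prints 781441280, which matches the Array Size that every superblock reports, so the recorded geometry is at least self-consistent.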

Now how can I recover at least some data?
Somehow telling the RAID to put the "spare" back in the slot it occupied before dropping out?
Recovering some data from the HDD that was in the RAID, dropped out, and was re-added as a new HDD until it was about 2% into the resync?

Hope someone can help me. I put quite a few personal files on the RAID which are lost now.
I will try even desperate measures, since I now have about 2 terabytes of unusable HDD space on the disks that hold the old RAID data and I need that space soon... I can't afford professional data recovery, so any suggestions are welcome at this point.

Many thanks
Aurora Glacialis

List of the outputs of mdadm -E for all (backup) drives. (Important note: the numbers I gave above are not the actual numbers of the drives, so when I said "#6 was added", it does not mean that this drive is now "/dev/loop6" or "RaidDevice 6".)

Code:

/dev/loop1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
  Creation Time : Tue Apr 25 22:19:59 2006
     Raid Level : raid6
    Device Size : 156288256 (149.05 GiB 160.04 GB)
     Array Size : 781441280 (745.24 GiB 800.20 GB)
   Raid Devices : 7
  Total Devices : 6
Preferred Minor : 0

    Update Time : Wed Jul 18 23:09:39 2007
          State : active
 Active Devices : 5
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 1
       Checksum : f7ebc571 - correct
         Events : 0.18494993


      Number   Major   Minor   RaidDevice State
this     4      33        1        4      active sync

   0     0      22        1        0      active sync
   1     1      34       65        1      active sync
   2     2      56        1        2      active sync
   3     3       0        0        3      faulty removed
   4     4      33        1        4      active sync
   5     5       0        0        5      faulty removed
   6     6      57        1        6      active sync
   7     7      34        1        7      spare
/dev/loop2:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
  Creation Time : Tue Apr 25 22:19:59 2006
     Raid Level : raid6
    Device Size : 156288256 (149.05 GiB 160.04 GB)
     Array Size : 781441280 (745.24 GiB 800.20 GB)
   Raid Devices : 7
  Total Devices : 6
Preferred Minor : 0

    Update Time : Wed Jul 18 23:09:39 2007
          State : active
 Active Devices : 5
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 1
       Checksum : f7ebc58d - correct
         Events : 0.18494993


      Number   Major   Minor   RaidDevice State
this     6      57        1        6      active sync

   0     0      22        1        0      active sync
   1     1      34       65        1      active sync
   2     2      56        1        2      active sync
   3     3       0        0        3      faulty removed
   4     4      33        1        4      active sync
   5     5       0        0        5      faulty removed
   6     6      57        1        6      active sync
   7     7      34        1        7      spare
/dev/loop3:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
  Creation Time : Tue Apr 25 22:19:59 2006
     Raid Level : raid6
    Device Size : 156288256 (149.05 GiB 160.04 GB)
     Array Size : 781441280 (745.24 GiB 800.20 GB)
   Raid Devices : 7
  Total Devices : 6
Preferred Minor : 0

    Update Time : Wed Jul 18 23:09:39 2007
          State : active
 Active Devices : 5
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 1
       Checksum : f7ebc55e - correct
         Events : 0.18494993


      Number   Major   Minor   RaidDevice State
this     0      22        1        0      active sync

   0     0      22        1        0      active sync
   1     1      34       65        1      active sync
   2     2      56        1        2      active sync
   3     3       0        0        3      faulty removed
   4     4      33        1        4      active sync
   5     5       0        0        5      faulty removed
   6     6      57        1        6      active sync
   7     7      34        1        7      spare
/dev/loop4:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
  Creation Time : Tue Apr 25 22:19:59 2006
     Raid Level : raid6
    Device Size : 156288256 (149.05 GiB 160.04 GB)
     Array Size : 781441280 (745.24 GiB 800.20 GB)
   Raid Devices : 7
  Total Devices : 6
Preferred Minor : 0

    Update Time : Wed Jul 18 23:09:39 2007
          State : active
 Active Devices : 5
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 1
       Checksum : f7ebc584 - correct
         Events : 0.18494993


      Number   Major   Minor   RaidDevice State
this     2      56        1        2      active sync

   0     0      22        1        0      active sync
   1     1      34       65        1      active sync
   2     2      56        1        2      active sync
   3     3       0        0        3      faulty removed
   4     4      33        1        4      active sync
   5     5       0        0        5      faulty removed
   6     6      57        1        6      active sync
   7     7      34        1        7      spare
/dev/loop5:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
  Creation Time : Tue Apr 25 22:19:59 2006
     Raid Level : raid6
    Device Size : 156288256 (149.05 GiB 160.04 GB)
     Array Size : 781441280 (745.24 GiB 800.20 GB)
   Raid Devices : 7
  Total Devices : 6
Preferred Minor : 0

    Update Time : Wed Jul 18 23:09:39 2007
          State : active
 Active Devices : 5
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 1
       Checksum : f7ebc5ac - correct
         Events : 0.18494993


      Number   Major   Minor   RaidDevice State
this     1      34       65        1      active sync

   0     0      22        1        0      active sync
   1     1      34       65        1      active sync
   2     2      56        1        2      active sync
   3     3       0        0        3      faulty removed
   4     4      33        1        4      active sync
   5     5       0        0        5      faulty removed
   6     6      57        1        6      active sync
   7     7      34        1        7      spare
/dev/loop6:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : a0f103b5:f0b078ea:1714136b:10b0e30d
  Creation Time : Tue Apr 25 22:19:59 2006
     Raid Level : raid6
    Device Size : 156288256 (149.05 GiB 160.04 GB)
     Array Size : 781441280 (745.24 GiB 800.20 GB)
   Raid Devices : 7
  Total Devices : 6
Preferred Minor : 0

    Update Time : Wed Jul 18 23:09:39 2007
          State : active
 Active Devices : 5
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 1
       Checksum : f7ebc572 - correct
         Events : 0.18494993


      Number   Major   Minor   RaidDevice State
this     7      34        1        7      spare

   0     0      22        1        0      active sync
   1     1      34       65        1      active sync
   2     2      56        1        2      active sync
   3     3       0        0        3      faulty removed
   4     4      33        1        4      active sync
   5     5       0        0        5      faulty removed
   6     6      57        1        6      active sync
   7     7      34        1        7      spare


Code:

Personalities : [linear] [raid0] [raid1] [raid5] [raid4] [raid6] [multipath] [faulty]
md0 : inactive loop3[0] loop6[7](S) loop2[6] loop1[4] loop4[2] loop5[1]
      937729536 blocks
       
unused devices: <none>
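
Comparing the six superblocks by eye is error-prone, so here is a rough sketch of a filter that boils saved mdadm -E text down to one line per member. It assumes the version 0.90 layout shown above (an "Events" line followed later by a "this" line); the function name is hypothetical, and it only reads text from standard input.

```shell
#!/bin/sh
# Summarise `mdadm -E` text: one line per member with its event counter
# and the slot and state recorded on its "this" line.
# Reads the examine output on stdin; nothing is written to any device.
summarise_examine() {
    awk '
        /^\/dev\//     { sub(/:$/, "", $1); dev = $1 }   # "/dev/loop1:" header
        $1 == "Events" { events = $3 }                   # "Events : 0.18494993"
        $1 == "this"   { state = $6
                         for (i = 7; i <= NF; i++) state = state " " $i
                         print dev, "events=" events, "slot=" $5, "state=" state }
    '
}
```

It could be fed with something like `mdadm -E /dev/loop1 | summarise_examine`, or with the listing above saved to a file. For that listing it would show the same event count, 0.18494993, on all six images, with /dev/loop6 recording itself as a spare in slot 7.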
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54815
Location: 56N 3W

Posted: Sat Nov 17, 2007 12:52 pm    Post subject:

fLares,

The bad news is that you may not get any data back, but to help you work with the disk images we need to know:

1) how the drives in the raid set were partitioned, if at all;
2) how the images were made (the exact commands);
3) the command you used to attach the images to /dev/loopX.
There are a lot of pitfalls in those steps.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
fLares
n00b


Joined: 05 May 2005
Posts: 15

Posted: Mon Nov 26, 2007 11:21 pm    Post subject:

Hi.

1) The drives were originally partitioned to have just 1 large partition each. They were all the same size, 160GB. So I had /dev/hda1, /dev/hdc1, etc.
2) The images were made with dd if=/dev/hda1 of=/backup/hda etc.
3) I used losetup /backup/hda /dev/loop1 etc.

I still have the original hard disks though, just in case. (I can't connect them all, since I have lost 2 IDE controller cards to hardware failures. Most likely it is the mainboard that causes the problems, so the controllers may work in a different system.)
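
One of the pitfalls in step 3 is the argument order: losetup expects the loop device first and the backing file second. The sketch below is a dry run that only prints the commands it would use, so nothing gets attached by accident. The image paths and the numbering from /dev/loop1 are assumptions for illustration, and the -r (read-only) flag is a suggestion for protecting the images, not something taken from the steps above.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands that would attach image
# files read-only to loop devices, numbered from /dev/loop1 upwards.
plan_attach() {
    n=1
    for img in "$@"; do
        # losetup takes the loop device first, then the backing file;
        # -r requests a read-only loop device.
        echo "losetup -r /dev/loop$n $img"
        n=$((n + 1))
    done
}

# Hypothetical image paths, following the /backup/hdX naming used above.
plan_attach /backup/hda /backup/hdc
```

A read-only attachment is enough for inspecting superblocks with mdadm -E. Whether mdadm will assemble an array from read-only members is a separate question, so any experiment that has to write is safer on a copy of the images.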

Many Thanks
Aurora
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54815
Location: 56N 3W

Posted: Tue Nov 27, 2007 7:52 pm    Post subject:

fLares,

I know what to try, but I don't know where the information you need to change is stored.
You need to hack the metadata to make one of the failed (or spare) images appear good, even if it's not.

To find out what you need to do to attempt that, you need to read the mdadm source, or possibly the kernel raid code.

Let's suppose for a moment that this metadata were held within the raided section of your raid set. Your raid is broken and cannot be read (as raid), so mdadm could not show it to you. Therefore it must be in an area on each drive outside the raided area. Further, if mdadm can tell you the status of your raid, you can get at it to change it.
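
For the version 0.90 metadata reported by mdadm -E above, that area sits near the end of each component: the superblock starts at the last 64 KiB boundary that still leaves a full 64 KiB before the end of the device. The sketch below computes that offset and takes a read-only peek at the first four bytes there. The function names and the image path are hypothetical, and it assumes the image is a complete copy of the partition.

```shell
#!/bin/sh
# Byte offset of a version 0.90 md superblock on a component of the
# given size: round the size down to a 64 KiB boundary, then step back
# one further 64 KiB block.
sb090_offset() {
    # $1 = component (partition or image) size in bytes
    echo $(( ($1 & ~65535) - 65536 ))
}

# Read-only peek at the first 4 bytes stored at that offset.
# $1 = path to an image file (hypothetical). On a little-endian machine
# the magic a92b4efc would show up byte-swapped, as "fc 4e 2b a9".
sb090_magic() {
    size=$(wc -c < "$1")
    off=$(sb090_offset "$size")
    dd if="$1" bs=1 skip="$off" count=4 2>/dev/null | od -An -tx1
}
```

For instance, `sb090_magic /backup/hda` would be one way to confirm that an image was not truncated before the superblock. Actually changing fields in that block also means recomputing its checksum, which this sketch does not attempt.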
fLares
n00b


Joined: 05 May 2005
Posts: 15

Posted: Tue Jan 22, 2008 2:38 pm    Post subject:

I found a tool named mddump. Supposedly it can change the superblocks of an md-RAID drive. I tried to use it: I can read the superblocks, but I can't write them back once changed (the checksum does not work). Can this tool be used to help me? Or are there any other suggestions? Looking into the kernel source code is out of the question, since I don't understand it well enough myself, and a friend who could do it would need quite some time to get into it, which he is only willing to do if it were extremely important, which is not necessarily the case here. The lost data is not critical; it's mostly old letters, images, some mp3s and copies of websites I once made that are offline (and now lost).

However, I need that extra disk space soon for backups 8O, so I give it about 2 weeks until I need to have a solution for this... better to lose the old data than to risk the current data by not performing backups.

Greetings
fLares