electroarts Tux's lil' helper
Joined: 03 Jan 2004 Posts: 86
Posted: Tue Jun 06, 2006 4:05 pm Post subject: Software RAID 1 broke down, how can I save data?
I've been using a software mirroring RAID 1 for my server. One of the disks seems to have died, because all activity on the server from Mar 2 until today when we rebooted it (including all logs and history etc--and that was a LOT of activity!) has disappeared. I'd like to try rebuilding the data once I get a new drive in there, but I'm unsure where to start.
Add to that the fact that the server has been back online now, and so data has been written to the active disk in the mirror. Will this pretty much ruin my chances of recovering the lost data?
Some questions:
How can I tell which drive is corrupted? I used mkraid and not mdadm to create the array, BTW.
If I replace the bad drive, is it possible to recover the lost data? Even if new data has been written to the drive?
Some output:
Code: | # dmesg
md: Autodetecting RAID arrays.
md: autorun ...
md: considering hdc7 ...
md: adding hdc7 ...
md: hdc6 has different UUID to hdc7
md: hdc5 has different UUID to hdc7
md: hdc3 has different UUID to hdc7
md: hdc1 has different UUID to hdc7
md: created md5
md: bind<hdc7>
md: running: <hdc7>
raid1: raid set md5 active with 1 out of 2 mirrors
md: considering hdc6 ...
md: adding hdc6 ...
md: hdc5 has different UUID to hdc6
md: hdc3 has different UUID to hdc6
md: hdc1 has different UUID to hdc6
md: created md4
md: bind<hdc6>
md: running: <hdc6>
raid1: raid set md4 active with 1 out of 2 mirrors
md: considering hdc5 ...
md: adding hdc5 ...
md: hdc3 has different UUID to hdc5
md: hdc1 has different UUID to hdc5
md: created md3
md: bind<hdc5>
md: running: <hdc5>
raid1: raid set md3 active with 1 out of 2 mirrors
md: considering hdc3 ...
md: adding hdc3 ...
md: hdc1 has different UUID to hdc3
md: created md2
md: bind<hdc3>
md: running: <hdc3>
raid1: raid set md2 active with 1 out of 2 mirrors
md: considering hdc1 ...
md: adding hdc1 ...
md: created md0
md: bind<hdc1>
md: running: <hdc1>
raid1: raid set md0 active with 1 out of 2 mirrors
md: ... autorun DONE.
# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hdc3[1]
      5863616 blocks [2/1] [_U]
md3 : active raid1 hdc5[1]
      2939776 blocks [2/1] [_U]
md4 : active raid1 hdc6[1]
      1959808 blocks [2/1] [_U]
md5 : active raid1 hdc7[1]
      144432256 blocks [2/1] [_U]
md0 : active raid1 hdc1[1]
      104320 blocks [2/1] [_U] |
nielchiano Veteran
Joined: 11 Nov 2003 Posts: 1287 Location: 50N 3E
Posted: Wed Jun 07, 2006 12:59 pm Post subject: Re: Software RAID 1 broke down, how can I save data?
electroarts wrote: | I've been using a software mirroring RAID 1 for my server. One of the disks seems to have died, because all activity on the server from Mar 2 until today when we rebooted it (including all logs and history etc--and that was a LOT of activity!) has disappeared. |
This doesn't make sense... in a RAID-1 you don't lose data because of a failed disk... That's the whole point of RAID-1!
If you did lose data, then either (a) you were not using RAID, or (b) you hit a very strange problem, like the "active" disk switching from one set to the other.
electroarts wrote: | I'd like to try rebuilding the data once I get a new drive in there, but I'm unsure where to start. |
http://www.tldp.org/HOWTO/Software-RAID-HOWTO-6.html will get you started
electroarts wrote: | Add to that the fact that the server has been back online now, and so data has been written to the active disk in the mirror. Will this pretty much ruin my chances of recovering the lost data? |
Again: this has nothing to do with the RAID being in a degraded state.
And yes, once you start OVERwriting data, you lose what was under it.
electroarts wrote: | Some questions:
How can I tell which drive is corrupted? I used mkraid and not mdadm to create the array, BTW. |
See the URL above.
electroarts wrote: | If I replace the bad drive, is it possible to recover the lost data? Even if new data has been written to the drive? |
Maybe... but again: I don't think this has anything to do with RAID. |
RayDude Advocate
Joined: 29 May 2004 Posts: 2093 Location: San Jose, CA
Posted: Sat Jul 08, 2006 8:22 am
It's funny, I've just suffered my second 160GB raid1 failure.
I came here to confirm that the "(F)" meant fail. I found a thread that talks about mdadm, so I emerged mdadm and confirmed the failure.
Here is the output of cat /proc/mdstat:
Code: | server ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md1 : active raid0 hdg1[1] hde1[0]
      390716672 blocks 32k chunks
md0 : active raid1 hdb2[2](F) hda2[0]
      155782208 blocks [2/1] [U_] |
Notice it lists two drives for each raid: a raid0 with hdg and hde, and a raid1 with hda and hdb.
hdb is dead.
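The reading of /proc/mdstat above can be sketched in a few lines of Python. This is only a rough illustration, not a replacement for mdadm: the names `parse_mdstat` and `degraded` are made up for this example, the sample text is copied from the output above, and the regular expressions assume the simple "mdN : active raidX ..." layout shown here (other kernels may print extra fields).

```python
import re

# Sample text copied from the /proc/mdstat output in this post.
MDSTAT = """\
Personalities : [raid0] [raid1]
md1 : active raid0 hdg1[1] hde1[0]
      390716672 blocks 32k chunks
md0 : active raid1 hdb2[2](F) hda2[0]
      155782208 blocks [2/1] [U_]
"""

def parse_mdstat(text):
    """Return {array: {"level", "members", "failed", "status"}} (hypothetical helper)."""
    arrays = {}
    current = None
    for line in text.splitlines():
        # Header line, e.g. "md0 : active raid1 hdb2[2](F) hda2[0]"
        head = re.match(r"^(md\d+) : (\w+) (\w+) (.*)$", line)
        if head:
            name, _state, level, rest = head.groups()
            # Each member looks like "hda2[0]", with "(F)" appended if md marked it failed.
            members = re.findall(r"(\w+)\[\d+\](\(F\))?", rest)
            current = {
                "level": level,
                "members": [dev for dev, _flag in members],
                "failed": [dev for dev, flag in members if flag],
                "status": None,
            }
            arrays[name] = current
            continue
        # Status line, e.g. "155782208 blocks [2/1] [U_]"; "_" marks a missing mirror slot.
        status = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if status and current is not None:
            current["status"] = status.group(3)
    return arrays

def degraded(arrays):
    """Arrays whose status map contains at least one '_' slot."""
    return {n: a for n, a in arrays.items() if a["status"] and "_" in a["status"]}

if __name__ == "__main__":
    for name, info in degraded(parse_mdstat(MDSTAT)).items():
        print(name, info["status"], "failed members:", info["failed"])
```

On a live box you would feed it the contents of /proc/mdstat instead of the sample string. Note that raid0 has no `[U_]` map, so md1 is never reported as degraded by this sketch.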
Here's the output of mdadm:
Code: | server etc # mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.02
Creation Time : Sun Jun 5 05:39:18 2005
Raid Level : raid1
Array Size : 155782208 (148.57 GiB 159.52 GB)
Device Size : 155782208 (148.57 GiB 159.52 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sat Jul 8 01:24:43 2006
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
UUID : a39e33a0:9f5e993f:0998748d:fb147598
Events : 0.22149265
    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        1      removed
       2       3       66        -      faulty spare   /dev/hdb2 |
Which confirms the dead drive. Looks like I hit Fry's tomorrow for yet another cheap drive.
I wonder if the Samsung (whose sister died last year) or the Seagate (the sister's replacement) died.
I'll find out tomorrow when I replace it with some other brand.
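If you want to check the `mdadm --detail` output from a script rather than by eye, something like the following might do. It is a minimal sketch that only reads the "Key : value" lines shown above; `parse_detail` and `is_degraded` are hypothetical names for this example, and the sample text is a shortened copy of the output in this post.

```python
# Shortened copy of the "mdadm --detail /dev/md0" output from this post.
DETAIL = """\
/dev/md0:
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
"""

def parse_detail(text):
    """Collect the 'Key : value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " only, so times like 05:39:18 stay intact.
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def is_degraded(fields):
    """True if State lists 'degraded' or any device is counted as failed."""
    states = [s.strip() for s in fields.get("State", "").split(",")]
    failed = int(fields.get("Failed Devices", "0"))
    return "degraded" in states or failed > 0

if __name__ == "__main__":
    info = parse_detail(DETAIL)
    print("State:", info["State"], "| degraded:", is_degraded(info))
```

The device table at the bottom of the real output has no " : " separator, so this sketch simply skips it.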
Here's the point of my post: from your /proc/mdstat it looks like you only had one drive for your raid1. So something's not right...
Can you post your /etc/raidtab?
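To back up the "only one drive" observation, here is a rough Python sketch that pulls the bound devices and mirror counts out of the dmesg text posted earlier in the thread. The function names are invented for this example, and the sample is a shortened excerpt of that dmesg output; the patterns assume the exact message wording shown there.

```python
import re

# Shortened excerpt of the dmesg output from the first post.
DMESG = """\
md: bind<hdc7>
raid1: raid set md5 active with 1 out of 2 mirrors
md: bind<hdc6>
raid1: raid set md4 active with 1 out of 2 mirrors
md: bind<hdc1>
raid1: raid set md0 active with 1 out of 2 mirrors
"""

def bound_devices(text):
    """Devices that md autodetect actually bound into an array."""
    return re.findall(r"md: bind<(\w+)>", text)

def mirror_counts(text):
    """Map each raid1 array to (active mirrors, expected mirrors)."""
    pattern = r"raid1: raid set (md\d+) active with (\d+) out of (\d+) mirrors"
    return {name: (int(active), int(total))
            for name, active, total in re.findall(pattern, text)}

if __name__ == "__main__":
    devices = bound_devices(DMESG)
    # Assumes hdX-style names, where the partition number is the trailing digits.
    disks = {dev.rstrip("0123456789") for dev in devices}
    print("bound:", devices)
    print("physical disks seen:", sorted(disks))
    print("mirrors:", mirror_counts(DMESG))
```

In that log every bound partition sits on hdc and every array comes up with 1 out of 2 mirrors, which is what makes the setup look odd.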
Raydude _________________ Some day there will only be free software. |
bludger Guru
Joined: 09 Apr 2003 Posts: 389
Posted: Fri Jul 14, 2006 8:30 pm
RayDude wrote: |
Which confirms the dead drive. Looks like I hit Fry's tomorrow for yet another cheap drive.
|
I have a similar problem. This only shows one partition as being faulty. Does this mean that the whole disk is faulty, or that it can somehow be repaired? |
RayDude Advocate
Joined: 29 May 2004 Posts: 2093 Location: San Jose, CA
Posted: Sun Jul 16, 2006 8:02 am
bludger wrote: | RayDude wrote: |
Which confirms the dead drive. Looks like I hit Fry's tomorrow for yet another cheap drive.
|
I have a similar problem. This only shows one partition as being faulty. Does this mean that the whole disk is faulty, or that it can somehow be repaired? |
Is this a Raid1?
A raid1 on two partitions of the same drive probably won't help keep the data safe. It's possible that data on one partition would go bad while the other is okay, but more than likely the whole drive would die.
A raid1 should be on two disks of the same size or two partitions on two disks of the same size. And I'm under the belief that making it two different brands may be a good idea too, just in case there is a manufacturing defect with both drives that causes them to fail at approximately the same time. It's probably more superstition than anything; failures are rare.
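A quick way to sanity-check the "two different disks" rule is sketched below in Python. It is only an illustration: `parent_disk` and `on_separate_disks` are made-up names, and the code assumes hdX/sdX-style device names where the partition number is simply the trailing digits (it would not handle names like nvme0n1p2).

```python
def parent_disk(partition):
    """Strip the trailing partition number: 'hda2' -> 'hda' (hdX/sdX names only)."""
    return partition.rstrip("0123456789")

def on_separate_disks(members):
    """True if every member of the mirror lives on a different physical disk."""
    disks = [parent_disk(m) for m in members]
    return len(set(disks)) == len(disks)

if __name__ == "__main__":
    print(on_separate_disks(["hda2", "hdb2"]))  # mirror across two drives
    print(on_separate_disks(["hda1", "hda2"]))  # both halves on one drive
```

A mirror that fails this check still works as a raid1, it just offers no protection when the one drive underneath it dies.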
Raydude _________________ Some day there will only be free software. |
bludger Guru
Joined: 09 Apr 2003 Posts: 389
Posted: Mon Jul 17, 2006 8:50 am
RayDude wrote: | bludger wrote: | RayDude wrote: |
Which confirms the dead drive. Looks like I hit Fry's tomorrow for yet another cheap drive.
|
I have a similar problem. This only shows one partition as being faulty. Does this mean that the whole disk is faulty, or that it can somehow be repaired? |
Is this a Raid1?
A raid1 on two partitions of the same drive probably won't help keep the data safe. It's possible that data on one partition would go bad while the other is okay, but more than likely the whole drive would die.
A raid1 should be on two disks of the same size or two partitions on two disks of the same size. And I'm under the belief that making it two different brands may be a good idea too, just in case there is a manufacturing defect with both drives that causes them to fail at approximately the same time. It's probably more superstition than anything; failures are rare.
Raydude |
Yes, it is a RAID 1, and it was set up between identical partitions on two disks of the same size. AFAIK you can't set up a RAID between raw disks with the current Linux kernel yet, although I could be wrong there. It just seemed strange to me that only one partition was shown by mdadm as being faulty, although all 4 partition pairs were set up. |