Gentoo Forums
Software RAID 1 broke down, how can I save data?
electroarts
Tux's lil' helper


Joined: 03 Jan 2004
Posts: 86

Posted: Tue Jun 06, 2006 4:05 pm    Post subject: Software RAID 1 broke down, how can I save data?

I've been using a software mirroring RAID 1 for my server. One of the disks seems to have died, because all activity on the server from Mar 2 until today when we rebooted it (including all logs and history etc--and that was a LOT of activity!) has disappeared. I'd like to try rebuilding the data once I get a new drive in there, but I'm unsure where to start.

Add to that the fact that the server has been back online now, and so data has been written to the active disk in the mirror. Will this pretty much ruin my chances of recovering the lost data?

Some questions:

How can I tell which drive is corrupted? I used mkraid and not mdadm to create the array, BTW.

If I replace the bad drive, is it possible to recover the lost data? Even if new data has been written to the drive?

Some output:

Code:
# dmesg
md: Autodetecting RAID arrays.
md: autorun ...
md: considering hdc7 ...
md:  adding hdc7 ...
md: hdc6 has different UUID to hdc7
md: hdc5 has different UUID to hdc7
md: hdc3 has different UUID to hdc7
md: hdc1 has different UUID to hdc7
md: created md5
md: bind<hdc7>
md: running: <hdc7>
raid1: raid set md5 active with 1 out of 2 mirrors
md: considering hdc6 ...
md:  adding hdc6 ...
md: hdc5 has different UUID to hdc6
md: hdc3 has different UUID to hdc6
md: hdc1 has different UUID to hdc6
md: created md4
md: bind<hdc6>
md: running: <hdc6>
raid1: raid set md4 active with 1 out of 2 mirrors
md: considering hdc5 ...
md:  adding hdc5 ...
md: hdc3 has different UUID to hdc5
md: hdc1 has different UUID to hdc5
md: created md3
md: bind<hdc5>
md: running: <hdc5>
raid1: raid set md3 active with 1 out of 2 mirrors
md: considering hdc3 ...
md:  adding hdc3 ...
md: hdc1 has different UUID to hdc3
md: created md2
md: bind<hdc3>
md: running: <hdc3>
raid1: raid set md2 active with 1 out of 2 mirrors
md: considering hdc1 ...
md:  adding hdc1 ...
md: created md0
md: bind<hdc1>
md: running: <hdc1>
raid1: raid set md0 active with 1 out of 2 mirrors
md: ... autorun DONE.


# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hdc3[1]
      5863616 blocks [2/1] [_U]

md3 : active raid1 hdc5[1]
      2939776 blocks [2/1] [_U]

md4 : active raid1 hdc6[1]
      1959808 blocks [2/1] [_U]

md5 : active raid1 hdc7[1]
      144432256 blocks [2/1] [_U]

md0 : active raid1 hdc1[1]
      104320 blocks [2/1] [_U]
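
The status maps in that /proc/mdstat output can be read mechanically. Below is a minimal sketch, assuming the usual layout where `[2/1]` means "2 slots, 1 in use" and each `_` in the map marks an empty slot counted from the left. The function name `degraded_arrays` and the shortened sample text are illustrative only, not part of any md tool.

```python
import re

# Shortened sample in the same format as the /proc/mdstat output above.
MDSTAT = """\
Personalities : [raid1]
md2 : active raid1 hdc3[1]
      5863616 blocks [2/1] [_U]

md0 : active raid1 hdc1[1]
      104320 blocks [2/1] [_U]
"""


def degraded_arrays(text):
    """Return {array name: [empty slot numbers]} for arrays with a '_' in the map."""
    result = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            # Position of each '_' in the map is the number of an empty slot.
            missing = [i for i, c in enumerate(status.group(3)) if c == "_"]
            if missing:
                result[current] = missing
    return result


if __name__ == "__main__":
    for name, missing in degraded_arrays(MDSTAT).items():
        print(name, "empty slot(s):", missing)
```

On the output above this reports slot 0 as empty for every array, with no member marked as failed: the first half of each mirror is simply absent, and only the hdc partitions are present.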
nielchiano
Veteran


Joined: 11 Nov 2003
Posts: 1287
Location: 50N 3E

Posted: Wed Jun 07, 2006 12:59 pm    Post subject: Re: Software RAID 1 broke down, how can I save data?

electroarts wrote:
I've been using a software mirroring RAID 1 for my server. One of the disks seems to have died, because all activity on the server from Mar 2 until today when we rebooted it (including all logs and history etc--and that was a LOT of activity!) has disappeared.

This doesn't make sense... in a RAID-1 you don't lose data because of a failed disk... That's the whole point of RAID-1!
If you did lose data, either (a) you were not using RAID, or (b) you had a very strange problem, like the "active" disk switching from one half of the set to the other.

electroarts wrote:
I'd like to try rebuilding the data once I get a new drive in there, but I'm unsure where to start.

http://www.tldp.org/HOWTO/Software-RAID-HOWTO-6.html will get you started
electroarts wrote:
Add to that the fact that the server has been back online now, and so data has been written to the active disk in the mirror. Will this pretty much ruin my chances of recovering the lost data?

Again: this has nothing to do with the RAID being in a degraded state.
And yes, once you start OVERwriting data, you lose what was under it.
electroarts wrote:
Some questions:
How can I tell which drive is corrupted? I used mkraid and not mdadm to create the array, BTW.

See url above
electroarts wrote:
If I replace the bad drive, is it possible to recover the lost data? Even if new data has been written to the drive?

maybe... but again: I don't think this has anything to do with RAID.
RayDude
Advocate


Joined: 29 May 2004
Posts: 2093
Location: San Jose, CA

Posted: Sat Jul 08, 2006 8:22 am    Post subject:

It's funny, I've just suffered my second 160GB raid1 failure.

I came here to confirm that the "(F)" meant failed. I found a thread that talks about mdadm, so I emerged mdadm and confirmed the failure.

here is the output of cat /proc/mdstat:

Code:
server ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md1 : active raid0 hdg1[1] hde1[0]
      390716672 blocks 32k chunks

md0 : active raid1 hdb2[2](F) hda2[0]
      155782208 blocks [2/1] [U_]


Notice it lists two drives for each raid. A raid 0 with hdg and hde and a raid1 with hda and hdb.

hdb is dead.
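
A minimal sketch of picking out the "(F)" members automatically, assuming the /proc/mdstat format shown above. The helper name `failed_members` is made up for this example.

```python
import re

# Same text as the /proc/mdstat output above.
MDSTAT = """\
Personalities : [raid0] [raid1]
md1 : active raid0 hdg1[1] hde1[0]
      390716672 blocks 32k chunks

md0 : active raid1 hdb2[2](F) hda2[0]
      155782208 blocks [2/1] [U_]
"""

# A member looks like "hdb2[2]" with an optional "(F)" suffix when it has failed.
MEMBER = re.compile(r"(\w+)\[(\d+)\](\(F\))?")


def failed_members(text):
    """Return {array name: [failed member names]} for arrays listing an (F) member."""
    failed = {}
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not header:
            continue
        name, rest = header.groups()
        bad = [m.group(1) for m in MEMBER.finditer(rest) if m.group(3)]
        if bad:
            failed[name] = bad
    return failed


if __name__ == "__main__":
    print(failed_members(MDSTAT))
```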

Here's the output of mdadm:

Code:
server etc # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.02
  Creation Time : Sun Jun  5 05:39:18 2005
     Raid Level : raid1
     Array Size : 155782208 (148.57 GiB 159.52 GB)
    Device Size : 155782208 (148.57 GiB 159.52 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Jul  8 01:24:43 2006
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           UUID : a39e33a0:9f5e993f:0998748d:fb147598
         Events : 0.22149265

    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        1      removed

       2       3       66        -      faulty spare   /dev/hdb2


Which confirms the dead drive. Looks like I hit Fry's tomorrow for yet another cheap drive.
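
The same check can be scripted against the `mdadm --detail` text. This is only a sketch, assuming the output layout shown above (a `State :` line followed by a device table); the function name `summarize_detail` is hypothetical, and other mdadm versions may format the table differently.

```python
# Excerpt of the mdadm --detail output above.
DETAIL = """\
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1

    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        1      removed

       2       3       66        -      faulty spare   /dev/hdb2
"""


def summarize_detail(text):
    """Collect the array state flags plus the active and faulty member devices."""
    summary = {"state": [], "active": [], "faulty": []}
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("State :"):
            flags = stripped.split(":", 1)[1]
            summary["state"] = [f.strip() for f in flags.split(",")]
        elif stripped.startswith("Number"):
            in_table = True  # device table rows follow this header
        elif in_table and stripped:
            fields = stripped.split()
            # Rows such as "removed" carry no device path at the end.
            if not fields[-1].startswith("/dev/"):
                continue
            state = " ".join(fields[4:-1])
            if "faulty" in state:
                summary["faulty"].append(fields[-1])
            elif "active" in state:
                summary["active"].append(fields[-1])
    return summary


if __name__ == "__main__":
    print(summarize_detail(DETAIL))
```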

I wonder if it was the Samsung (whose sister died last year) or the Seagate (the sister's replacement) that died.

I'll find out tomorrow when I replace it with some other brand.

Here's the point of my post: from your /proc/mdstat it looks like you only had one drive for your raid1. So something's not right...

Can you post your /etc/raidtab?

Raydude
_________________
Some day there will only be free software.
bludger
Guru


Joined: 09 Apr 2003
Posts: 389

Posted: Fri Jul 14, 2006 8:30 pm    Post subject:

RayDude wrote:

Which confirms the dead drive. Looks like I hit Fry's tomorrow for yet another cheap drive.

I have a similar problem. This only shows one partition as being faulty. Does this mean that the whole disk is faulty, or that it can somehow be repaired?
RayDude
Advocate


Joined: 29 May 2004
Posts: 2093
Location: San Jose, CA

Posted: Sun Jul 16, 2006 8:02 am    Post subject:

bludger wrote:
RayDude wrote:

Which confirms the dead drive. Looks like I hit Fry's tomorrow for yet another cheap drive.

I have a similar problem. This only shows one partition as being faulty. Does this mean that the whole disk is faulty, or that it can somehow be repaired?


Is this a Raid1?

A raid1 on two partitions of the same drive probably won't help keep the data safe. It's possible that data on one partition would go bad while the other is okay, but more than likely the whole drive would die.

A raid1 should be on two disks of the same size, or on two partitions of the same size on two different disks. And I'm under the belief that using two different brands may be a good idea too, just in case both drives share a manufacturing defect that causes them to fail at approximately the same time. It's probably more superstition than anything; failures are rare.

Raydude
_________________
Some day there will only be free software.
bludger
Guru


Joined: 09 Apr 2003
Posts: 389

Posted: Mon Jul 17, 2006 8:50 am    Post subject:

RayDude wrote:
bludger wrote:
RayDude wrote:

Which confirms the dead drive. Looks like I hit Fry's tomorrow for yet another cheap drive.

I have a similar problem. This only shows one partition as being faulty. Does this mean that the whole disk is faulty, or that it can somehow be repaired?


Is this a Raid1?

A raid1 on two partitions of the same drive probably won't help keep the data safe. It's possible that data on one partition would go bad while the other is okay, but more than likely the whole drive would die.

A raid1 should be on two disks of the same size, or on two partitions of the same size on two different disks. And I'm under the belief that using two different brands may be a good idea too, just in case both drives share a manufacturing defect that causes them to fail at approximately the same time. It's probably more superstition than anything; failures are rare.

Raydude


Yes, it is a RAID 1, and it was set up between identical partitions on two disks of the same size. AFAIK you can't set up a RAID between raw disks with the current Linux kernel yet, although I could be wrong there. It just seemed strange to me that mdadm showed only one partition as faulty, although all 4 partition pairs were set up.
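
One likely reason only one partition shows up as faulty: md tracks each array member separately and marks a member failed when an I/O error hits that member, so the other partitions on the same disk can keep looking healthy until they are read or written. A minimal sketch for listing every array that still depends on a suspect disk follows. The sample text is hypothetical (no output was posted for this setup), the helper name `members_on_disk` is made up, and the partition-to-disk mapping assumes old IDE names such as hdb2 on hdb.

```python
import re

# Hypothetical /proc/mdstat text: two mirrors across hda and hdb,
# with only one hdb partition flagged as failed so far.
MDSTAT = """\
md0 : active raid1 hdb1[1] hda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 hdb2[2](F) hda2[0]
      5863616 blocks [2/1] [U_]
"""


def members_on_disk(text, disk):
    """Map each array to [(partition, is_failed)] for members located on `disk`."""
    found = {}
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not header:
            continue
        name, rest = header.groups()
        for part, _slot, flag in re.findall(r"(\w+)\[(\d+)\](\(F\))?", rest):
            # Assumption: stripping trailing digits gives the disk name (hdb2 -> hdb).
            if part.rstrip("0123456789") == disk:
                found.setdefault(name, []).append((part, bool(flag)))
    return found


if __name__ == "__main__":
    for array, parts in members_on_disk(MDSTAT, "hdb").items():
        print(array, parts)
```

Any array listed here with `is_failed` still false is running on the same physical disk as the failed partition, so it is worth checking that disk as a whole before trusting those mirrors.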