Gentoo Forums
Faulty disk in raid, can badblocks make it usable?
Gentoo Forums Forum Index :: Kernel & Hardware
zotalore
n00b
Joined: 22 Jan 2008
Posts: 54

PostPosted: Sun Mar 29, 2009 8:02 am    Post subject: Faulty disk in raid, can badblocks make it usable?

I recently experienced my first software RAID (raid5) drive failure. Is there any way to mark the bad blocks and re-add the drive to the array?

Here are some more details. When I boot my system, md reports the following:

Code:

md: Autodetecting RAID arrays.
md: Scanned 4 and added 4 devices.
md: autorun ...
md: considering hdd2 ...
md:  adding hdd2 ...
md:  adding hdc2 ...
md:  adding hdb2 ...
md:  adding hda2 ...
md: created md0
md: bind<hda2>
md: bind<hdb2>
md: bind<hdc2>
md: bind<hdd2>
md: running: <hdd2><hdc2><hdb2><hda2>
md: kicking non-fresh hdc2 from array!
md: unbind<hdc2>
md: export_rdev(hdc2)
raid5: device hdd2 operational as raid disk 3
raid5: device hdb2 operational as raid disk 2
raid5: device hda2 operational as raid disk 0
raid5: allocated 4274kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:3
 disk 0, o:1, dev:hda2
 disk 2, o:1, dev:hdb2
 disk 3, o:1, dev:hdd2
md0: bitmap initialized from disk: read 11/11 pages, set 22589 bits
created bitmap (174 pages) for device md0
md: ... autorun DONE.


So it appears that hdc is faulty, even though mdstat does not label the drive with 'F'. Or do I have to mark it faulty explicitly for that to show up?

Code:

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 hdd2[3] hdb2[2] hda2[0]
      2185980288 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
      bitmap: 164/174 pages [656KB], 2048KB chunk

unused devices: <none>


mdadm reports my array as degraded with one drive removed:

Code:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Feb 23 20:22:25 2008
     Raid Level : raid5
     Array Size : 2185980288 (2084.71 GiB 2238.44 GB)
  Used Dev Size : 728660096 (694.90 GiB 746.15 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Mar 29 09:51:54 2009
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : bb305d69:b11c41b3:725dd80f:10527e77
         Events : 0.1340266

    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        1      removed
       2       3       66        2      active sync   /dev/hdb2
       3      22       66        3      active sync   /dev/hdd2

I ran badblocks for 24 hours (it looks like it would need 2-3 days to complete) and got the following messages in my kernel log:

Code:

hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
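
For reference, what I mean by "running badblocks" is roughly a non-destructive read-only scan like the one below (the output file name is just an example); the saved list could in principle be fed to e2fsck or mke2fs later:

```shell
# Read-only scan of the whole drive; -s shows progress, -v is verbose.
# The list of bad blocks is written to hdc-badblocks.txt (example name).
badblocks -sv -o hdc-badblocks.txt /dev/hdc
```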



The smartctl self-test detects no errors:
Code:

 # smartctl -l selftest /dev/hdc
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7574         -


So is there any hope for this drive, or should I just replace it?

I also have an issue with my current kernel (I guess it's the old IDE driver), since the drives appear as hdX and not sdX. But I would like to get my full array back before I start building a new kernel.
zotalore

PostPosted: Sun Mar 29, 2009 8:20 am    Post subject:

I forgot to add the examine output for the faulty drive, which does not indicate any faults:

Code:

# mdadm --examine /dev/hdc2
/dev/hdc2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : bb305d69:b11c41b3:725dd80f:10527e77
  Creation Time : Sat Feb 23 20:22:25 2008
     Raid Level : raid5
  Used Dev Size : 728660096 (694.90 GiB 746.15 GB)
     Array Size : 2185980288 (2084.71 GiB 2238.44 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Thu Jan  1 03:24:13 2009
          State : clean
Internal Bitmap : present
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 54b45b6b - correct
         Events : 14

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1      22        2        1      active sync   /dev/hdc2

   0     0       3        2        0      active sync   /dev/hda2
   1     1      22        2        1      active sync   /dev/hdc2
   2     2       3       66        2      active sync   /dev/hdb2
   3     3      22       66        3      active sync   /dev/hdd2
Dairinin
n00b
Joined: 03 Feb 2008
Posts: 64
Location: MSK, RF

PostPosted: Sun Mar 29, 2009 4:11 pm    Post subject:

Did you try re-adding your drive to the array?
Code:
mdadm --manage /dev/md0 --add /dev/hdc2


BTW, CRC errors in dmesg are generally caused by a faulty cable, or by a 40-wire IDE cable being used for >UDMA2 transfers.
Bad sectors on modern drives are remapped by the drive itself. If you see sectors which you cannot read/write, the drive is almost dead, as it has already used up all the spare sectors in its reallocation area.
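
If the drive has started remapping, SMART will show it. A quick check (assuming smartmontools is installed; exact attribute names vary a bit between vendors):

```shell
# Dump the vendor attributes; Reallocated_Sector_Ct (ID 5) is the one to
# watch, and Current_Pending_Sector counts sectors waiting to be remapped.
# Raw values well above zero mean the drive is already eating its spares.
smartctl -A /dev/hdc | grep -i -e Reallocated -e Pending
```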
John R. Graham
Administrator
Joined: 08 Mar 2005
Posts: 10732
Location: Somewhere over Atlanta, Georgia

PostPosted: Sun Mar 29, 2009 4:25 pm    Post subject:

I concur with Dairinin. The other symptom you reported is seek failures, indicating either that the servo platter is damaged or else there's some sort of electromechanical failure looming. Leaving that drive in the array is like playing with fire: there's a good chance you'll get burned.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
zotalore

PostPosted: Sun Mar 29, 2009 10:19 pm    Post subject:

I've replaced the drive today. It seems like it will take several days for the reconstruction to complete. I'll inspect the drive on a different machine using tools supplied by the vendor.
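
For anyone finding this later, the replacement went roughly like this (device names as in my setup, and assuming the new drive shows up under the same name; adjust to taste):

```shell
# Copy the partition layout from a known-good member to the new drive
# (assumes the replacement appeared as /dev/hdc, like the old one).
sfdisk -d /dev/hda | sfdisk /dev/hdc

# Add the new partition back into the array; mdadm kicks off the rebuild.
mdadm --manage /dev/md0 --add /dev/hdc2

# Watch the reconstruction progress.
watch -n 60 cat /proc/mdstat
```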