Fog_Watch Apprentice


Joined: 24 Jul 2006 Posts: 271 Location: Utility Muffin Research Kitchen
|
Posted: Thu Nov 15, 2012 12:39 am Post subject: Has sector 88 of both Raid 1 disks failed? [MOSTLY SOLVED] |
|
|
Hello,
My Proliant DL380 G5 boots off an md raid 1 of two Samsung HD204UI 2TB disks. The disks are attached via eSATA cables to a PCI-X Silicon Image 3124 controller.
A couple of times /dev/md1 has not come up clean, and on one occasion the physical volume that sits on /dev/md1 had disappeared and needed re-creating. The following relates to my attempts to resolve these issues.
dmesg does not look nice:
Quote: | ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: irq_stat 0x00020002, device error via D2H FIS
ata2.00: failed command: READ DMA
ata2.00: cmd c8/00:40:20:00:00/00:00:00:00:00/e0 tag 0 dma 32768 in
res 51/40:08:58:00:00/00:00:00:00:00/e0 Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/100
sd 1:0:0:0: [sdb] Unhandled sense code
sd 1:0:0:0: [sdb]
Result: hostbyte=0x00 driverbyte=0x08
sd 1:0:0:0: [sdb]
Sense Key : 0x3 [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
00 00 00 58
sd 1:0:0:0: [sdb]
ASC=0x11 ASCQ=0x4
sd 1:0:0:0: [sdb] CDB:
cdb[0]=0x28: 28 00 00 00 00 20 00 00 40 00
end_request: I/O error, dev sdb, sector 88
Buffer I/O error on device sdb, logical block 11
ata2: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x00020002, device error via D2H FIS
ata1.00: failed command: READ DMA
ata1.00: cmd c8/00:40:20:00:00/00:00:00:00:00/e0 tag 0 dma 32768 in
res 51/40:08:58:00:00/00:00:00:00:00/e0 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/100
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda]
Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda]
Sense Key : 0x3 [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
00 00 00 58
sd 0:0:0:0: [sda]
ASC=0x11 ASCQ=0x4
sd 0:0:0:0: [sda] CDB:
cdb[0]=0x28: 28 00 00 00 00 20 00 00 40 00
end_request: I/O error, dev sda, sector 88
Buffer I/O error on device sda, logical block 11
ata1: EH complete
|
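For anyone reading along, the cmd/res lines in the dmesg output above can be decoded by hand. Below is a minimal sketch in Python. It assumes the usual libata layout for those dumps (first byte/second byte:count:three LBA bytes/five HOB bytes/device), 512-byte logical sectors and 4096-byte kernel buffer blocks. The function name decode_taskfile is made up for illustration.

```python
# Decode the libata "cmd" and "res" taskfile dumps quoted from dmesg.
# Assumed field order (not confirmed anywhere in this thread):
#   cmd: command/feature:count:lbal:lbam:lbah/<five HOB bytes>/device
#   res: status/error:count:lbal:lbam:lbah/<five HOB bytes>/device

SECTOR_SIZE = 512          # logical sector size in bytes (assumption)
BUFFER_BLOCK_SIZE = 4096   # kernel buffer block size in bytes (assumption)


def decode_taskfile(dump):
    """Return the first byte, second byte, count and 28-bit LBA of a dump."""
    first, middle, _hob, device = dump.split("/")
    second, count, lbal, lbam, lbah = (int(b, 16) for b in middle.split(":"))
    dev = int(device, 16)
    # For 28-bit commands the top four LBA bits live in the device register.
    lba = lbal | (lbam << 8) | (lbah << 16) | ((dev & 0x0F) << 24)
    return {"first": int(first, 16), "second": second, "count": count, "lba": lba}


cmd = decode_taskfile("c8/00:40:20:00:00/00:00:00:00:00/e0")
res = decode_taskfile("51/40:08:58:00:00/00:00:00:00:00/e0")

print("command 0x%02x, %d sectors from LBA %d" % (cmd["first"], cmd["count"], cmd["lba"]))
print("transfer size: %d bytes" % (cmd["count"] * SECTOR_SIZE))
print("status 0x%02x, error 0x%02x, failing LBA %d" % (res["first"], res["second"], res["lba"]))
print("buffer block: %d" % (res["lba"] * SECTOR_SIZE // BUFFER_BLOCK_SIZE))
```

Read this way, the failed command is a 64-sector read starting at LBA 32 (64 x 512 = 32768 bytes, matching "dma 32768"), and the drive reports LBA 0x58 = 88 as the one it could not read. 88 x 512 / 4096 = 11, which lines up with "logical block 11" in the Buffer I/O error line.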
smartctl concurs. smartctl -a /dev/sda:
Quote: |
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure         90%             4207                  88
| and, smartctl -a /dev/sdb
Quote: |
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure         90%             1819                  88
|
It does not appear that this problem is being caused by any funky security features. hdparm -I /dev/sda | tail -n 14
Quote: | Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
supported: enhanced erase
338min for SECURITY ERASE UNIT. 338min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50000f00100b0031
NAA : 5
IEEE OUI : 0000f0
Unique ID : 0100b0031
Integrity word not set (found 0x0000, expected 0x17a5) |
Does this really mean that sector 88 of both disks has failed? If so, what do you think the cause of this might be?
Last edited by Fog_Watch on Tue Dec 04, 2012 2:48 am; edited 1 time in total |
Ant P. Watchman

Joined: 18 Apr 2009 Posts: 6920
|
Posted: Thu Nov 15, 2012 2:23 am Post subject: |
|
|
It's not outside the realm of possibility... if they're both similar disks and they've always been used as mirrors, they'll have mostly identical wear patterns. If they've ever been through a power failure it's possible the heads had an accident at the same place too. Could also both be from a manufacturing batch with a recurring defect. |
Fog_Watch Apprentice


Joined: 24 Jul 2006 Posts: 271 Location: Utility Muffin Research Kitchen
|
Posted: Thu Nov 15, 2012 2:41 am Post subject: |
|
|
gdisk -l /dev/sda | tail -n 5
Quote: |
Number  Start (sector)    End (sector)  Size        Code  Name
     1            2048            6143  2.0 MiB     EF02  BIOS boot
     2            6144        10491903  5.0 GiB     8200  Swap
     3        10491904        10553343  30.0 MiB    FD00  md0
     4        10553344      3907029134  1.8 TiB     FD00  md1 |
The above suggests to me that sector 88 is unused, since the first partition does not start until sector 2048. So is "end_request: I/O error, dev sda, sector 88" nothing to worry about? |
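As a quick sanity check on that reasoning, here is a small Python sketch that looks a sector up against the partition table from the gdisk listing. The table values are copied from the listing. The GPT range is an assumption (a standard GPT on 512-byte sectors with the protective MBR in sector 0, the header in sector 1 and a 128-entry table in sectors 2 to 33), and the helper name locate is hypothetical.

```python
# Partition table copied from the gdisk listing: (number, start, end, name),
# where start and end are inclusive sector numbers.
PARTITIONS = [
    (1, 2048, 6143, "BIOS boot"),
    (2, 6144, 10491903, "Swap"),
    (3, 10491904, 10553343, "md0"),
    (4, 10553344, 3907029134, "md1"),
]

# Assumption: standard primary GPT layout on 512-byte sectors
# (protective MBR, header, 128 entries of 128 bytes) ends at sector 33.
# The backup GPT at the end of the disk is not modelled here.
GPT_PRIMARY_END = 33


def locate(sector):
    """Say which on-disk region a sector number falls into."""
    if sector <= GPT_PRIMARY_END:
        return "primary GPT structures"
    for number, start, end, name in PARTITIONS:
        if start <= sector <= end:
            return "partition %d (%s)" % (number, name)
    return "unallocated"


print(locate(88))
```

Under those assumptions sector 88 sits in the gap between the end of the primary GPT and the start of partition 1, so no partition or md member data should live there. That does not explain what was issuing the read of sectors 32 to 95 seen in dmesg, only that a failure there would not touch partition contents.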
Fog_Watch Apprentice


Joined: 24 Jul 2006 Posts: 271 Location: Utility Muffin Research Kitchen
|
Posted: Tue Dec 04, 2012 2:47 am Post subject: |
|
|
This is what I then did:
Panic,
Get two new drives and get those old ones with the dodgy sector 88 off my server,
Think,
Use Samsung's estools. This confirmed the problem with sector 88. I then used estools to do a 12-hour low-level format, after which estools no longer reported a problem with sector 88.
smartctl concurs: Before; after.
What caused the original sector 88 problem, who knows. And why were Reallocated_Event_Count and Current_Pending_Sector both 0? I would have thought that if there was a problem there would have been some reallocations going on.
Anyway, estools reformat seems to have done the trick. |
salahx Guru

Joined: 12 Mar 2005 Posts: 559
|
Posted: Tue Dec 04, 2012 4:57 am Post subject: |
|
|
Reallocation only occurs on writes. To force a drive to reallocate a bad sector, use hdparm's "--write-sector" option on the problem sector, which also requires the "--yes-i-know-what-i-am-doing" flag. Do this with great caution: the option is described as VERY DANGEROUS in the manpage for a reason, as any data in the sector will be lost. |
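To make the sequence concrete, here is a hedged Python sketch that only builds the hdparm command lines and prints them. It deliberately does not execute anything. The helper name rewrite_sector_commands is made up for this post. The read, write, read ordering is just one cautious way to do it, not something taken from the manpage.

```python
def rewrite_sector_commands(device, sector):
    """Build (but do not run) hdparm commands to test and overwrite one sector.

    Step 1 reads the sector to confirm it really fails.
    Step 2 overwrites it with zeros, destroying whatever was stored there.
    Step 3 reads it again to check whether the drive now returns data.
    """
    sector = int(sector)
    if sector < 0:
        raise ValueError("sector must be non-negative")
    return [
        ["hdparm", "--read-sector", str(sector), device],
        ["hdparm", "--yes-i-know-what-i-am-doing",
         "--write-sector", str(sector), device],
        ["hdparm", "--read-sector", str(sector), device],
    ]


# Print the commands for review instead of executing them.
for argv in rewrite_sector_commands("/dev/sda", 88):
    print(" ".join(argv))
```

Printing rather than running leaves a chance to double-check the device name and sector number first. On a RAID 1 member it is also worth confirming beforehand that the sector is not inside a partition that holds array data.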