piavlo Tux's lil' helper
Joined: 21 Jun 2005 Posts: 141
|
Posted: Mon Jun 19, 2006 2:29 pm Post subject: how to force a remap of badblock |
|
|
Hi, I have a disk whose SMART self-tests fail with a read error:
Code: | # smartctl -l selftest /dev/hdh
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 60% 10147 272182497
# 2 Extended offline Completed: read failure 20% 10133 272182497
# 3 Extended offline Completed: read failure 20% 10129 272182497 |
I'm trying to force a remap with:
Code: | #dd if=/dev/zero of=/dev/hdh count=1 bs=512 seek=272182496
dd: writing `/dev/hdh': Input/output error
1+0 records in
0+0 records out |
Code: | # smartctl -A /dev/hdh
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0027 199 198 063 Pre-fail Always - 16841
4 Start_Stop_Count 0x0032 253 253 000 Old_age Always - 376
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 1
6 Read_Channel_Margin 0x0001 253 253 100 Pre-fail Offline - 0
7 Seek_Error_Rate 0x000a 253 252 000 Old_age Always - 0
8 Seek_Time_Performance 0x0027 252 245 187 Pre-fail Always - 50557
9 Power_On_Hours 0x0032 224 224 000 Old_age Always - 19017
10 Spin_Retry_Count 0x002b 253 252 157 Pre-fail Always - 0
11 Calibration_Retry_Count 0x002b 253 252 223 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 253 253 000 Old_age Always - 8
192 Power-Off_Retract_Count 0x0032 253 253 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 253 253 000 Old_age Always - 0
194 Temperature_Celsius 0x0032 042 253 000 Old_age Always - 38
195 Hardware_ECC_Recovered 0x000a 253 211 000 Old_age Always - 1844
196 Reallocated_Event_Count 0x0008 250 250 000 Old_age Offline - 3
197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 1
198 Offline_Uncorrectable 0x0008 252 250 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0008 199 199 000 Old_age Offline - 0
200 Multi_Zone_Error_Rate 0x000a 253 252 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 252 000 Old_age Always - 0
202 TA_Increase_Count 0x000a 253 250 000 Old_age Always - 0
203 Run_Out_Cancel 0x000b 253 252 180 Pre-fail Always - 1
204 Shock_Count_Write_Opern 0x000a 253 252 000 Old_age Always - 0
205 Shock_Rate_Write_Opern 0x000a 253 252 000 Old_age Always - 0
207 Spin_High_Current 0x002a 253 252 000 Old_age Always - 0
208 Spin_Buzz 0x002a 253 252 000 Old_age Always - 0
209 Offline_Seek_Performnce 0x0024 241 241 000 Old_age Offline - 148
210 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
211 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
212 Unknown_Attribute 0x0032 253 253 000 Old_age Always - 0
|
Does this mean that I have to replace the disk, or force mke2fs to skip this block?
However, it seems I can't write any block beyond 272182497 with dd either.
If so, why does smartctl -H /dev/hdh say the health check status is PASSED?
Thanks |
Keruskerfuerst Advocate
Joined: 01 Feb 2006 Posts: 2289 Location: near Augsburg, Germany
|
Posted: Mon Jun 19, 2006 5:00 pm Post subject: |
|
|
Have a look at "man badblocks". |
piavlo Tux's lil' helper
Joined: 21 Jun 2005 Posts: 141
|
Posted: Mon Jun 19, 2006 7:35 pm Post subject: |
|
|
Keruskerfuerst wrote: | Have a look at "man badblocks". |
How does this help me with the problem? With smartctl I have much finer-grained control. |
troymc Guru
Joined: 22 Mar 2006 Posts: 553
|
Posted: Tue Jun 20, 2006 4:42 pm Post subject: |
|
|
badblocks is just the easy, but long, way to do it. It can do a non-destructive read-write test of the entire platter, thus detecting and triggering the re-mapping of any bad blocks. Plus if the drive cannot map out the blocks, then it can output a badblocks list, which can then be fed into mkfs, fsck, debugfs, etc.
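A minimal dry-run sketch of that workflow follows. It only prints the commands instead of running them. The partition /dev/hdh1, the 4096-byte block size and the list file name are assumptions chosen for illustration; the block size given to badblocks has to match the one the filesystem uses, and the non-destructive test should not be run on a mounted filesystem.

```shell
#!/bin/sh
# Dry run: prints the commands, executes nothing.
# Device, block size and list file below are hypothetical example values.
badblocks_plan() {
    dev=$1    # partition to scan (assumption: /dev/hdh1)
    bs=$2     # filesystem block size in bytes, must match across all tools
    list=$3   # file that receives the list of bad blocks

    # -n: non-destructive read-write test, -o: write bad block numbers to a file
    echo "badblocks -n -s -v -b $bs -o $list $dev"
    # Option A: create a new filesystem that avoids the listed blocks (erases data)
    echo "mke2fs -b $bs -l $list $dev"
    # Option B: add the listed blocks to the bad block list of an existing ext2/ext3 filesystem
    echo "e2fsck -l $list $dev"
}

badblocks_plan /dev/hdh1 4096 /tmp/hdh1.badblocks
```

The e2fsprogs man pages generally recommend letting mke2fs or e2fsck call badblocks themselves via their -c option, which avoids block size mismatches; the explicit list is shown here only because it is the route described above.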
I'm assuming you are dd'ing directly to the LBA block because you already know that this block is not part of a file? Or it is and you don't care if you lose the data? If not, then you might be interested in reading the Bad Block HowTo.
All that being said, I don't have a simple answer for your question. I would suggest trying to localize the error yourself before attempting to re-map it. Try something like this:
Code: |
# export i=272182491
# while [ $i -lt 272182500 ]
> do echo $i
> dd if=/dev/hdh of=/dev/null bs=512 count=1 skip=$i
> let i+=1
> done
|
This will attempt to read the 9 sectors from 272182491 through 272182499. You should see successful reads on all but 272182497. If you don't, then something else is wrong. There is no reason that it should not allow you to read past that error, as you reported.
Also, there are some issues with only writing a single LBA block, since the OS is used to writing larger FS blocks. [Yes, I know dd is supposedly bypassing the fs and writing to the device. Try reading here: http://permalink.gmane.org/gmane.linux.utilities.smartmontools/3445] There are 2 solutions: 1) try sync'ing after the dd write to get the drive to flush its buffers, or 2) write a full fs block (usually 4096, but I'd stat -f or debugfs it to find out the correct blocksize).
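As a rough sketch of combining both suggestions, the helper below works out which filesystem-sized block contains a given 512-byte LBA and prints (does not run) a dd command that overwrites that whole block, followed by sync. The function name is made up for illustration, and it assumes the LBA is counted from the start of the whole device, as with /dev/hdh here, so no partition offset is applied.

```shell
#!/bin/sh
# aligned_write_cmd DEVICE LBA FS_BLOCK_SIZE
# Prints a dd command that zeroes the FS_BLOCK_SIZE-byte block containing the
# 512-byte sector LBA. Nothing is written; the command is only echoed.
aligned_write_cmd() {
    dev=$1
    lba=$2
    fs_bs=$3
    per_block=$((fs_bs / 512))     # 512-byte sectors per filesystem block
    blk=$((lba / per_block))       # index of the block that holds the sector
    echo "dd if=/dev/zero of=$dev bs=$fs_bs count=1 seek=$blk && sync"
}

aligned_write_cmd /dev/hdh 272182497 4096
# prints: dd if=/dev/zero of=/dev/hdh bs=4096 count=1 seek=34022812 && sync
```

With a 4096-byte block, block 34022812 covers LBAs 272182496 through 272182503, so running the printed command would destroy whatever data is stored in all eight sectors.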
troymc |
piavlo Tux's lil' helper
Joined: 21 Jun 2005 Posts: 141
|
Posted: Thu Jun 22, 2006 12:42 pm Post subject: |
|
|
Thanks for your reply, troymc.
Now I'm confused. I've run Code: | badblocks -v -w -t random /dev/hda |
twice: the first time it reported 204 bad blocks, the second time 124 bad blocks, but:
Code: | # smartctl -A /dev/hda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0027 252 252 063 Pre-fail Always - 2548
4 Start_Stop_Count 0x0032 253 253 000 Old_age Always - 5
5 Reallocated_Sector_Ct 0x0033 253 238 063 Pre-fail Always - 0
6 Read_Channel_Margin 0x0001 253 253 100 Pre-fail Offline - 0
7 Seek_Error_Rate 0x000a 253 252 000 Old_age Always - 0
8 Seek_Time_Performance 0x0027 252 249 187 Pre-fail Always - 52799
9 Power_On_Minutes 0x0032 247 247 000 Old_age Always - 53h+06m
10 Spin_Retry_Count 0x002b 252 252 157 Pre-fail Always - 2
11 Calibration_Retry_Count 0x002b 252 252 223 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 253 253 000 Old_age Always - 76
192 Power-Off_Retract_Count 0x0032 253 253 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 253 253 000 Old_age Always - 0
194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 28
195 Hardware_ECC_Recovered 0x000a 253 252 000 Old_age Always - 2605
196 Reallocated_Event_Count 0x0008 122 122 000 Old_age Offline - 131
197 Current_Pending_Sector 0x0008 253 238 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0008 253 001 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0008 199 199 000 Old_age Offline - 0
200 Multi_Zone_Error_Rate 0x000a 253 252 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 249 000 Old_age Always - 5
202 TA_Increase_Count 0x000a 253 001 000 Old_age Always - 0
203 Run_Out_Cancel 0x000b 253 251 180 Pre-fail Always - 0
204 Shock_Count_Write_Opern 0x000a 253 246 000 Old_age Always - 0
205 Shock_Rate_Write_Opern 0x000a 253 252 000 Old_age Always - 0
207 Spin_High_Current 0x002a 252 252 000 Old_age Always - 2
208 Spin_Buzz 0x002a 252 252 000 Old_age Always - 0
209 Offline_Seek_Performnce 0x0024 204 200 000 Old_age Offline - 0
99 Unknown_Attribute 0x0004 253 253 000 Old_age Offline - 0
100 Unknown_Attribute 0x0004 253 253 000 Old_age Offline - 0
101 Unknown_Attribute 0x0004 253 253 000 Old_age Offline - 0
|
shows the same output after each run and reports no bad sectors. The self-tests also pass:
Code: | # smartctl -l selftest /dev/hda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 2086 -
# 2 Short offline Completed without error 00% 2084 -
# 3 Short offline Completed without error 00% 2078 -
# 4 Short offline Completed: read failure 60% 2055 313398962
# 5 Short offline Completed: read failure 60% 2037 303979596
# 6 Short offline Completed: read failure 60% 2032 297461065
# 7 Short offline Completed: read failure 60% 2032 289527972
# 8 Short offline Completed: read failure 60% 2032 289527972
# 9 Short offline Completed: read failure 60% 2032 290107771
#10 Short offline Completed: read failure 60% 2032 289403790
#11 Extended offline Completed: read failure 20% 2015 288323020
#12 Short offline Completed: read failure 60% 2009 346875705
#13 Short offline Completed: read failure 30% 1960 317035555
|
Now I don't know what to conclude: why does the extended test pass while the badblocks command says there are bad sectors and finds a different number of them each time? And why does the output of smartctl -A /dev/hda not change? |
troymc Guru
Joined: 22 Mar 2006 Posts: 553
|
Posted: Thu Jun 22, 2006 4:34 pm Post subject: |
|
|
Ok, this is a different drive, right?
What catches my eye here, is that it has attempted to reallocate sectors 131 times, but nothing has been reallocated, and nothing is pending.
Code: |
...
5 Reallocated_Sector_Ct 0x0033 253 238 063 Pre-fail Always - 0
...
196 Reallocated_Event_Count 0x0008 122 122 000 Old_age Offline - 131
197 Current_Pending_Sector 0x0008 253 238 000 Old_age Offline - 0
...
|
Over-heating maybe? or flakey cable?
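To watch just those three counters between badblocks runs, a small filter along these lines could help. It is only a sketch: smart_raw is a made-up helper name, and it assumes the column layout shown in this thread, where RAW_VALUE is the 10th whitespace-separated field.

```shell
#!/bin/sh
# smart_raw ID
# Reads `smartctl -A` output on stdin and prints the RAW_VALUE column
# (10th field) of the attribute whose ID# matches ID.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10; exit }'
}

# Sample rows copied from the /dev/hda output posted above
sample='  5 Reallocated_Sector_Ct 0x0033 253 238 063 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0008 122 122 000 Old_age Offline - 131
197 Current_Pending_Sector 0x0008 253 238 000 Old_age Offline - 0'

for id in 5 196 197; do
    printf '%s=%s\n' "$id" "$(printf '%s\n' "$sample" | smart_raw "$id")"
done
# prints:
# 5=0
# 196=131
# 197=0
```

On a live system the same function would be fed directly, for example `smartctl -A /dev/hda | smart_raw 196`.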
troymc |
piavlo Tux's lil' helper
Joined: 21 Jun 2005 Posts: 141
|
Posted: Fri Jun 23, 2006 10:19 am Post subject: |
|
|
troymc wrote: | Ok, this is a different drive, right? |
Yes, I've managed to fix the errors on the /dev/hdh drive with
Code: | badblocks -v -w -t 0 /dev/hdh | then rechecked with
Code: | badblocks -v -w -t random /dev/hdh | and all seems fine.
troymc wrote: | What catches my eye here, is that it has attempted to reallocate sectors 131 times, but nothing has been reallocated, and nothing is pending.
Code: |
...
5 Reallocated_Sector_Ct 0x0033 253 238 063 Pre-fail Always - 0
...
196 Reallocated_Event_Count 0x0008 122 122 000 Old_age Offline - 131
197 Current_Pending_Sector 0x0008 253 238 000 Old_age Offline - 0
...
|
Over-heating maybe? or flakey cable?
|
With /dev/hda, before I ran Code: | badblocks -v -w -t 0 /dev/hda |
I had Reallocated_Sector_Ct=131 and Current_Pending_Sector=131.
badblocks -v -w -t 0 /dev/hda finished with no bad blocks,
and then I got Code: |
...
5 Reallocated_Sector_Ct 0x0033 253 238 063 Pre-fail Always - 0
...
196 Reallocated_Event_Count 0x0008 122 122 000 Old_age Offline - 131
197 Current_Pending_Sector 0x0008 253 238 000 Old_age Offline - 0
...
| and Code: | badblocks -v -w -t random /dev/hda | returns a different number of bad blocks on each run. It doesn't look like an overheating problem now; I'll try replacing the cable
when I get back to work. |