Gentoo Forums
how to force a remap of badblock
piavlo
Tux's lil' helper


Joined: 21 Jun 2005
Posts: 141

Posted: Mon Jun 19, 2006 2:29 pm    Post subject: how to force a remap of badblock

Hi, I have:
Code:
# smartctl -l selftest /dev/hdh
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       60%     10147         272182497
# 2  Extended offline    Completed: read failure       20%     10133         272182497
# 3  Extended offline    Completed: read failure       20%     10129         272182497

I'm trying to force a remap with:
Code:
#dd if=/dev/zero of=/dev/hdh count=1 bs=512 seek=272182496
dd: writing `/dev/hdh': Input/output error
1+0 records in
0+0 records out


Code:
# smartctl -A /dev/hdh
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   199   198   063    Pre-fail  Always       -       16841
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       376
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       1
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   252   245   187    Pre-fail  Always       -       50557
  9 Power_On_Hours          0x0032   224   224   000    Old_age   Always       -       19017
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       8
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   042   253   000    Old_age   Always       -       38
195 Hardware_ECC_Recovered  0x000a   253   211   000    Old_age   Always       -       1844
196 Reallocated_Event_Count 0x0008   250   250   000    Old_age   Offline      -       3
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       1
198 Offline_Uncorrectable   0x0008   252   250   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 TA_Increase_Count       0x000a   253   250   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       1
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   241   241   000    Old_age   Offline      -       148
210 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
212 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0


Does this mean that I have to replace the disk, or force mke2fs to skip this block?
However, it seems I can't write to any block beyond 272182497 with dd either.
If so, why does smartctl -H /dev/hdh report the health check status as PASSED?

Thanks
Keruskerfuerst
Advocate


Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Mon Jun 19, 2006 5:00 pm

Have a look at "man badblocks".
piavlo
Tux's lil' helper


Joined: 21 Jun 2005
Posts: 141

Posted: Mon Jun 19, 2006 7:35 pm

Keruskerfuerst wrote:
Have a look at "man badblocks".

How does this help me with the problem? With smartctl I have much finer-grained control.
troymc
Guru


Joined: 22 Mar 2006
Posts: 553

Posted: Tue Jun 20, 2006 4:42 pm

badblocks is just the easy, but long, way to do it. It can do a non-destructive read-write test of the entire platter, thus detecting and triggering the re-mapping of any bad blocks. Plus if the drive cannot map out the blocks, then it can output a badblocks list, which can then be fed into mkfs, fsck, debugfs, etc.

I'm assuming you are dd'ing directly to the LBA block because you already know that this block is not part of a file? Or it is and you don't care if you lose the data? If not, then you might be interested in reading the Bad Block HowTo.

All that being said, I don't have a simple answer for your question. I would suggest trying to localize the error yourself before attempting to re-map it. Try something like this:
Code:

# export i=272182491
# while [ $i -lt 272182500 ]
        > do echo $i
        > dd if=/dev/hdh of=/dev/null bs=512 count=1 skip=$i
        > let i+=1
        > done


This will attempt to read those 9 sectors (272182491 through 272182499). You should see successful reads on all but 272182497. If you don't, then something else is wrong. There is no reason a single bad sector should stop you from accessing the sectors past it, as you described.
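
A minimal sketch of the same loop as a reusable function, assuming a POSIX shell and dd. The name scan_sectors is hypothetical. It reads every 512-byte sector in an inclusive range and prints OK or FAIL per sector, based only on dd's exit status. Reads here go through the page cache, so the kernel may fetch neighbouring sectors as well; GNU dd's iflag=direct can avoid that, but it is left out because it is not portable.

```shell
#!/bin/sh
# scan_sectors DEVICE FIRST LAST   (hypothetical helper, read-only)
# Tries to read each 512-byte sector from FIRST to LAST inclusive and
# reports whether dd exited successfully for that sector.
scan_sectors() {
    dev=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        if dd if="$dev" of=/dev/null bs=512 count=1 skip="$i" 2>/dev/null; then
            echo "$i OK"
        else
            echo "$i FAIL"
        fi
        i=$((i + 1))
    done
}

# Example (as root, against the drive from this thread):
#   scan_sectors /dev/hdh 272182491 272182499
```

A FAIL line only says that dd returned an error for that sector; the kernel log is still the place to confirm that it was a media error rather than, say, a permissions problem.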

Also, there are some issues with writing only a single LBA block, since the OS is used to writing larger FS blocks. [Yes, I know dd is supposedly bypassing the fs and writing to the device. Try reading here: http://permalink.gmane.org/gmane.linux.utilities.smartmontools/3445] There are 2 solutions: 1) try sync'ing after the dd write to get the drive to flush its buffers, or 2) write a full fs block (usually 4096 bytes, but I'd use stat -f or debugfs to find out the correct block size).
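
The second option can be sketched as follows, assuming a POSIX shell with 64-bit arithmetic and dd. Both function names are hypothetical, and the partition start of 63 used in the example is a made-up value; the real one comes from fdisk -lu. The first function converts a disk LBA into the filesystem block that contains it; the second overwrites one whole block with zeros and then calls sync. With dd, seek=N starts writing at zero-based block N, so with bs=512 a write aimed at LBA 272182497 would use seek=272182497.

```shell
#!/bin/sh
# lba_to_fs_block LBA PART_START BLOCK_SIZE   (hypothetical helper)
# LBA and PART_START are in 512-byte sectors, BLOCK_SIZE in bytes.
# Prints the filesystem block (relative to the partition) holding LBA.
lba_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# zero_fs_block TARGET BLOCK BLOCK_SIZE   (hypothetical helper)
# DESTRUCTIVE: overwrites one whole block of TARGET with zeros.
# conv=notrunc keeps a regular file from being truncated; it makes
# no difference on a block device.
zero_fs_block() {
    dd if=/dev/zero of="$1" bs="$3" count=1 seek="$2" conv=notrunc 2>/dev/null &&
        sync
}

# Example with an assumed partition start of sector 63:
#   lba_to_fs_block 272182497 63 4096    prints 34022804
#   zero_fs_block /dev/hdh1 34022804 4096
```

Before zeroing anything on a filesystem that holds data, it is worth checking with debugfs whether the block belongs to a file, as the Bad Block HowTo describes.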


troymc
piavlo
Tux's lil' helper


Joined: 21 Jun 2005
Posts: 141

Posted: Thu Jun 22, 2006 12:42 pm

Thanks for your reply, troymc.

Now I'm confused. I ran
Code:
badblocks -v -w -t random /dev/hda

twice. The first time it reported 204 bad blocks, the second time 124 bad blocks, but:
Code:
 #  smartctl -A /dev/hda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   252   252   063    Pre-fail  Always       -       2548
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       5
  5 Reallocated_Sector_Ct   0x0033   253   238   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   252   249   187    Pre-fail  Always       -       52799
  9 Power_On_Minutes        0x0032   247   247   000    Old_age   Always       -       53h+06m
 10 Spin_Retry_Count        0x002b   252   252   157    Pre-fail  Always       -       2
 11 Calibration_Retry_Count 0x002b   252   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       76
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       28
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       2605
196 Reallocated_Event_Count 0x0008   122   122   000    Old_age   Offline      -       131
197 Current_Pending_Sector  0x0008   253   238   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   001   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   249   000    Old_age   Always       -       5
202 TA_Increase_Count       0x000a   253   001   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   251   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   246   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   252   252   000    Old_age   Always       -       2
208 Spin_Buzz               0x002a   252   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   204   200   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

shows the same output each time and says there are no bad sectors. The self-tests also pass:
Code:
# smartctl -l selftest /dev/hda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      2086         -
# 2  Short offline       Completed without error       00%      2084         -
# 3  Short offline       Completed without error       00%      2078         -
# 4  Short offline       Completed: read failure       60%      2055         313398962
# 5  Short offline       Completed: read failure       60%      2037         303979596
# 6  Short offline       Completed: read failure       60%      2032         297461065
# 7  Short offline       Completed: read failure       60%      2032         289527972
# 8  Short offline       Completed: read failure       60%      2032         289527972
# 9  Short offline       Completed: read failure       60%      2032         290107771
#10  Short offline       Completed: read failure       60%      2032         289403790
#11  Extended offline    Completed: read failure       20%      2015         288323020
#12  Short offline       Completed: read failure       60%      2009         346875705
#13  Short offline       Completed: read failure       30%      1960         317035555

Now I don't know what to conclude. Why does the extended test pass while badblocks reports bad sectors, and a different number of them on each run, yet the output of smartctl -A /dev/hda does not change?
troymc
Guru


Joined: 22 Mar 2006
Posts: 553

Posted: Thu Jun 22, 2006 4:34 pm

Ok, this is a different drive, right?

What catches my eye here, is that it has attempted to reallocate sectors 131 times, but nothing has been reallocated, and nothing is pending.
Code:

...
  5 Reallocated_Sector_Ct   0x0033   253   238   063    Pre-fail  Always       -       0
...
196 Reallocated_Event_Count 0x0008   122   122   000    Old_age   Offline      -       131
197 Current_Pending_Sector  0x0008   253   238   000    Old_age   Offline      -       0
...


Over-heating maybe? or flakey cable?
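
To watch those counters between badblocks runs, a small filter over saved smartctl -A output may help. This is only a sketch, assuming a POSIX shell and awk; the function names smart_raw and smart_summary are hypothetical, and the only assumption about the format is that the attribute ID is the first column and the raw value is the last, as in the listings in this thread.

```shell
#!/bin/sh
# smart_raw ID   (hypothetical helper)
# Reads `smartctl -A` output on stdin and prints the RAW_VALUE
# (last column) of the attribute whose ID# (first column) is ID.
smart_raw() {
    awk -v id="$1" '$1 == id { print $NF }'
}

# smart_summary   (hypothetical helper)
# Prints ID=raw for the reallocation-related attributes:
# 5, 196, 197 and 198.
smart_summary() {
    smart_out=$(cat)
    for id in 5 196 197 198; do
        printf '%s=%s\n' "$id" "$(printf '%s\n' "$smart_out" | smart_raw "$id")"
    done
}

# Example (as root):
#   smartctl -A /dev/hda | smart_summary > before.txt
#   ... run the test ...
#   smartctl -A /dev/hda | smart_summary > after.txt
#   diff before.txt after.txt
```

Saving a summary before and after each run makes it easier to see whether any of the four values actually moved, instead of comparing full attribute tables by eye.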


troymc
piavlo
Tux's lil' helper


Joined: 21 Jun 2005
Posts: 141

Posted: Fri Jun 23, 2006 10:19 am

troymc wrote:
Ok, this is a different drive, right?

Yes, I managed to fix the errors on the /dev/hdh drive with
Code:
badblocks -v -w -t 0 /dev/hdh
then rechecked with
Code:
badblocks -v -w -t random /dev/hdh
and all seems fine.

troymc wrote:
What catches my eye here, is that it has attempted to reallocate sectors 131 times, but nothing has been reallocated, and nothing is pending.
Code:

...
  5 Reallocated_Sector_Ct   0x0033   253   238   063    Pre-fail  Always       -       0
...
196 Reallocated_Event_Count 0x0008   122   122   000    Old_age   Offline      -       131
197 Current_Pending_Sector  0x0008   253   238   000    Old_age   Offline      -       0
...


Over-heating maybe? or flakey cable?


With /dev/hda, before I ran
Code:
badblocks -v -w -t 0 /dev/hda

I had Reallocated_Sector_Ct=131 and Current_Pending_Sector=131.
The badblocks -v -w -t 0 /dev/hda run finished with no bad blocks,
and then I got
Code:

...
  5 Reallocated_Sector_Ct   0x0033   253   238   063    Pre-fail  Always       -       0
...
196 Reallocated_Event_Count 0x0008   122   122   000    Old_age   Offline      -       131
197 Current_Pending_Sector  0x0008   253   238   000    Old_age   Offline      -       0
...
and
Code:
badblocks -v -w -t random /dev/hda
returns a different number of bad blocks on each run. It does not look like an overheating problem now, so I'll try replacing the cable
when I get back to work.
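
One way to see how much the results move between runs is to save each run's list with badblocks -o and compare the lists. A minimal sketch, assuming a POSIX shell with sort, comm, sed and mktemp; the function name and the file names run1.txt and run2.txt are hypothetical. Note that badblocks -w is destructive, as in the runs above.

```shell
#!/bin/sh
# compare_badblock_lists LIST1 LIST2   (hypothetical helper)
# Each list holds one block number per line, as written by
# `badblocks -o FILE`. Prints every block prefixed with "both",
# "only-first" or "only-second".
compare_badblock_lists() {
    a=$(mktemp)
    b=$(mktemp)
    # comm needs both inputs in the same (lexical) sort order.
    sort "$1" > "$a"
    sort "$2" > "$b"
    comm -12 "$a" "$b" | sed 's/^/both /'
    comm -23 "$a" "$b" | sed 's/^/only-first /'
    comm -13 "$a" "$b" | sed 's/^/only-second /'
    rm -f "$a" "$b"
}

# Example (DESTRUCTIVE write test, as root):
#   badblocks -v -w -t random -o run1.txt /dev/hda
#   badblocks -v -w -t random -o run2.txt /dev/hda
#   compare_badblock_lists run1.txt run2.txt
```

If few or no blocks show up under "both", that would fit the cable or heat suspicion raised earlier better than a fixed media defect, though it is not proof either way.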
All times are GMT