menschmeier l33t
Joined: 15 Dec 2004 Posts: 727
Posted: Wed Jul 21, 2010 7:57 am    Post subject: disk problem - controller or disk defect?
A few weeks ago my backup disk died, and now the internal disk is causing trouble.
I read out the SMART information, but I can't tell whether the disk is defective or something else is wrong. Can someone help me interpret the data?
Quote: | # smartctl --all /dev/sda
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Fujitsu MHV series
Device Model: FUJITSU MHV2120BH
Serial Number: NW60T6C26HPH
Firmware Version: 00000029
User Capacity: 120,034,123,776 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a
Local Time is: Wed Jul 21 09:51:35 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 702) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 82) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 046 Pre-fail Always - 27958
2 Throughput_Performance 0x0005 100 100 030 Pre-fail Offline - 31916032
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 097 097 000 Old_age Always - 10805
5 Reallocated_Sector_Ct 0x0033 100 100 024 Pre-fail Always - 8589934592000
7 Seek_Error_Rate 0x000f 100 100 047 Pre-fail Always - 640
8 Seek_Time_Performance 0x0005 100 100 019 Pre-fail Offline - 0
9 Power_On_Seconds 0x0032 089 089 000 Old_age Always - 5902h+44m+26s
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 4938
192 Power-Off_Retract_Count 0x0032 099 099 000 Old_age Always - 396
193 Load_Cycle_Count 0x0032 096 096 000 Old_age Always - 92975
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 39 (Lifetime Min/Max 9/49)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 530
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 395837440
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000f 100 100 060 Pre-fail Always - 27846
203 Run_Out_Cancel 0x0002 100 100 000 Old_age Always - 2628525819015
240 Head_Flying_Hours 0x003e 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 273 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 273 occurred at disk power-on lifetime: 5902 hours (245 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 91 bb 59 e1 Error: UNC 16 sectors at LBA = 0x0159bb91 = 22657937
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 10 91 bb 59 e1 00 00:45:18.377 READ DMA
27 00 00 00 00 00 e0 00 00:45:18.376 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:45:18.370 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 00:45:18.363 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:45:18.363 READ NATIVE MAX ADDRESS EXT
Error 272 occurred at disk power-on lifetime: 5902 hours (245 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 91 bb 59 e1 Error: UNC 16 sectors at LBA = 0x0159bb91 = 22657937
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 10 91 bb 59 e1 00 00:45:13.942 READ DMA
27 00 00 00 00 00 e0 00 00:45:13.942 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:45:13.935 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 00:45:13.930 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:45:13.930 READ NATIVE MAX ADDRESS EXT
Error 271 occurred at disk power-on lifetime: 5902 hours (245 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 91 bb 59 e1 Error: UNC 16 sectors at LBA = 0x0159bb91 = 22657937
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 10 91 bb 59 e1 00 00:45:09.509 READ DMA
27 00 00 00 00 00 e0 00 00:45:09.509 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:45:09.502 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 00:45:09.496 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:45:09.495 READ NATIVE MAX ADDRESS EXT
Error 270 occurred at disk power-on lifetime: 5902 hours (245 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 91 bb 59 e1 Error: UNC 16 sectors at LBA = 0x0159bb91 = 22657937
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 10 91 bb 59 e1 00 00:45:04.963 READ DMA
27 00 00 00 00 00 e0 00 00:45:04.962 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:45:04.955 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 00:45:04.950 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:45:04.949 READ NATIVE MAX ADDRESS EXT
Error 269 occurred at disk power-on lifetime: 5902 hours (245 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 91 bb 59 e1 Error: UNC 16 sectors at LBA = 0x0159bb91 = 22657937
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 10 91 bb 59 e1 00 00:45:00.533 READ DMA
27 00 00 00 00 00 e0 00 00:45:00.532 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:45:00.525 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 00:45:00.519 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:45:00.519 READ NATIVE MAX ADDRESS EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 5813 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay. |
Thanks for your support.
_________________
Please notice the back of this message.
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9891 Location: almost Mile High in the USA
Posted: Wed Jul 21, 2010 1:31 pm
According to the error log, it looks like an UNCorrectable 16-sector read at LBA 22657937. Probably a bad sector showed up. You might want to check your power supply and cables too, as they can cause weird errors. Oh, a laptop? Too bad; then you can't swap hardware to narrow down the issue.
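The numbers in that error-log line can be checked by hand. A minimal Python sketch, assuming 28-bit LBA addressing (the low nibble of DH holds LBA bits 24 to 27, and CH, CL and SN supply the lower three bytes) and the standard ATA meanings of error-register bit 0x40 (UNC) and status-register bit 0x01 (ERR). `decode_error_row` is a hypothetical helper name, not part of smartmontools:

```python
# smartctl prints the registers after the error in this order: ER ST SC SN CL CH DH
UNC = 0x40  # error register bit: uncorrectable data error
ERR = 0x01  # status register bit: the command ended with an error


def decode_error_row(row: str) -> dict:
    """Decode an 'ER ST SC SN CL CH DH' row, assuming 28-bit LBA addressing."""
    er, st, sc, sn, cl, ch, dh = (int(field, 16) for field in row.split()[:7])
    # LBA bits 24-27 come from the low nibble of DH, then CH, CL and SN
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "uncorrectable": bool(er & UNC),
        "status_error": bool(st & ERR),
        "sectors": sc or 256,  # a count of 0 means 256 sectors in 28-bit commands
        "lba": lba,
    }


print(decode_error_row("40 51 10 91 bb 59 e1"))
# {'uncorrectable': True, 'status_error': True, 'sectors': 16, 'lba': 22657937}
```

Applied to the row from the first post, it reproduces what smartctl printed: an uncorrectable error on 16 sectors starting at LBA 0x0159bb91 = 22657937.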
Your SMART data still looks healthy despite the scary numbers (most likely due to byte ordering), but at least one sector has probably already been remapped since it left the factory... but that's just a guess.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed to be watching?
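To see what byte ordering does to those numbers, the 48-bit raw fields can be split into bytes and 16-bit words. This is only a way of looking at the values: the idea that the drive keeps the real counter in the low word and unrelated vendor data in the upper bytes is an assumption about Fujitsu's vendor-specific layout, not documented behaviour. `split_raw48` is a hypothetical helper:

```python
def split_raw48(raw: int) -> dict:
    """Split a 48-bit SMART raw value into its six bytes and three 16-bit words.

    Both lists are ordered least significant first.
    """
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value must fit in 48 bits")
    data = raw.to_bytes(6, "little")
    words = [int.from_bytes(data[i:i + 2], "little") for i in range(0, 6, 2)]
    return {"hex": hex(raw), "bytes": list(data), "words16": words}


# Raw values copied from the smartctl output in the first post
for attr_id, raw in [(5, 8589934592000), (196, 395837440)]:
    parts = split_raw48(raw)
    print(attr_id, parts["hex"], parts["words16"])
# 5 0x7d000000000 [0, 0, 2000]
# 196 0x17980000 [0, 6040, 0]
```

For both attribute 5 (Reallocated_Sector_Ct) and attribute 196 (Reallocated_Event_Count) the low word is 0; the huge decimal totals come entirely from the upper words.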
menschmeier l33t
Joined: 15 Dec 2004 Posts: 727
Posted: Wed Jul 21, 2010 3:53 pm
OK, so it's a bad sector. I am using XFS. Can this be fixed in software somehow? I mean, is it possible to deactivate this sector (or these sectors)?
DirtyHairy l33t
Joined: 03 Jul 2006 Posts: 608 Location: Würzburg, Deutschland
Posted: Thu Jul 22, 2010 10:31 am
Depending on how much you value your data, it might be wise to get a new disk, clone the contents of the old one onto it, and swap them. Your hard drive might still have years of happy, scrappy life ahead, but the error you are seeing might also be an early warning of more serious trouble in the near future, and once you find out, it will be too late.
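The "clone the old disk" step can be sketched as a block-by-block copy that keeps going past read errors, the same idea as `dd conv=noerror,sync`: an unreadable chunk is written as zeros so every later byte stays at its original offset. This is a minimal illustration only; for a real failing disk, GNU ddrescue is the purpose-built tool, since it retries bad areas and keeps a log. `clone_with_zero_fill` and its 64 KiB default chunk size are assumptions, and pointing `dst_path` at the wrong device will destroy it:

```python
import os


def clone_with_zero_fill(src_path: str, dst_path: str, chunk: int = 64 * 1024) -> list[int]:
    """Copy src_path to dst_path chunk by chunk, like `dd conv=noerror,sync`.

    Chunks that raise a read error are written as zeros so that every later
    byte stays at the same offset. Returns the offsets of the failed chunks.
    """
    bad_offsets = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # also works for block devices, unlike getsize()
        offset = 0
        while offset < size:
            length = min(chunk, size - offset)
            src.seek(offset)  # re-seek every time so a failed read cannot shift offsets
            try:
                data = src.read(length)
            except OSError:
                data = b""
                bad_offsets.append(offset)
            # pad failed (or unexpectedly short) reads with zeros to keep alignment
            dst.write(data.ljust(length, b"\0"))
            offset += length
    return bad_offsets
```

Both disks should be unmounted (or the source mounted read-only) while copying, and raw device access needs root. A smaller chunk loses less good data around each bad spot, at the cost of speed.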
madchaz l33t
Joined: 01 Jul 2003 Posts: 995 Location: Quebec, Canada
Posted: Thu Jul 22, 2010 6:44 pm
The XFS tools might have a repair utility that can mark the bad sectors as such.
However, bad sectors do sometimes tend to spread, so unless this is an "old" issue, I would suggest getting a new drive and moving your data to it. If you want to keep the disk, try it for a while afterwards and see what happens. Personally, I wouldn't trust it anymore.
_________________
Someone asked me once if I suffered from mental illness. I told him I enjoyed every second of it.
thegeezer n00b
Joined: 11 Jul 2010 Posts: 38
Posted: Thu Jul 22, 2010 10:23 pm
For those who are interested, I gleaned this from another forum:
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
If this is anything other than zero, the drive is trying to auto-remap the bad sector.
Hard drives will (or should) try to remap sectors that they can't read... of course, reading garbage is still 'reading', so it is a bit subjective. You could always load up GRC SpinRite to kill, I mean detect and repair, the disk.
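Checking these counters can be automated. A minimal sketch that parses the attribute table from `smartctl -A` (or `--all`) output and flags two things: any attribute whose normalized VALUE is at or below a nonzero THRESH (assumed here to mirror how smartctl decides WHEN_FAILED), and a nonzero raw count for 197 Current_Pending_Sector or 198 Offline_Uncorrectable. The function names are hypothetical, and raw values of other attributes are deliberately ignored because some drives pack vendor data into them, as with attributes 5 and 196 in the output above:

```python
import re


def parse_attributes(text: str) -> dict:
    """Parse the attribute table printed by `smartctl -A` (or `--all`).

    Keeps only rows whose third column is a hex FLAG such as 0x0012,
    which skips headers and the error-log register dumps.
    """
    attrs = {}
    for line in text.splitlines():
        parts = line.split(None, 9)  # RAW_VALUE may contain spaces, so keep it whole
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        attrs[int(parts[0])] = {
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": parts[9].strip(),
        }
    return attrs


def leading_int(raw: str) -> int:
    """First integer in a raw field such as '39 (Lifetime Min/Max 9/49)'."""
    match = re.match(r"\d+", raw)
    return int(match.group()) if match else 0


def flag_problems(attrs: dict) -> list[str]:
    notes = []
    for attr_id, a in sorted(attrs.items()):
        # a normalized value at or below a nonzero threshold counts as failing
        if a["thresh"] > 0 and a["value"] <= a["thresh"]:
            notes.append(f"{attr_id} {a['name']}: value {a['value']} <= thresh {a['thresh']}")
    for attr_id in (197, 198):  # pending / offline-uncorrectable sector counts
        if attr_id in attrs and leading_int(attrs[attr_id]["raw"]) > 0:
            notes.append(f"{attr_id} {attrs[attr_id]['name']}: raw {attrs[attr_id]['raw']}")
    return notes
```

Feed it the captured text, for example `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout` (run as root). On the first post's output it would report 197 and 198, each with a raw value of 1.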
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9891 Location: almost Mile High in the USA
Posted: Thu Jul 22, 2010 11:13 pm
Yes, this would match up with the sector it's having problems reading.
Hard drives can't actually remap sectors they can't read (sectors they can read with difficulty, maybe; then it's okay). Remapping an unreadable sector would mean silently corrupting your data, and I'd rather get an error than have some bits suddenly disappear without warning.
Here's another idea: most hard drives will remap a sector when it is *written*. When a sector is rewritten, the old data on it is trash anyway, so the drive can use a fresh sector. Of course, if the bad sector ended up in a spot where shared data is stored (like metadata) and read-modify-writes are needed, that sector's contents are now destroyed, and quite possibly data from multiple files is gone.
But if you are willing to take that risk, you can use hdparm to force a sector write and thereby force the hard drive to remap the sector. There's a reason the hdparm man page says "EXTREMELY DANGEROUS": because it is!
You could run:
Code: | hdparm --write-sector 22657937 /dev/sda |
THIS IS VERY DANGEROUS!!! Treat it as if you were running badblocks -w on the disk, because badblocks -w will also induce remapping of that bad sector.
Back up your drive before you try this, because you don't know which data you're about to overwrite (unless you have figured out which file/directory/... this sector belongs to).
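If you do want to find out which file the bad sector belongs to before overwriting it, the first step is translating the disk-wide LBA into a position inside the filesystem. A minimal sketch, assuming 512-byte logical sectors and a 4 KiB filesystem block size (both should be verified on the actual system). The partition start of 63 sectors in the example is hypothetical; read the real value from `/sys/block/sda/sdaN/start` or `fdisk -lu`. The intermediate value `lba - partition_start` is the offset in 512-byte units from the start of the filesystem, which is assumed to be the "daddr" unit that `xfs_db` works with; check the xfs_db man page before relying on that. `lba_to_fs_block` is a hypothetical helper:

```python
SECTOR_SIZE = 512  # logical sector size in bytes (assumed, verify on your drive)


def lba_to_fs_block(lba: int, partition_start: int, fs_block_size: int = 4096):
    """Map an absolute disk LBA to (filesystem block, byte offset within that block).

    partition_start is the partition's first sector. Returns None if the LBA
    lies before the partition; checking the partition's end is left to the caller.
    """
    if lba < partition_start:
        return None
    byte_offset = (lba - partition_start) * SECTOR_SIZE
    return divmod(byte_offset, fs_block_size)


# Hypothetical partition start of 63 sectors; substitute the real value.
print(lba_to_fs_block(22657937, 63))  # (2832234, 1024)
```

Note that the logged error covers 16 sectors (LBA 22657937 to 22657952), so it can straddle more than one filesystem block.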
menschmeier l33t
Joined: 15 Dec 2004 Posts: 727
Posted: Fri Jul 23, 2010 9:59 pm
OK, I bought a new disk yesterday, a Seagate. I hope this one will last longer than the Fujitsu I had...
Now I am reinstalling the system. I'm doing it manually, but since I have the world file I can easily do an emerge -e world... Now it is almost done.