jotake Tux's lil' helper
Joined: 23 Jan 2005 Posts: 102
Posted: Sun Nov 14, 2010 9:41 pm Post subject: [HARDWARE] smartctl log analysis |
While reading a recent topic about a brand-new hard drive that was apparently more or less dead, I figured it would be a good idea to get mine to talk too...
But since I don't speak smartctl fluently, I'm relying on you to tell me what you think of the health of two of my HDDs.
The first (an old 80 GB IDE disk)
Code: |
serveur ~ # smartctl -a /dev/hda
smartctl 5.39.1 2010-01-28 r3054 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen,
Model Family: Seagate U7 family
Device Model: ST380022A
Serial Number: 3KB0PASG
Firmware Version: 3.30
User Capacity: 80,026,361,856 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2
Local Time is: Mon Nov 15 00:20:43 2010 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 426) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 64) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 063 055 006 Pre-fail Always - 135275900
3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 1023
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 087 060 030 Pre-fail Always - 482036684
9 Power_On_Hours 0x0032 072 072 000 Old_age Always - 25091
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 097 097 020 Old_age Always - 3963
194 Temperature_Celsius 0x0022 048 058 000 Old_age Always - 48
195 Hardware_ECC_Recovered 0x001a 063 055 000 Old_age Always - 135275900
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 195 000 Old_age Always - 12
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 15 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 15 occurred at disk power-on lifetime: 22495 hours (937 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
-- -- -- -- -- -- --
00 50 05 0c 4d e5 a0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
a1 00 05 0c 4d e5 a0 00 01:07:23.412 IDENTIFY PACKET DEVICE
ca 00 00 88 4c e5 e0 00 01:07:12.939 WRITE DMA
c8 00 08 98 89 6c e0 00 01:07:12.930 READ DMA
ca 00 08 30 4c e5 e0 00 01:07:12.929 WRITE DMA
c8 00 10 38 ec 1a e1 00 01:07:12.914 READ DMA
Error 14 occurred at disk power-on lifetime: 21557 hours (898 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
-- -- -- -- -- -- --
84 51 00 d0 2a e4 e0 Error: ICRC, ABRT at LBA = 0x00e42ad0 = 14953168
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 00 d0 2a e4 e0 00 00:01:33.685 READ DMA
c8 00 b0 20 2a e4 e0 00 00:01:33.683 READ DMA
c8 00 50 c8 29 e4 e0 00 00:01:33.666 READ DMA
c8 00 50 c8 29 e4 e0 00 00:01:33.233 READ DMA
c8 00 00 c8 28 e4 e0 00 00:01:33.222 READ DMA
Error 13 occurred at disk power-on lifetime: 21557 hours (898 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
-- -- -- -- -- -- --
84 51 01 17 2a e4 e0 Error: ICRC, ABRT 1 sectors at LBA = 0x00e42a17 = 14952983
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 50 c8 29 e4 e0 00 00:01:33.233 READ DMA
c8 00 00 c8 28 e4 e0 00 00:01:33.222 READ DMA
c8 00 10 90 c8 e5 e0 00 00:01:33.222 READ DMA
c8 00 10 78 c8 e5 e0 00 00:01:33.216 READ DMA
c8 00 08 38 c6 e5 e0 00 00:01:33.213 READ DMA
Error 12 occurred at disk power-on lifetime: 21557 hours (898 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
-- -- -- -- -- -- --
84 51 01 07 4d cf e0 Error: ICRC, ABRT 1 sectors at LBA = 0x00cf4d07 = 13585671
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 4d cf e0 00 00:01:29.559 READ DMA
c8 00 10 e0 4c cf e0 00 00:01:29.554 READ DMA
c8 00 08 b0 4b ca e0 00 00:01:29.547 READ DMA
c8 00 08 30 4c ce e0 00 00:01:29.538 READ DMA
c8 00 10 c8 b6 cd e0 00 00:01:29.537 READ DMA
Error 11 occurred at disk power-on lifetime: 21557 hours (898 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
-- -- -- -- -- -- --
84 51 01 24 1c 00 a2 Error: ICRC, ABRT 1 sectors at LBA = 0x02001c24 = 33561636
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 3f e5 1c 00 a1 00 00:01:17.428 READ DMA
c8 00 3f 40 00 00 a0 00 00:01:17.424 READ DMA
c8 00 3f 3a 00 00 a2 00 00:01:17.419 READ DMA
c8 00 3f 40 00 00 a0 00 00:01:17.418 READ DMA
c8 00 3f 01 00 00 a0 00 00:01:17.413 READ DMA
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 22970 -
# 2 Short offline Completed without error 00% 18799 -
# 3 Short offline Completed without error 00% 18667 -
# 4 Short offline Completed without error 00% 14398 -
# 5 Short offline Completed without error 00% 13145 -
# 6 Short offline Completed without error 00% 7728 -
SMART Selective self-test log data structure revision number 1
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The second, a more recent one, also IDE
Code: |
serveur ~ # smartctl -a /dev/hdc
smartctl 5.39.1 2010-01-28 r3054 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen,
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model: ST3200822A
Serial Number: 4LJ2D7EV
Firmware Version: 3.01
User Capacity: 200,049,647,616 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2
Local Time is: Mon Nov 15 00:23:40 2010 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 111) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 051 046 006 Pre-fail Always - 177437515
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 099 099 020 Old_age Always - 1400
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 4
7 Seek_Error_Rate 0x000f 084 060 030 Pre-fail Always - 275811872
9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 7651
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 099 099 020 Old_age Always - 1621
194 Temperature_Celsius 0x0022 038 049 000 Old_age Always - 38
195 Hardware_ECC_Recovered 0x001a 051 046 000 Old_age Always - 177437515
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 5103 -
# 2 Short offline Completed without error 00% 5103 -
# 3 Short offline Completed without error 00% 4710 -
# 4 Short offline Completed without error 00% 1156 -
SMART Selective self-test log data structure revision number 1
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
guilc Bodhisattva
Joined: 15 Nov 2003 Posts: 3326 Location: Paris - France
jotake Tux's lil' helper
Joined: 23 Jan 2005 Posts: 102
Posted: Sun Nov 14, 2010 10:25 pm Post subject: |
First of all, thanks for the link, I'll finally be able to speak smartctl almost fluently!
I admit the number of read errors is indeed rather impressive; I'll try to swap the IDE ribbon cable if I manage to find one in my mess!
Otherwise, apart from that, do you see anything else that could point to these HDDs nearing the end of their life? Bear in mind they are fairly old (more than 8 years for the 80 GB one).
What tests could I run to check their condition and estimate their remaining lifespan? |
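For readers wondering why the cable is the first suspect: errors 11 to 14 in the 80 GB disk's log are all flagged "ICRC, ABRT", and the first byte of each register row (0x84) encodes exactly those two flags. Below is a minimal Python sketch that decodes such a row. The function name `decode_error_row` is made up for illustration, the bit table is only a subset of the ATA error register, and the address is rebuilt assuming 28-bit LBA addressing.

```python
# Subset of the ATA error register bits (ATA/ATAPI-6 naming); not exhaustive.
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error during an Ultra DMA transfer
    0x40: "UNC",   # uncorrectable data error
    0x10: "IDNF",  # requested sector ID not found
    0x04: "ABRT",  # command aborted
}

def decode_error_row(row):
    """Decode one 'ER ST SC SN CL CH DH' row of hex bytes from the SMART error log.

    Hypothetical helper written for this thread, not part of smartmontools.
    """
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in row.split()[:7])
    flags = [name for bit, name in ERROR_BITS.items() if er & bit]
    # Assumption: 28-bit LBA, where the low nibble of DH holds bits 27..24,
    # followed by CH, CL and SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return flags, lba

# Register row of error 14 from the /dev/hda log above.
print(decode_error_row("84 51 00 d0 2a e4 e0"))  # (['ICRC', 'ABRT'], 14953168)
```

The decoded address matches the "LBA = 0x00e42ad0 = 14953168" that smartctl prints for error 14. ICRC is reported when data gets corrupted on the link between disk and controller, which is consistent with the non-zero UDMA_CRC_Error_Count on that disk and with trying a new ribbon cable first.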
guilc Bodhisattva
Joined: 15 Nov 2003 Posts: 3326 Location: Paris - France
Posted: Mon Nov 15, 2010 8:51 am Post subject: |
Well, there isn't much to it:
- from time to time, a "smartctl -t long /dev/sdX" to start a surface test, followed by a "smartctl -l selftest /dev/sdX" to read the result a few minutes/hours later
- regularly, a quick check of the SMART values
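The "quick check of the SMART values" can be partly automated. Here is a rough Python sketch, assuming attribute rows shaped like the ones posted in this thread (ten whitespace-separated fields with a plain integer as the raw value). The names `parse_attributes`, `findings` and the `WATCH` list are choices made for this example, not anything smartctl provides.

```python
from collections import namedtuple

Attr = namedtuple("Attr", "id name value worst thresh raw")

# Counters where any non-zero raw value deserves a look (a choice made for this sketch).
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Offline_Uncorrectable", "UDMA_CRC_Error_Count")

def parse_attributes(text):
    """Parse attribute rows as printed in this thread into a dict keyed by name."""
    attrs = {}
    for line in text.splitlines():
        f = line.split()
        # Row layout: id, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(f) >= 10 and f[0].isdigit() and f[2].startswith("0x") and f[9].isdigit():
            attrs[f[1]] = Attr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), int(f[9]))
    return attrs

def findings(attrs):
    """List attributes at or below threshold, plus watched counters with a non-zero raw value."""
    out = []
    for a in attrs.values():
        # A threshold of 000 means the attribute can never be reported as failed.
        if a.thresh > 0 and a.value <= a.thresh:
            out.append(f"{a.name}: value {a.value} <= threshold {a.thresh}")
    for name in WATCH:
        a = attrs.get(name)
        if a is not None and a.raw > 0:
            out.append(f"{name}: raw value {a.raw}")
    return out

# Three rows copied from the /dev/hdc output earlier in the thread.
sample = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 4
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
"""
print(findings(parse_attributes(sample)))  # ['Reallocated_Sector_Ct: raw value 4']
```

In practice the text would come from the attribute section of the smartctl output rather than from a pasted string. Rows whose raw value is not a plain integer are simply skipped by this sketch.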
In your case, the more recent disk has reallocated 4 sectors. Keep an eye on it: as long as the count doesn't move it should be fine (I don't know how long they've been there). On the other hand, if it starts climbing => replace it or send it back to after-sales service
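"As long as the count doesn't move" amounts to comparing raw values between two readings. A small Python sketch of that comparison follows. The function name `grown_counters` is hypothetical, the baseline uses the raw values from the ST3200822A output in this thread, and the later reading is invented purely to exercise the function.

```python
def grown_counters(previous, current,
                   names=("Reallocated_Sector_Ct", "Current_Pending_Sector")):
    """Return {name: (old, new)} for every watched counter whose raw value went up."""
    grown = {}
    for name in names:
        old, new = previous.get(name), current.get(name)
        if old is not None and new is not None and new > old:
            grown[name] = (old, new)
    return grown

# Raw values taken from the /dev/hdc output posted above.
baseline = {"Reallocated_Sector_Ct": 4, "Current_Pending_Sector": 0}
# Hypothetical later reading, NOT real data from this disk.
later = {"Reallocated_Sector_Ct": 9, "Current_Pending_Sector": 0}

print(grown_counters(baseline, baseline))  # {}
print(grown_counters(baseline, later))     # {'Reallocated_Sector_Ct': (4, 9)}
```

An empty result means the counters are stable. Any entry in the result is the "starts climbing" case described above.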
But otherwise, there's no need to obsess over it 24/7 either
And keep in mind that SMART is no miracle: if it says there is a problem, then there is something to look at or do, and the problem is either fixable or a sign of impending death. The converse is false, though: a disk can die without ever having given a warning in the SMART reports
My personal site:
My PORTDIR_OVERLAY: or layman -a xwing |
jotake Tux's lil' helper
Joined: 23 Jan 2005 Posts: 102
Posted: Mon Nov 15, 2010 12:03 pm Post subject: |
OK, thanks for all this information.
My concern is that the data on these two disks is quite important to me, and I can't really see myself losing it.
I do have a backup of part of it on another HDD just in case. However, I'm seriously considering setting up a small NAS, and I'm torn between building a RAID-based machine and investing in a Synology NAS |
