Gentoo Forums
I think my hard drive is dying...
WastingBody
Tux's lil' helper


Joined: 09 May 2008
Posts: 105

Posted: Fri Jan 01, 2010 11:28 pm    Post subject: I think my hard drive is dying...

This morning, after I logged into my desktop, I started hearing a little whirring sound that came and went every now and then. After I turned my computer off, I could no longer boot from that hard drive; I get read errors and the like. I am now booted into the SystemRescueCd and am using rsync to copy all of my important files to my netbook. The drive mounts fine and the rsync operation is working perfectly. I haven't run a filesystem check or anything else, in case something goes wrong and I lose data. Is there anything that I should know? Does anyone have any advice?

The worst part of this whole failure is that I just bought a 1 TB drive for my other computer to host backups the day before this all happened. I have never done backups before simply because I never had anything to put them on. Just very rotten luck on my part.
Ken69267
Developer


Joined: 08 Apr 2007
Posts: 111
Location: #gentoo-pr0n

Posted: Sat Jan 02, 2010 12:06 am

Well, I'd back up everything that you can and then ask what S.M.A.R.T. reports. sys-apps/smartmontools is the SMART package in Gentoo, and `smartctl -a /dev/yourdevice` will tell you a good bit of info about the drive.

(I believe the SystemRescueCd has smartmontools on it.)
_________________
!snack
WastingBody
Tux's lil' helper


Joined: 09 May 2008
Posts: 105

Posted: Sat Jan 02, 2010 1:01 am

I'm not really sure what to make of the output.
Code:
root@sysresccd /root % smartctl -a /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3250310AS
Serial Number:    6RY38ZFQ
Firmware Version: 3.AAC
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan  1 19:00:53 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
               was completed without error.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:        ( 430) seconds.
Offline data collection
capabilities:           (0x5b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     (  64) minutes.
SCT capabilities:           (0x0001)   SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   108   097   006    Pre-fail  Always       -       19967651
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       937
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   083   062   030    Pre-fail  Always       -       224139541
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10823
 10 Spin_Retry_Count        0x0013   100   099   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       928
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   060   050   045    Old_age   Always       -       40 (Lifetime Min/Max 38/40)
194 Temperature_Celsius     0x0022   040   050   000    Old_age   Always       -       40 (0 23 0 0)
195 Hardware_ECC_Recovered  0x001a   079   060   000    Old_age   Always       -       1658
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10820         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Vorlon
Apprentice


Joined: 16 May 2003
Posts: 263
Location: East Earl, PA

Posted: Sat Jan 02, 2010 1:02 am

I think that disk is hosed. It probably has a bad boot sector, which is why you can access it but not boot from it.

Get as much off of it as you can.

If you're very brave (foolish?) you can try to repartition it after you have gotten all your data, but I'd get rid of the disk.
_________________
Casey Bralla
Chief Nerd in Residence
The NerdWorld Organisation
Ken69267
Developer


Joined: 08 Apr 2007
Posts: 111
Location: #gentoo-pr0n

Posted: Sat Jan 02, 2010 1:10 am

It doesn't look like a mechanical failure, to be honest. SMART reports PASSED and the attributes all look sane. (An attribute is failing if its VALUE is less than or equal to its THRESH.)
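
That VALUE-versus-THRESH rule can be checked mechanically. A minimal sketch, assuming the attribute table layout shown in the output above (ID in column 1, VALUE in column 4, THRESH in column 6); `failing_attrs` is a made-up helper name, not part of smartmontools:

```shell
# Hypothetical helper: list SMART attributes whose normalized VALUE is at or
# below THRESH. Reads an attribute table (as printed by `smartctl -A`) on stdin.
failing_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && ($4 + 0) <= ($6 + 0) {
        print $1, $2, "value=" $4, "thresh=" $6
    }'
}

# Usage (needs root and a real device):
#   smartctl -A /dev/sda | failing_attrs
```

For the table posted above it should print nothing, since every VALUE is above its THRESH, which is consistent with the PASSED result.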

After you've backed up, you might try scanning for bad blocks with badblocks and then running fsck.
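
A hedged sketch of that follow-up, run from the rescue CD once the backup is done. The device names are placeholders and `check_disk` is a made-up wrapper. `badblocks -sv` scans read-only by default and shows progress, and `fsck -n` is assumed here to be handed to an ext2/3/4 checker, where it reports problems without changing anything:

```shell
# Hypothetical wrapper: non-destructive checks only.
# $1 = whole disk (e.g. /dev/sda), $2 = an unmounted partition (e.g. /dev/sda3)
check_disk() {
    badblocks -sv "$1"   # read-only surface scan; lists unreadable blocks
    fsck -n "$2"         # report filesystem errors, answer "no" to every fix
}

# Usage: check_disk /dev/sda /dev/sda3
```

Only once the read-only pass looks sensible would it make sense to rerun fsck without `-n` and let it repair.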

EDIT: I'm almost positive it's a bad block, as
Code:
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1

See that 1? That means one block needs to be reallocated.

`smartctl -t short /dev/yourdevice` runs a short self-test, and the LBA_of_first_error column of the self-test log would tell you approximately where it is. I had this happen to me recently, but luckily it was in my swap partition and I could just dd the block away.
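
The locate-and-overwrite idea above could look roughly like this. It is a sketch, not a recipe: /dev/sda is the device from the smartctl output earlier in the thread, `print_overwrite_cmd` is a made-up helper that only prints the dd command, and it assumes 512-byte sectors and an LBA counted from the start of the disk.

```shell
# Start a self-test, then read the log once it has finished:
#   smartctl -t short /dev/sda
#   smartctl -l selftest /dev/sda    # look at the LBA_of_first_error column

# Hypothetical helper: print (do not run) a dd command that would overwrite
# one 512-byte sector. Overwriting destroys whatever that sector held.
print_overwrite_cmd() {
    dev=$1   # whole disk, since the LBA is assumed to be disk-relative
    lba=$2   # LBA_of_first_error taken from the self-test log
    echo "dd if=/dev/zero of=$dev bs=512 count=1 seek=$lba oflag=direct"
}

# Usage: print_overwrite_cmd /dev/sda 12345
```

The self-test log posted earlier shows a short test that completed without error, so the extended test (`smartctl -t long`, about 64 minutes on this drive) may be needed before an LBA shows up.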
WastingBody
Tux's lil' helper


Joined: 09 May 2008
Posts: 105

Posted: Sat Jan 02, 2010 1:23 am

I think I've backed everything up that I need including my make.conf and my kernel's .config. I am brave and foolish, so I think I'll just try a reinstall and see what happens.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54850
Location: 56N 3W

Posted: Sat Jan 02, 2010 1:27 am

WastingBody,

Get ddrescue (without the hyphen) and make an image of the drive to a file on your nice shiny new 1 TB drive.
ddrescue tries really hard to recover your data and will only halt on success.
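
A minimal sketch of that imaging step, assuming GNU ddrescue and made-up paths on the backup drive; `image_drive` is a hypothetical wrapper. The first pass copies everything that reads cleanly, and the second goes back over the bad areas with direct disc access and up to three retries. The map (log) file lets an interrupted run resume where it left off.

```shell
# Hypothetical wrapper around GNU ddrescue.
# $1 = failing disk, $2 = image file on the good drive, $3 = map (log) file
image_drive() {
    src=$1; img=$2; map=$3
    ddrescue -n "$src" "$img" "$map" &&      # pass 1: skip the slow scraping phase
    ddrescue -d -r3 "$src" "$img" "$map"     # pass 2: direct access, retry bad areas 3 times
}

# Usage: image_drive /dev/sda /mnt/backup/sda.img /mnt/backup/sda.map
```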

The SMART data looks OK and your drive only has just over 10,000 running hours, so it's not even middle-aged.

When you have your image, run the vendor's test software on the disk. That's a download from their website.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
WastingBody
Tux's lil' helper


Joined: 09 May 2008
Posts: 105

Posted: Sat Jan 02, 2010 2:20 am

I'm running the SeaTools diagnostics tool from Seagate. I'm running the long test; it has found 10 errors so far. I hope it fixes them and I won't be left with a broken hard drive. On a good note, my drive is still covered under its warranty, so if it does completely die soon I can have it replaced.

I think I'll keep ddrescue in mind in case something else happens. I'll skip it this time because all that's left that I haven't backed up is my root partition. I can easily rebuild my Gentoo install.
Ken69267
Developer


Joined: 08 Apr 2007
Posts: 111
Location: #gentoo-pr0n

Posted: Sat Jan 02, 2010 2:25 am

I doubt the drive is totaled, it's got better vitality attributes than my drives at the least :P.

Going on 150 reallocated bad blocks on mine, yours has zero (soon to be one once you fix the current one).
WastingBody
Tux's lil' helper


Joined: 09 May 2008
Posts: 105

Posted: Sat Jan 02, 2010 3:39 am

SeaTools reported that it had fixed the bad blocks. I tried booting from my drive. It ran a fsck on my home partition, but that failed with a message about an unexpected inconsistency and dumped me out to the console with a read-only filesystem. Should I try to fsck from the SystemRescueCd? If so, what options should I use, if any?

On another note, large file support was causing me a little pain on my new server. I was unable to mount my ext4 partitions for a little while because of the default options of mkfs.ext4.

I'm editing this post from my now alive desktop. Thanks everyone!
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54850
Location: 56N 3W

Posted: Sat Jan 02, 2010 12:47 pm

WastingBody,

Those 10 errors will be bad blocks that SeaTools forced to be reallocated by writing to the blocks.
Your data that was there will be lost. Smartmontools should now show a non-zero reallocated sector count.
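
One way to check that, sketched under the assumption that the attribute table has the same layout as the output posted earlier (raw value in column 10); `smart_raw` is a made-up helper name:

```shell
# Hypothetical helper: print the RAW_VALUE of one SMART attribute by ID.
# Reads an attribute table (as printed by `smartctl -A`) on stdin.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Usage (needs root and a real device):
#   smartctl -A /dev/sda | smart_raw 5      # Reallocated_Sector_Ct
#   smartctl -A /dev/sda | smart_raw 197    # Current_Pending_Sector
```

If the sectors really were remapped, attribute 5 would be expected to have risen from 0 and attribute 197 to have dropped back to 0.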

It's a little worrying that the drive does not reallocate sectors while it can still read them, as it's supposed to.
It's worth poking Seagate about that, as this problem will happen again.
This is the second incident like this concerning a Seagate drive on the forums over the past week or so.

Seagate probably knows about it, and there may be a firmware upgrade for your drive to fix it.
WastingBody
Tux's lil' helper


Joined: 09 May 2008
Posts: 105

Posted: Sat Jan 02, 2010 1:49 pm

There were no firmware updates for my drive. :/

If this problem progresses would it be a wise decision to go ahead and replace the drive?
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54850
Location: 56N 3W

Posted: Sat Jan 02, 2010 2:25 pm

WastingBody,

Email Seagate about the issue. Sectors do die as the drive is used, and the drive is supposed to remap them to spares while it can still read the data. The issue will recur.

I would press Seagate for a warranty replacement now, before you lose any more data.
WastingBody
Tux's lil' helper


Joined: 09 May 2008
Posts: 105

Posted: Sat Jan 02, 2010 2:58 pm

Alright, thanks again. ^_^
All times are GMT