Gentoo Forums
SATA diagnoses (+ 2nd drive on Shuttle SD37P2)
iaw
Tux's lil' helper


Joined: 20 Dec 2004
Posts: 81

Posted: Wed Oct 11, 2006 6:34 pm    Post subject: SATA diagnoses (+ 2nd drive on Shuttle SD37P2)

dear linux wizards:

I am running a shuttle sd37p2. with one samsung SATA 400gb drive,
the system is rock-steady.

I have two problems.

[1] I want to learn what hard drive speed I am getting, and why my
second identical drive is giving me trouble. on /dev/hda devices,
hdparm -i gave some neat info. sdparm output seems rather sparse
in comparison. I also would like to learn more about DMA vs. PIO
modes, and I would like to learn the SMART status of my device. How?

[2] A second hard drive in the system fails. that is, after heavy disk use,
there is suddenly some IO error, then there are suddenly a lot of kernel
messages about errors, and the drive disconnects. the drive itself is the
same as the #1 drive and works perfectly in an external USB enclosure.

so, the most likely explanations now are that either the linux SATA driver is
unstable (which would surprise me), or that the sd37p2 has a problem
on the second SATA channel--but this is weird, too. a problem that shows up
right away, rather than only "after much use", is more often software than
hardware. are there any known kernel issues under 2.6.18 x86_64 (gcc-4.1.1)?

help appreciated.

sincerely,

/ivo welch
_________________
/iaw
radagast
Apprentice


Joined: 20 Mar 2004
Posts: 217
Location: sydney, .au

Posted: Thu Oct 12, 2006 7:28 am

in answer to (2),
i have a 120G seagate drive which i was using in a raid array with another identical drive. the array was crashing about once a month, IO error and kernel messages (i could dig them up if you want to compare), and eventually i retired the dodgy drive to a backup, where it spins along quite happily.

if both your drives are SATA, it's very unlikely to be the driver. switching the drive channels is pretty easy too. and to test the kernel you could install another kernel - maybe even a generic x86 one - and run something to see if it crashes overnight.

in my experience though, it was the disk.
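
To compare the kernel messages before and after swapping channels, it helps to pull just the disk-related lines out of dmesg. A minimal sketch, assuming libata prefixes its messages with "ataN:" or "ataN.NN:" and that block-layer failures contain "I/O error"; the function name ata_errors is made up for illustration:

```shell
# Sketch: keep only lines that look like libata or block-layer error messages.
# Assumption: libata messages start with "ataN: " or "ataN.NN: ".
ata_errors() {
    grep -iE '(^|[^[:alnum:]])ata[0-9]+(\.[0-9]+)?: |I/O error'
}

# usage: dmesg | ata_errors
```

If the errors follow the drive to the other port, the drive is the suspect; if they stay with the port, look at the cable or the board.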
iaw
Tux's lil' helper


Joined: 20 Dec 2004
Posts: 81

Posted: Thu Oct 12, 2006 1:52 pm

thanks. is there any way to learn the smart status or pio/dma usage on the sata channel?
_________________
/iaw
radagast
Apprentice


Joined: 20 Mar 2004
Posts: 217
Location: sydney, .au

Posted: Thu Oct 12, 2006 3:18 pm

the only success i've ever had with SMART on windows or linux is through the bios. the drive i described in my last post had no warning, even though i can reproduce the failure.
trouble with SMART is, you never know it's working until it tells you it is.

hdparm -tT works fine for sd devices, and that's all you need to compare speeds.
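
For the speed comparison between the two drives, something like the following would do. It is only a sketch: the function name bench_drives and the device names are assumptions, it has to run as root, and the numbers are only comparable if the machine is otherwise idle.

```shell
# Sketch: run hdparm's cached (-T) and buffered (-t) read timings on each drive.
# Arguments that are not block devices are skipped with a note.
bench_drives() {
    for dev in "$@"; do
        if [ ! -b "$dev" ]; then
            echo "skip: $dev is not a block device"
            continue
        fi
        echo "== $dev =="
        hdparm -tT "$dev"
    done
}

# usage (as root): bench_drives /dev/sda /dev/sdb
```

Running it two or three times and comparing the buffered disk read figures gives a fairer picture than a single run.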
wynn
Advocate


Joined: 01 Apr 2005
Posts: 2421
Location: UK

Posted: Thu Oct 12, 2006 4:06 pm

If you run smartd (from smartmontools), it will write to the system log (via syslog) when it starts up and whenever any critical attribute changes; this includes temperature.

The entries in /etc/smartd.conf here are
Code:
/dev/sda -a -d ata -m <user>@<address>
/dev/sdb -a -d ata -m <user>@<address>
so it will email <user> "if problems are detected".

The sort of output you get in the system log is
Code:
Oct 12 13:26:44 lightfoot smartd[8504]: smartd version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Oct 12 13:26:44 lightfoot smartd[8504]: Home page is http://smartmontools.sourceforge.net/
Oct 12 13:26:44 lightfoot smartd[8504]: Opened configuration file /etc/smartd.conf
Oct 12 13:26:44 lightfoot smartd[8504]: Configuration file /etc/smartd.conf parsed.
Oct 12 13:26:44 lightfoot smartd[8504]: Device: /dev/sda, opened
Oct 12 13:26:44 lightfoot smartd[8504]: Device: /dev/sda, found in smartd database.
Oct 12 13:26:45 lightfoot smartd[8504]: Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Oct 12 13:26:45 lightfoot smartd[8504]: Device: /dev/sdb, opened
Oct 12 13:26:45 lightfoot smartd[8504]: Device: /dev/sdb, found in smartd database.
Oct 12 13:26:46 lightfoot smartd[8504]: Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Oct 12 13:26:46 lightfoot smartd[8504]: Monitoring 2 ATA and 0 SCSI devices
Oct 12 13:26:47 lightfoot smartd[8506]: smartd has fork()ed into background mode. New PID=8506.
Oct 12 13:26:47 lightfoot smartd[8506]: file /var/run/smartd.pid written containing PID 8506
Oct 12 13:56:48 lightfoot smartd[8506]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 203 to 171
Oct 12 13:56:48 lightfoot smartd[8506]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 196 to 166
Oct 12 14:26:48 lightfoot smartd[8506]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 171 to 166
Oct 12 14:26:48 lightfoot smartd[8506]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 166 to 161
Oct 12 15:56:48 lightfoot smartd[8506]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 166 to 171
Oct 12 16:26:47 lightfoot smartd[8506]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 171 to 166
Oct 12 16:26:48 lightfoot smartd[8506]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 161 to 157
Oct 12 16:56:48 lightfoot smartd[8506]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 157 to 161
There have been other, rather less innocuous messages
Code:
Oct 11 11:38:39 lightfoot smartd[8492]: Device: /dev/sda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 95 to 94
Not yet serious, judging by the output of
Code:
# smartctl -a -d ata /dev/sda
...
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   095   095   060    Pre-fail  Always       -       589831
and
Code:
# smartctl --health -d ata /dev/sda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
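
The reason this counts as "not yet serious" is that the normalized VALUE (095) is still well above THRESH (060); an attribute is only reported as failing once VALUE drops to THRESH or below. A rough sketch of that check, assuming the ten-column attribute table layout shown above (smart_margin is a made-up name):

```shell
# Sketch: for each SMART attribute row, print how far VALUE sits above THRESH.
# Assumes columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_margin() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
        margin = ($4 + 0) - ($6 + 0)            # normalized VALUE minus THRESH
        status = (margin > 0) ? "ok" : "FAILING"
        printf "%s %s margin=%d %s\n", $1, $2, margin, status
    }'
}

# usage (as root): smartctl -A -d ata /dev/sda | smart_margin
```

For the Raw_Read_Error_Rate row above this prints "1 Raw_Read_Error_Rate margin=35 ok". The RAW_VALUE column is vendor-specific, so it is deliberately ignored here.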

_________________
The avatar is jorma, a "duck" from "Elephants Dream": the film and all the production materials have been made available under a Creative Commons Attribution 2.5 License, see orange.blender.org for details.
iaw
Tux's lil' helper


Joined: 20 Dec 2004
Posts: 81

Posted: Fri Oct 13, 2006 7:57 pm

great! mille grazie.

regards,

/iaw
_________________
/iaw