iaw Tux's lil' helper
Joined: 20 Dec 2004 Posts: 81
|
Posted: Wed Oct 11, 2006 6:34 pm Post subject: SATA diagnoses (+ 2nd drive on Shuttle SD37P2) |
|
|
dear linux wizards:
I am running a shuttle sd37p2. with one samsung SATA 400gb drive,
the system is rock-steady.
I have two problems.
[1] I want to learn what hard drive speed I am getting, and why my
second identical drive is giving me trouble. on /dev/hda devices,
hdparm -i gave some neat info. sdparm output seems rather spare
in comparison. I also would like to learn more about DMA vs. PIO
modes, and I would like to learn the SMART status of my device. How?
[2] A second hard drive in the system fails. that is, after heavy disk use,
there is suddenly an IO error, then a flood of kernel error messages,
and the drive disconnects. the drive itself is the same model as
drive #1 and works perfectly in an external USB enclosure.
so, the most likely explanations are that either the linux SATA driver is
unstable (which would surprise me), or that the sd37p2 has a problem
on the second SATA channel--but this is weird, too. a problem that shows up
all the time, rather than only "after much use", is more often software than
hardware. are there any known kernel issues under 2.6.18 x86_64 (gcc-4.1.1)?
help appreciated.
sincerely,
/ivo welch _________________ /iaw |
radagast Apprentice
Joined: 20 Mar 2004 Posts: 217 Location: sydney, .au
|
Posted: Thu Oct 12, 2006 7:28 am Post subject: |
|
|
in answer to (2),
i have a 120G seagate drive which i was using in a raid array with another identical drive. the array was crashing about once a month with an IO error and kernel messages (i could dig them up if you want to compare). eventually i retired the dodgy drive to backup duty, where it spins along quite happily.
if both your drives are SATA and the first one is stable on the same driver, it's very unlikely to be the driver. swapping the drives between the two SATA channels is pretty easy too, and would show whether the fault follows the drive or the port. to test the kernel you could install another kernel - maybe even a generic x86 one - and run something disk-heavy to see if it crashes overnight.
in my experience though, it was the disk. |
iaw Tux's lil' helper
Joined: 20 Dec 2004 Posts: 81
|
Posted: Thu Oct 12, 2006 1:52 pm Post subject: |
|
|
thanks. is there any way to learn the SMART status or PIO/DMA usage on the SATA channel? |
radagast Apprentice
Joined: 20 Mar 2004 Posts: 217 Location: sydney, .au
|
Posted: Thu Oct 12, 2006 3:18 pm Post subject: |
|
|
the only success i've ever had with SMART on windows or linux is through the bios. the drive i described in my last post never gave a SMART warning, even though i can reproduce the failure.
trouble with SMART is, you never know it's working until it tells you it is.
hdparm -tT works fine for sd devices (e.g. hdparm -tT /dev/sda), and that's all you need to compare speeds. |
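A rough sketch of how that comparison could be scripted. It assumes the two drives show up as /dev/sda and /dev/sdb (the device names are an assumption) and that hdparm prints its usual "Timing buffered disk reads: ... = NN.NN MB/sec" line. The function names buffered_mb_per_sec and compare_drives are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pull the MB/sec figure out of `hdparm -t` output.
# Assumes the usual line format:
#   Timing buffered disk reads: 180 MB in 3.02 seconds = 59.60 MB/sec
buffered_mb_per_sec() {
    awk '/Timing buffered disk reads/ { print $(NF-1) }'
}

# Hypothetical wrapper: time each drive given as an argument (needs root).
# Run it on an otherwise idle system; results vary from run to run.
compare_drives() {
    for dev in "$@"; do
        printf '%s: %s MB/sec\n' "$dev" "$(hdparm -t "$dev" | buffered_mb_per_sec)"
    done
}

# Example (device names are assumptions):
#   compare_drives /dev/sda /dev/sdb
```

If the two identical drives report very different figures on the same controller, that points at the drive or the port rather than the driver, but a single run is not much evidence, so repeating it a few times is probably wise.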
wynn Advocate
Joined: 01 Apr 2005 Posts: 2421 Location: UK
|
Posted: Thu Oct 12, 2006 4:06 pm Post subject: |
|
|
If you run smartd (from smartmontools), it will write to the system log (syslog) when it starts up and whenever a monitored attribute changes; this includes temperature.
The entries in /etc/smartd.conf here are
Code:
    /dev/sda -a -d ata -m <user>@<address>
    /dev/sdb -a -d ata -m <user>@<address>
so it will email <user> "if problems are detected".
The sort of output you get in the system log is
Code:
    Oct 12 13:26:44 lightfoot smartd[8504]: smartd version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
    Oct 12 13:26:44 lightfoot smartd[8504]: Home page is http://smartmontools.sourceforge.net/
    Oct 12 13:26:44 lightfoot smartd[8504]: Opened configuration file /etc/smartd.conf
    Oct 12 13:26:44 lightfoot smartd[8504]: Configuration file /etc/smartd.conf parsed.
    Oct 12 13:26:44 lightfoot smartd[8504]: Device: /dev/sda, opened
    Oct 12 13:26:44 lightfoot smartd[8504]: Device: /dev/sda, found in smartd database.
    Oct 12 13:26:45 lightfoot smartd[8504]: Device: /dev/sda, is SMART capable. Adding to "monitor" list.
    Oct 12 13:26:45 lightfoot smartd[8504]: Device: /dev/sdb, opened
    Oct 12 13:26:45 lightfoot smartd[8504]: Device: /dev/sdb, found in smartd database.
    Oct 12 13:26:46 lightfoot smartd[8504]: Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
    Oct 12 13:26:46 lightfoot smartd[8504]: Monitoring 2 ATA and 0 SCSI devices
    Oct 12 13:26:47 lightfoot smartd[8506]: smartd has fork()ed into background mode. New PID=8506.
    Oct 12 13:26:47 lightfoot smartd[8506]: file /var/run/smartd.pid written containing PID 8506
    Oct 12 13:56:48 lightfoot smartd[8506]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 203 to 171
    Oct 12 13:56:48 lightfoot smartd[8506]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 196 to 166
    Oct 12 14:26:48 lightfoot smartd[8506]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 171 to 166
    Oct 12 14:26:48 lightfoot smartd[8506]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 166 to 161
    Oct 12 15:56:48 lightfoot smartd[8506]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 166 to 171
    Oct 12 16:26:47 lightfoot smartd[8506]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 171 to 166
    Oct 12 16:26:48 lightfoot smartd[8506]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 161 to 157
    Oct 12 16:56:48 lightfoot smartd[8506]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 157 to 161
There have been other, rather less innocuous messages
Code:
    Oct 11 11:38:39 lightfoot smartd[8492]: Device: /dev/sda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 95 to 94
Not yet serious, as shown by running
Code:
    # smartctl -a -d ata /dev/sda
    ...
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    1 Raw_Read_Error_Rate 0x000b 095 095 060 Pre-fail Always - 589831
and
Code:
    # smartctl --health -d ata /dev/sda
    smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
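To read the attribute table at a glance, one option is to compare each VALUE with its THRESH. A minimal sketch, assuming the column layout shown above (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, ...); smart_margin is a hypothetical helper name, not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl attribute rows on stdin and print how far
# each normalised VALUE sits above its THRESH. Assumes the column layout
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_margin() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x/ {
        value = $4 + 0; thresh = $6 + 0      # "095" -> 95, "060" -> 60
        # An attribute counts as failed once VALUE drops to THRESH or below.
        status = (value <= thresh) ? "FAILING" : "ok"
        printf "%s value=%d thresh=%d margin=%d %s\n", $2, value, thresh, value - thresh, status
    }'
}

# Example (device name and -d ata are taken from the commands above):
#   smartctl -A -d ata /dev/sda | smart_margin
```

For the Raw_Read_Error_Rate row quoted above this prints `Raw_Read_Error_Rate value=95 thresh=60 margin=35 ok`, which matches the "not yet serious" reading: the normalised value is still well clear of the threshold.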
iaw Tux's lil' helper
Joined: 20 Dec 2004 Posts: 81
|
Posted: Fri Oct 13, 2006 7:57 pm Post subject: |
|
|
great! many thanks.
regards,
/iaw |