GenKreton l33t
Joined: 20 Sep 2003 Posts: 828 Location: Cambridge, MA
|
Posted: Tue Oct 18, 2005 1:14 am Post subject: harddrive dma reset issues |
|
|
Code: | hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: DMA disabled
ide0: reset: success |
I get this very often on my laptop. Sometimes it happens quickly, sometimes it can take hours (maybe even days) before this happens. Some additional information:
Code: |
aesir ~ # hdparm -I /dev/hda
/dev/hda:
ATA device, with non-removable media
Model Number: IC25N030ATMR04-0
Serial Number: MRG2E0KBFAMKRJ
Firmware Revision: MOAOAD0A
Standards:
Used: ATA/ATAPI-6 T13 1410D revision 3a
Supported: 6 5 4 3
Configuration:
Logical max current
cylinders 16383 65535
heads 16 1
sectors/track 63 63
--
CHS current addressable sectors: 4128705
LBA user addressable sectors: 58605120
LBA48 user addressable sectors: 58605120
device size with M = 1024*1024: 28615 MBytes
device size with M = 1000*1000: 30005 MBytes (30 GB)
Capabilities:
LBA, IORDY(can be disabled)
bytes avail on r/w long: 4 Queue depth: 1
Standby timer values: spec'd by Vendor, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: 128 (0x80)
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=240ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* NOP cmd
* READ BUFFER cmd
* WRITE BUFFER cmd
* Host Protected Area feature set
* Look-ahead
* Write cache
* Power Management feature set
Security Mode feature set
* SMART feature set
* FLUSH CACHE EXT command
* Mandatory FLUSH CACHE command
* Device Configuration Overlay feature set
* 48-bit Address feature set
Automatic Acoustic Management feature set
SET MAX security extension
Address Offset Reserved Area Boot
* SET FEATURES subcommand required to spinup after power up
Power-Up In Standby feature set
* Advanced Power Management feature set
* General Purpose Logging feature set
* SMART self-test
* SMART error logging
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
not supported: enhanced erase
26min for SECURITY ERASE UNIT.
HW reset results:
CBLID- above Vih
Device num = 0 determined by the jumper
Checksum: correct
|
Also my default hdparm settings are as follows (feel free to correct them even if they are not the cause of the problem):
Code: | /dev/hda:
multcount = 16 (on)
IO_support = 0 (default 16-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 64 (on)
geometry = 16383/255/63, sectors = 30005821440, start = 0
|
|
|
GenKreton l33t
Joined: 20 Sep 2003 Posts: 828 Location: Cambridge, MA
|
Posted: Tue Oct 18, 2005 5:50 pm Post subject: |
|
|
If nobody can help, does anyone know where it would be appropriate to ask for help with this? I was leaning towards the kernel guys, but I'm not sure. |
|
widan Veteran
Joined: 07 Jun 2005 Posts: 1512 Location: Paris, France
|
Posted: Tue Oct 18, 2005 7:02 pm Post subject: Re: harddrive dma reset issues |
|
|
GenKreton wrote: | Code: | hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: DMA disabled
ide0: reset: success |
I get this very often on my laptop. Sometimes it happens quickly, sometimes it can take hours (maybe even days) before this happens. |
What happens is that your disk fails to respond in time to a command sent to it. After some time, Linux decides the drive is probably confused and resets the IDE bus. The reason it also disables DMA is that drives can become confused by a UDMA setting that is too high, and disabling DMA will make them work again (even if that is unlikely to be the case for you).
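To make the two hex values in the log a bit less opaque, here is a minimal Python sketch that decodes them. It assumes the usual ATA status register layout and the usual bus-master IDE status register layout; the dictionary and function names are made up for this example and are not taken from the kernel.

```python
# Bit names for the ATA status register (standard layout, assumed here).
ATA_STATUS_BITS = {
    0x80: "BSY",   # drive is busy; the other bits are not valid while this is set
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data (obsolete)
    0x02: "IDX",   # index (obsolete)
    0x01: "ERR",   # error
}

# Bit names for the bus-master IDE status register (assumed layout).
BMIDE_STATUS_BITS = {
    0x80: "SIMPLEX",
    0x40: "DRIVE1_DMA_CAPABLE",
    0x20: "DRIVE0_DMA_CAPABLE",
    0x04: "INTERRUPT",
    0x02: "ERROR",
    0x01: "ACTIVE",
}


def decode(value, bits):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(bits.items(), reverse=True)
            if value & mask]


print(decode(0xD0, ATA_STATUS_BITS))    # status=0xd0 from the log
print(decode(0x21, BMIDE_STATUS_BITS))  # dma status == 0x21 from the log
```

Under those assumptions, 0xd0 has BSY set, which is why the kernel only prints "{ Busy }", and 0x21 shows the DMA engine still active with no interrupt or error flagged, which fits a drive that simply never answered.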
Are you trying to put the drive in standby mode to save power? A drive in standby mode won't respond to ATA commands, and will need an IDE bus reset before it comes back to life. You can try to see what causes the errors with smartctl (emerge smartmontools if you don't have it):
Code: | smartctl -l error /dev/hda |
If the list contains "STANDBY" or "STANDBY IMMEDIATE", then something asked the drive to go to standby mode. |
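If you would rather not eyeball the log each time, the check can be scripted. This is only a sketch: it does a plain substring search on whatever `smartctl -l error` prints, and the function names and the default device path are assumptions for illustration.

```python
import subprocess


def standby_lines(error_log_text):
    """Return the lines of a SMART error log that mention a STANDBY command."""
    return [line.strip() for line in error_log_text.splitlines()
            if "STANDBY" in line.upper()]


def read_error_log(device="/dev/hda"):
    """Run `smartctl -l error` on the device (needs root) and return its output."""
    result = subprocess.run(["smartctl", "-l", "error", device],
                            capture_output=True, text=True, check=False)
    return result.stdout
```

Calling `standby_lines(read_error_log())` as root returns an empty list when no STANDBY or STANDBY IMMEDIATE command shows up in the log.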
|
GenKreton l33t
Joined: 20 Sep 2003 Posts: 828 Location: Cambridge, MA
|
Posted: Tue Oct 18, 2005 7:43 pm Post subject: |
|
|
I emerged it now; I guess I'll wait and check the next time I see it off.
Also, I saw this in my hdparm output.
Configuration:
Logical max current
cylinders 16383 65535
Is it bad that current is so much bigger than max? |
|
widan Veteran
Joined: 07 Jun 2005 Posts: 1512 Location: Paris, France
|
Posted: Tue Oct 18, 2005 8:43 pm Post subject: |
|
|
GenKreton wrote: | I emerged it now, I guess I'll wait and check till next time I see it off. |
You can run it now. The drive stores the last 5 or so errors in its internal memory.
GenKreton wrote: | Is it bad that current is so much bigger than max? |
No. Linux only uses LBA addressing. CHS drive geometry means nothing for modern drives, only the LBA values matter. The reason the values are set like they are is compatibility with some BIOSes. |
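The numbers in the hdparm output bear this out. A small Python sketch of the arithmetic, assuming 512-byte sectors (the constant and function names are just for illustration):

```python
SECTOR_BYTES = 512  # assumed sector size


def chs_sectors(cylinders, heads, sectors_per_track):
    """Number of sectors addressable through a given CHS geometry."""
    return cylinders * heads * sectors_per_track


current_chs = chs_sectors(65535, 1, 63)   # "current" column
max_chs = chs_sectors(16383, 16, 63)      # "max" column
lba_sectors = 58605120                    # "LBA user addressable sectors"
size_bytes = lba_sectors * SECTOR_BYTES

print(current_chs)                  # 4128705, the "CHS current addressable sectors" value
print(max_chs)                      # 16514064
print(size_bytes // (1000 * 1000))  # 30005 MBytes with M = 1000*1000
print(size_bytes // (1024 * 1024))  # 28615 MBytes with M = 1024*1024
```

So the "current" geometry has more cylinders but actually addresses fewer sectors than the "max" one, and neither comes anywhere near the 58605120 sectors reachable through LBA.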
|
GenKreton l33t
Joined: 20 Sep 2003 Posts: 828 Location: Cambridge, MA
|
Posted: Tue Oct 18, 2005 9:16 pm Post subject: |
|
|
Code: | aesir ~ # smartctl -l error /dev/hda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged
|
And I ran
Code: | smartctl -a /dev/hda
[remove snippets]
Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
|
That's the only thing I could find wrong. I ran the long test as well and that reported no errors. |
|
widan Veteran
Joined: 07 Jun 2005 Posts: 1512 Location: Paris, France
|
Posted: Tue Oct 18, 2005 9:24 pm Post subject: |
|
|
Thinking about it again, it's "normal" that no error was logged. The drive (obviously) won't store an error record when it is in standby (assuming that's the cause)... In any case, it seems the drive is fine. You can try to disable any power management tools you might have loaded, and see if the "problem" disappears. |
|
dashnu l33t
Joined: 21 Jul 2004 Posts: 703 Location: Casco Maine
|
Posted: Mon Nov 14, 2005 1:59 pm Post subject: |
|
|
I have the same issue. kernel-2.6.13 gentoo-sources.
Similar Drive:
Code: |
ATA device, with non-removable media
powers-up in standby; SET FEATURES subcmd spins-up.
Model Number: IC35L060AVV207-0
Serial Number: VNVB30G8UL5JXH
Firmware Revision: V22OA66A
Standards:
Used: ATA/ATAPI-6 T13 1410D revision 3a
Supported: 6 5 4 3
Configuration:
Logical max current
cylinders 16383 65535
heads 16 1
sectors/track 63 63
--
CHS current addressable sectors: 4128705
LBA user addressable sectors: 78156288
LBA48 user addressable sectors: 78156288
device size with M = 1024*1024: 38162 MBytes
device size with M = 1000*1000: 40016 MBytes (40 GB)
|
SMART reports are OK.
This happens when my server is under a lot of load.
I do not have any power management set up.
I have also tried a few kernels, with no luck. _________________ write quit bang |
|
SweD n00b
Joined: 08 Jun 2004 Posts: 7
|
Posted: Fri Nov 18, 2005 3:00 pm Post subject: |
|
|
I had this exact same problem. What solved it for me was noticing that my hard drives were running at approx. 40-45 Celsius. Subjecting them to something intensive brought them up to the 50-55 mark. I kept having these DMA resets all the time, especially when I put more than one disk on the same channel of my pdc20265 onboard Promise controller. I don't mean literally all the time, but if the machine had been on for some time, these errors kept popping up more and more often.
I've since bought a new, substantially bigger chassis and put in extra fans to make sure thermal issues wouldn't be part of the problem, and presto.
My hard drives now run at 25-30 Celsius, and I have not had a single DMA reset since. I mean that literally, not a single one, and I've been using the new setup daily for approx. 2 months.
The conclusion can't be a general one, I'm sure, but for me it was a thermal issue through and through.
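For anyone who wants to keep an eye on drive temperature, here is a rough Python sketch that pulls it out of `smartctl -A` output. It assumes the drive reports SMART attribute 194 (not all do) and that the attribute table has the raw value in the tenth column. The function names and the 50 Celsius limit are hypothetical, the latter picked only because that is where my drives sat under load.

```python
def drive_temperature(attributes_text):
    """Return the raw value of SMART attribute 194 as an int, or None if absent."""
    for line in attributes_text.splitlines():
        fields = line.split()
        # Assumed column layout:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE ...
        if len(fields) >= 10 and fields[0] == "194":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def too_hot(temp_c, limit_c=50):
    """True if a temperature was found and it is at or above the limit."""
    return temp_c is not None and temp_c >= limit_c
```

Feed it the text printed by `smartctl -A /dev/hda` and log a warning whenever `too_hot()` returns True; lining those warnings up with the DMA resets in dmesg should show whether heat is a factor.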
I had this issue ever since I put a second drive in, using something like gentoo-sources-2.4.20. I kept reading the OSDL bug reports, specifically bug #2494, where many people had the same issues, thinking it would be flaky drivers for the Promise controller. It was, to some extent, but after they were "fixed", in some kernel version or another, the problems remained, until I fixed the temp.
Just my story, mileage may well differ, of course.
Regards,
/Dennis |
|
GenKreton l33t
Joined: 20 Sep 2003 Posts: 828 Location: Cambridge, MA
|
Posted: Sat Dec 10, 2005 1:40 am Post subject: |
|
|
Thank you, SweD. Though I am using the 2.6 series, it seems very possible that it is the same problem. It only occurred under high hard drive load, and it is a laptop, so I can't expand my chassis the same way, though I will make more of an effort to keep it cool and see how it turns out. |
|
SweD n00b
Joined: 08 Jun 2004 Posts: 7
|
Posted: Wed Dec 21, 2005 8:09 pm Post subject: |
|
|
Just for completeness, and to be clear, I'm also using 2.6, and have had this problem with 2.6 for a long time. The 2.4 reference was to point out that my problems started a looong time ago :-D |
|
|