luciano Tux's lil' helper
Joined: 18 Nov 2004 Posts: 132
Posted: Wed Dec 08, 2004 8:58 pm Post subject: dma timeouts

Hi All,
Today when I got home my fileserver was a bit chuggy, so I unmounted my 160 GB shared drive. Now a check on it says it's corrupt, and I'm unable to fix it. If ANYONE can help me out I'd be incredibly thankful!
I'm running 2.6.10-rc2-mm2 on an old Athlon. The disk in question has a single reiser4 partition. fsck.reiser4 complains about the superblock magic number but can't seem to fix anything because of I/O errors, which makes me think it's more of a hardware problem.
The DMA timeout errors I'm also getting from the kernel/system logger seem to point the same way. I get messages like this:
Code: |
ide: failed opcode was: unknown
end_request: I/O error, dev hdc, sector 63128031
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=63131479, high=3, low=12799831, sector=63131479
|
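For anyone puzzling over the dma_intr lines: the status and error values are bit masks of the ATA status and error registers, and the log simply prints the names of the bits that are set. A minimal Python sketch that decodes them is below. The bit names are chosen to match the wording in the log; the error-bit table is a partial list written from memory rather than copied from the kernel source, so treat any bit not seen in this log as an assumption. The helper names `decode` and `split_lba` are hypothetical.

```python
from typing import Dict, List, Tuple

# ATA status register bits, named the way the kernel log spells them.
STATUS_BITS: Dict[int, str] = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Partial map of ATA error register bits (names are an assumption, see above).
ERROR_BITS: Dict[int, str] = {
    0x40: "UncorrectableError",  # drive's ECC could not recover the data
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",    # command aborted
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value: int, table: Dict[int, str]) -> List[str]:
    """Return the names of all set bits in `value`, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


def split_lba(lba: int) -> Tuple[int, int]:
    """Split an LBA into the part above bit 24 and the low 24 bits."""
    return lba >> 24, lba & 0xFFFFFF


if __name__ == "__main__":
    print(decode(0x51, STATUS_BITS))  # status=0x51 from the log
    print(decode(0x40, ERROR_BITS))   # error=0x40 from the log
    print(split_lba(63131479))        # LBAsect=63131479 from the log
```

Decoded this way, status=0x51 is DriveReady | SeekComplete | Error and error=0x40 is UncorrectableError, i.e. the drive itself reported that it could not read that sector, which suggests the medium rather than the filesystem. The high/low pair matches high = LBA >> 24 and low = the lower 24 bits (3 × 2^24 + 12799831 = 63131479).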
When I try to mount it, the kernel says:
Code: |
reiser4[mount(5341)]: _init_read_super (fs/reiser4/init_super.c:198)[nikita-2608]:
WARNING: hdc1: wrong master super block magic.
|
And when I try to fix it with fsck, it complains:
Code: |
***** fsck.reiser4 started at Wed Dec 8 20:35:31 2004
Fatal: Wrong magic found in the master super block.
Master super block cannot be found. Do you want to build a new one on
(/dev/hdc1)?
(Yes/No): yes
Which block size do you use? [4096]:
Warn : A new master superblock is created on (/dev/hdc1).
Error: Can't find disk-format plugin by its id 0xffff.
Error: Cannot open the on-disk format on (/dev/hdc1)
Info : The format 'format40' is detected. Rebuilding with it.
Error: Can't read bitmap block 4943136. Input/output error.
Error: Can't load ondisk bitmap.
Error: Can't initialize block allocator.
Fatal: Failed to open the block allocator.
|
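fsck reports filesystem block numbers, while the kernel reports absolute sector numbers (LBAs) on hdc, so comparing the two needs a small conversion. A minimal Python sketch, assuming 512-byte sectors, the 4096-byte block size fsck offered above, and a partition that starts at sector 63. The thread never gives the real start of hdc1, so `PARTITION_START` is a placeholder to replace with the value from `fdisk -lu /dev/hdc`; the function names are hypothetical.

```python
SECTOR_SIZE = 512        # bytes per sector
BLOCK_SIZE = 4096        # block size accepted at the fsck.reiser4 prompt
PARTITION_START = 63     # ASSUMPTION: first sector of hdc1, check with fdisk -lu

SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE


def block_to_lba(block: int, start: int = PARTITION_START) -> int:
    """First absolute sector (LBA) on the disk holding filesystem `block`."""
    return start + block * SECTORS_PER_BLOCK


def lba_to_block(lba: int, start: int = PARTITION_START) -> int:
    """Filesystem block that contains absolute sector `lba`."""
    if lba < start:
        raise ValueError("sector lies before the partition")
    return (lba - start) // SECTORS_PER_BLOCK


if __name__ == "__main__":
    print(block_to_lba(4943136))   # bitmap block fsck could not read
    print(lba_to_block(63131479))  # LBAsect from the dma_intr message
```

Under that assumption the unreadable bitmap block starts at sector 39545151, while the LBA in the dma_intr message falls in block 7891427; a different partition start shifts both results by the same offset.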
I've checked other possible causes: I've replaced the IDE cables with new ones and checked the connections. My server runs a bit hot at times, but I'd think that once it's properly cooled it shouldn't be getting I/O errors. The disk is brand new, and my primary disk in the same box (also reiser4) doesn't have any problems.
I haven't tried moving the disk to my other machine yet, but I'll attempt that if no one can think of anything else!
/dev/random l33t
Joined: 26 Nov 2004 Posts: 704 Location: Austin, Texas, USA
Posted: Thu Dec 09, 2004 3:33 am

My educated guess is that it's reiser4. It's possible that your IDE controller is bad, but it sounds more likely to be the fault of reiser4. I recommend you don't use reiser4 on a server until it's stable.
luciano Tux's lil' helper
Joined: 18 Nov 2004 Posts: 132
Posted: Thu Dec 09, 2004 1:52 pm

/dev/random wrote: "However, it's possible that your IDE controller is bad"
You mean the IDE controller on the motherboard? If I put the disk in my other machine and the problem was the controller, it should work there, no? I'm going to try this tonight.
What I find strangest is that reiser4progs can't fix the problem. If it were an issue with reiser4, I think I should at least be able to repair the partition. Obviously that doesn't mean it won't get corrupted again.
Maybe I should have mentioned that I was running NFS on top of it.
iulianpojar n00b
Joined: 17 May 2004 Posts: 38 Location: Moldova
Posted: Thu Dec 09, 2004 10:16 pm

Hey, did you check the power connectors? I usually get DMA timeouts because of them. I have a lot of different hard disks that I swap around, and when I move the power connector from one to another it often seats badly, because drives from different brands have power connector pins of different diameters. :D
luciano Tux's lil' helper
Joined: 18 Nov 2004 Posts: 132
Posted: Fri Dec 10, 2004 10:43 am

Hi All,
Thanks for all your responses. I think I've found the problem. I tried mounting the disk on another machine and got the same problem, so I installed the really useful SMARTmontools (emerge smartmontools) and ran a few tests.
It turns out I seem to have a corrupt sector in the boot record and/or superblock! This is REALLY bad news, and I'm probably going to lose a bunch of data, even if I can rebuild the filesystem.
I don't understand how this could happen! I'm confused as to how a sector on a disk can become corrupt. Can someone tell me if this is right:
Each sector has some sort of checksum stored after it. If the checksum doesn't match the data, the sector is considered corrupt. So this doesn't necessarily mean there's a hardware problem; maybe the disk just lost power in the middle of writing a block.
That's the only explanation I can think of. As I said, the disk is a brand new Seagate (one month old), so I'm considering claiming on the warranty, unless I can be sure it's not a hardware problem!
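The checksum idea is roughly right: drives store an error-correcting code with each sector, and when the data and the code cannot be reconciled the drive reports an uncorrectable error (the `UncorrectableError` in the dma_intr lines). The toy Python model below shows how an interrupted write can leave a sector in that state with no physical damage, and why a complete rewrite makes it readable again. It is a sketch under loud assumptions: zlib's CRC32 stands in for the drive's real ECC (which can also correct small errors and whose layout is internal to the firmware), and `Sector`, `write` and `interrupt_after` are hypothetical names.

```python
import zlib
from typing import Optional

SECTOR_SIZE = 512  # bytes of user data per sector


class UncorrectableError(IOError):
    """Raised when a sector's data no longer matches its stored check value."""


class Sector:
    """Toy model of one disk sector: user data plus a check value.

    CRC32 is a stand-in for the drive's real ECC; it can only detect
    damage, not repair it.
    """

    def __init__(self) -> None:
        self.data = bytes(SECTOR_SIZE)
        self.check = zlib.crc32(self.data)

    def write(self, payload: bytes, interrupt_after: Optional[int] = None) -> None:
        if len(payload) != SECTOR_SIZE:
            raise ValueError("payload must be exactly one sector")
        if interrupt_after is None:
            # Normal write: data and check value land together.
            self.data = payload
            self.check = zlib.crc32(payload)
        else:
            # Simulated power loss: only the first bytes of the new data
            # reach the medium and the check value is never updated.
            self.data = payload[:interrupt_after] + self.data[interrupt_after:]

    def read(self) -> bytes:
        if zlib.crc32(self.data) != self.check:
            raise UncorrectableError("data does not match check value")
        return self.data


if __name__ == "__main__":
    s = Sector()
    s.write(b"A" * SECTOR_SIZE, interrupt_after=2)
    try:
        s.read()
    except UncorrectableError as exc:
        print("read failed:", exc)  # unreadable, yet nothing is physically broken
    s.write(b"B" * SECTOR_SIZE)     # a complete rewrite makes it readable again
    print(s.read()[:4])
```

Two corrupted bytes are enough for the demo because CRC32 is guaranteed to catch any burst of flipped bits no longer than 32. The model cannot tell an interrupted write from a failing surface; on a real drive, the SMART reallocated and pending sector counts are a better guide.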
Once again, thank you all for your invaluable input!