hades n00b
Joined: 30 Nov 2002 Posts: 3 Location: Australia
Posted: Sat Nov 30, 2002 1:26 am Post subject: Filesystem corruption using ext3 & md driver
|
I have recently implemented RAID 1 using the standard kernel md driver. I am having problems with directories suddenly becoming corrupt.
By corrupt I mean that when I run ls (or any other file operation) I get "xxx file not found" for each file (with xxx being the filename). It seems to affect a whole directory at a time, with the rest of the filesystem appearing fine.
As these are ext3 partitions, I tried touching /forcefsck so that a complete fsck would be forced. This reported no issues. I then copied off what I could, recreated the fs and copied the data back. This fixed the issue, for a while.
Two days later the problem came back, this time on the root fs. I bit the bullet and reinstalled Gentoo from stage1.
Again, everything was dandy for a day or two. Today it's back again. This time I tried running debugfs on the offending fs. What I noticed is that inside debugfs I can cd into the directory, and can even cat one of the files (it happens to be /usr/include/linux this time). But when I run ls from the shell, I get the same "xxx file not found" errors.
This is how my box is set up:
- Eden ITX-800 motherboard with two 80GB HDs, each the master on its own IDE channel (primary and secondary)
- md for raid 1 is compiled into the kernel, not a module
- all partitions are type fd to autostart the arrays
- /dev/md0 is /boot
- /dev/md1 is /
- /dev/md2 is /usr2
- /proc/mdstat output is fine, no issues with the array
- kernel is gentoo-sources 2.4.19
- CFLAGS are -march=i586 -O3 -pipe
- gentoo 1.4rc1
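The point that /proc/mdstat looks fine can be checked mechanically. Below is a minimal sketch, assuming the usual 2.4-era layout where each array line (e.g. "md1 : active raid1 ...") is followed by a status line ending in something like "[2/2] [UU]": fewer working than configured disks, or an underscore in the member map, means one half of the mirror is missing. degraded_arrays and the sample text are hypothetical, written only for illustration. Keep in mind that a clean result only says both halves of each mirror are present; it says nothing about whether the filesystem on top is consistent.

```python
import re

# Array header line, e.g. "md1 : active raid1 hdc3[1] hda3[0]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Status line under each array, e.g. "      20482752 blocks [2/2] [UU]":
# [configured/working] disk counts, then one character per member (U = up, _ = missing).
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays that are running with a member missing."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            total, active = int(status.group(1)), int(status.group(2))
            if active < total or "_" in status.group(3):
                degraded.append(current)
            current = None  # only the first status line belongs to this array
    return degraded


# Hypothetical /proc/mdstat contents: md0 healthy, md1 missing its second disk.
SAMPLE = """Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 hda3[0]
      20482752 blocks [2/1] [U_]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # ['md1']
```

On the box itself you would pass it the real contents, e.g. degraded_arrays(open("/proc/mdstat").read()).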
This seems weird to me. Why does fsck return no errors when there is an issue? Why can I ls & cat files in debugfs, but not outside it? Why do I have corruption when my filesystems have been shut down cleanly (the journal should protect me from this anyway)? Is there something I can do in debugfs to fix the directory?
I have to put it down as an md / kernel bug, as the RAID is the only new thing I have introduced.
If you have read this far, thanks.
|
hades n00b
Joined: 30 Nov 2002 Posts: 3 Location: Australia
Posted: Sat Nov 30, 2002 4:40 am Post subject: Update
|
Using debugfs I have tracked the problem to the inode flags displayed with the stat <dir> command.
Normal dirs have 0x0, whereas the "problem" ones have 0x1000.
Changing the flags to 0x0 fixes the directory, but some get changed straight back. Weird!!!
I have done some searching to find out what the flags mean, but no luck yet. All I know is that they are for extended functionality and can be listed/changed with lsattr & chattr, but lsattr does not show anything of interest.
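The value printed by debugfs's stat is a bitmask, so it can be decoded bit by bit. A minimal sketch, assuming the flag values defined in the ext2/ext3 kernel headers (ext2_fs.h / ext3_fs.h); only a subset of the bits is listed, and decode_flags is a hypothetical helper, not part of debugfs or e2fsprogs. In those headers 0x1000 is EXT2_INDEX_FL / EXT3_INDEX_FL, the "hash-indexed directory" flag used by the htree directory-indexing code.

```python
# Subset of the per-inode flag bits from the ext2/ext3 headers.
EXT_INODE_FLAGS = {
    0x00000001: "SECRM (secure deletion)",
    0x00000002: "UNRM (undelete)",
    0x00000004: "COMPR (compress file)",
    0x00000008: "SYNC (synchronous updates)",
    0x00000010: "IMMUTABLE",
    0x00000020: "APPEND (append only)",
    0x00000040: "NODUMP",
    0x00000080: "NOATIME",
    0x00001000: "INDEX (hash-indexed directory, htree)",
    0x00004000: "JOURNAL_DATA (ext3 per-file data journaling)",
}

# OR of every bit listed above, used to spot bits this table does not name.
KNOWN_MASK = 0
for _bit in EXT_INODE_FLAGS:
    KNOWN_MASK |= _bit


def decode_flags(value):
    """Return the names of the flag bits set in an inode flags value."""
    names = [name for bit, name in sorted(EXT_INODE_FLAGS.items()) if value & bit]
    unknown = value & ~KNOWN_MASK
    if unknown:
        names.append("unlisted bits 0x%x" % unknown)
    return names


print(decode_flags(0x0))     # []
print(decode_flags(0x1000))  # ['INDEX (hash-indexed directory, htree)']
```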
Next step is to compile a vanilla kernel to see if the Gentoo one is the problem.
|
edcjones n00b
Joined: 04 Jul 2002 Posts: 60
Posted: Tue Dec 03, 2002 3:49 am
|
I have the same problem, but I don't use RAID. dumpe2fs shows that my ext3 partitions all have the needs_recovery flag set. If I mount the ext3 partitions as ext2, things are better. See https://forums.gentoo.org/viewtopic.php?t=24848
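dumpe2fs -h prints the superblock summary, including a "Filesystem features:" line, and needs_recovery appears there when the ext3 journal is marked as still needing replay. A minimal sketch for checking it, assuming that line layout; dumpe2fs_header, fs_features, needs_recovery and the sample text are hypothetical names for illustration. One caveat: ext3 also sets needs_recovery while a filesystem is mounted, so the flag only points to an unclean state when it is seen on an unmounted filesystem.

```python
import subprocess


def dumpe2fs_header(device):
    """Run 'dumpe2fs -h' on a device and return its stdout (needs e2fsprogs and read access)."""
    result = subprocess.run(["dumpe2fs", "-h", device],
                            capture_output=True, text=True, check=True)
    return result.stdout


def fs_features(dumpe2fs_output):
    """Return the feature names listed on the 'Filesystem features:' line."""
    for line in dumpe2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            return line.split(":", 1)[1].split()
    return []


def needs_recovery(dumpe2fs_output):
    """True if the journal is marked as still needing replay."""
    return "needs_recovery" in fs_features(dumpe2fs_output)


# Hypothetical excerpt of 'dumpe2fs -h' output for an ext3 filesystem.
SAMPLE = """Filesystem volume name:   <none>
Filesystem features:      has_journal filetype needs_recovery sparse_super
"""

print(needs_recovery(SAMPLE))  # True
```

Against a real disk this would be needs_recovery(dumpe2fs_header(device)), with device being whichever unmounted partition you want to inspect.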
|
hades n00b
Joined: 30 Nov 2002 Posts: 3 Location: Australia
Posted: Tue Dec 03, 2002 9:54 am Post subject: Vanilla kernel did the trick
|
Well, I compiled a vanilla kernel and the problem vanished. Looks like the plain-jane sources for me.
I would still like to know what the 0x1000 flag means.
Hades
|
tytso n00b
Joined: 04 Dec 2002 Posts: 2
Posted: Wed Dec 04, 2002 3:23 am Post subject: Bad htree patch.
|
It sounds like the Gentoo kernel has an early version of the htree patches, which is corrupting directories. My guess is that it's the fencepost bug when splitting a node.
An updated set of kernel patches can be found here:
http://thunk.org/tytso/linux/extfs-2.4-update
The 2.4.20-rc1 patches are missing one or two minor bug fixes that are in the 2.5 code base (I'll update them against 2.4.20 when I have a moment), but they should work a whole lot better than what Gentoo is currently using.