Gentoo Forums
Filesystem corruption using ext3 & md driver
hades
n00b
n00b


Joined: 30 Nov 2002
Posts: 3
Location: Australia

Posted: Sat Nov 30, 2002 1:26 am    Post subject: Filesystem corruption using ext3 & md driver

I have recently implemented RAID 1 using the standard kernel md driver. I am having problems with directories suddenly becoming corrupt.

By corrupt I mean that when I ls (or do any other file op) I get "xxx: file not found" for each file (with xxx being the filename). It seems to affect a whole directory at a time, with the rest of the filesystem appearing fine.

As these are ext3 partitions, I have tried touching /forcefsck so that a complete fsck is done. This returned no issues. I then copied off what I could, recreated the fs, and copied the data back. This fixed the issue, for a while.

Two days later, the problem came back, this time on the root fs. I bit the bullet and reinstalled Gentoo from stage1.

Again, everything was dandy for a day or two. Today it's back again. This time I tried running debugfs on the offending fs. What I noticed is that in debugfs I can cd into the directory, and can even cat one of the files (it happens to be /usr/include/linux this time). But when I ls from the shell, I get the same "xxx: file not found".
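
For anyone wanting to find out which directories are affected, the symptom described above (names are listed, but each one then fails with "file not found") can be checked for with a short script. This is only a rough sketch: the function names find_unstatable_entries and scan_tree are made up for illustration, and a file deleted while the scan runs would show up as a false positive.

```python
import os


def find_unstatable_entries(directory):
    """Return names that the directory lists but that cannot be stat'd."""
    broken = []
    for name in os.listdir(directory):
        try:
            # lstat so that dangling symlinks are not reported as broken
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            broken.append(name)
    return broken


def scan_tree(root):
    """Map each directory under root to its unstatable entries, if any."""
    results = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        broken = find_unstatable_entries(dirpath)
        if broken:
            results[dirpath] = broken
    return results


if __name__ == "__main__":
    # /usr/include is just the tree mentioned in this post; adjust as needed.
    for path, names in scan_tree("/usr/include").items():
        print(path, len(names), "entries listed but not found")
```

On a healthy filesystem the scan should print nothing.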

This is how I have my box set up:
- Eden ITX-800 motherboard with 2 80GB HDs, each the master on the primary/secondary channels
- md for RAID 1 is compiled into the kernel, not a module
- all partitions are type fd to autostart the arrays
- /dev/md0 is /boot
- /dev/md1 is /
- /dev/md2 is /usr2
- /proc/mdstat output is fine, no issues with the array
- kernel is gentoo-sources, 2.4.19
- flags are -march=i586 -O3 -pipe
- Gentoo 1.4rc1


This sounds weird to me. Why is fsck returning no errors when there is an issue? Why can I ls & cat files in debugfs, but not outside it? Why do I have corruption when my filesystems have been shut down nicely (the journal should protect me from this anyway)? Is there something I can do in debugfs to fix the dir?

I have to put it down as an md / kernel bug, as the RAID is the only new thing that I have introduced.

If you have read this far, thanks :)
hades
n00b
n00b


Joined: 30 Nov 2002
Posts: 3
Location: Australia

Posted: Sat Nov 30, 2002 4:40 am    Post subject: Update

Using debugfs I have tracked the problem to the inode flags displayed by the stat <dir> command.

Normal dirs have 0x0, whereas the "problem" ones have 0x1000.

Changing the flags to 0x0 fixes the directory, but some get changed straight back. Weird!!!

I have done some searching to find out what the flags mean, but no luck yet. All I know is that they are for extended functionality & can be listed/changed with lsattr & chattr, but lsattr does not show anything of interest :?

Next step is to compile a vanilla kernel to see if the gentoo one is the problem....
edcjones
n00b
n00b


Joined: 04 Jul 2002
Posts: 60

Posted: Tue Dec 03, 2002 3:49 am

I have the same problem, but I don't use RAID. "dumpe2fs" shows that my ext3 partitions all have the needs_recovery flag set. If I mount the ext3 partitions as ext2, things are better. See https://forums.gentoo.org/viewtopic.php?t=24848
_________________
Python, Swig & computer vision
hades
n00b
n00b


Joined: 30 Nov 2002
Posts: 3
Location: Australia

Posted: Tue Dec 03, 2002 9:54 am    Post subject: Vanilla kernel did the trick

Well, I compiled a vanilla kernel & the problem vanished. Looks like it's the plain jane sources for me.

I would still like to know what the 0x1000 extended attribute means.


Hades
tytso
n00b
n00b


Joined: 04 Dec 2002
Posts: 2

Posted: Wed Dec 04, 2002 3:23 am    Post subject: Bad htree patch.

It sounds like the gentoo kernel has an early version of the htree patches that is corrupting directories. My guess is that it's the fencepost bug when splitting a node.

An updated set of kernel patches can be found here:

http://thunk.org/tytso/linux/extfs-2.4-update

The 2.4.20-rc1 patches are missing one or two minor bug fixes that are in the 2.5 code base (I'll get them updated versus 2.4.20 when I have a moment), but they should work a whole lot better than what gentoo is currently using.
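
For reference, the 0x1000 value asked about earlier in the thread matches EXT2_INDEX_FL in the kernel's ext2/ext3 headers, the flag that marks a directory as hash-indexed (htree), which fits the diagnosis above. Below is a minimal Python sketch for decoding the Flags value that debugfs prints for stat <dir>. The function name decode_inode_flags is made up for illustration, and the table is deliberately only a subset of the defined bits; anything not in the table is reported as a raw hex remainder.

```python
# Subset of the ext2/ext3 inode flag bits (names as in the kernel headers).
# The table is intentionally incomplete; unlisted bits are shown as hex.
EXT2_INODE_FLAGS = {
    0x0001: "EXT2_SECRM_FL",         # secure deletion
    0x0002: "EXT2_UNRM_FL",          # undelete
    0x0004: "EXT2_COMPR_FL",         # compress file
    0x0008: "EXT2_SYNC_FL",          # synchronous updates
    0x0010: "EXT2_IMMUTABLE_FL",     # immutable file
    0x0020: "EXT2_APPEND_FL",        # append-only
    0x0040: "EXT2_NODUMP_FL",        # do not dump
    0x0080: "EXT2_NOATIME_FL",       # do not update atime
    0x1000: "EXT2_INDEX_FL",         # hash-indexed (htree) directory
    0x4000: "EXT3_JOURNAL_DATA_FL",  # file data is journalled
}


def decode_inode_flags(flags):
    """Return the names of the set bits, lowest bit first."""
    names = [name for bit, name in sorted(EXT2_INODE_FLAGS.items())
             if flags & bit]
    known = 0
    for bit in EXT2_INODE_FLAGS:
        known |= bit
    unknown = flags & ~known
    if unknown:
        names.append(hex(unknown))
    return names


if __name__ == "__main__":
    print(decode_inode_flags(0x1000))  # ['EXT2_INDEX_FL']
```

A directory showing 0x1000 has therefore been converted to an indexed directory by the htree code, which would explain why clearing the flag by hand only helps until the kernel sets it again.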