Gentoo Forums
Interesting 2.5/2.6 disk problem
Reformist
Guru


Joined: 06 Oct 2002
Posts: 323

Posted: Mon Aug 11, 2003 10:42 pm    Post subject: Interesting 2.5/2.6 disk problem

I've recently had to install Gentoo on a new hard disk because of a hard disk crash. Before this reinstallation, I had both 2.4 and 2.6 running without any problems on this hardware (an Athlon 1.4 GHz, an Asus A7A266 motherboard, and a 60GB IBM Deskstar). The IBM crashed, and I replaced it with a 160GB Seagate.

I installed Gentoo on the system without a hitch (go Gentoo!). When I got to the kernel stage, I downloaded both a 2.4 and a 2.6 kernel. I really don't want to use the 2.4 kernel anymore, so I compiled and installed 2.6.0-test3 (mm-sources).

Booting into the 2.6 from both grub and lilo causes certain programs to be missing in /bin and /sbin; I cannot tell if it is just programs or random files. Anyway, when booting 2.6, _the kernel loads fine_, but the init scripts fail because I'm missing crucial commands such as "awk", "ls", etc.

I compiled and installed 2.4, and everything successfully boots. All of the runlevel scripts execute successfully, and my system is fully usable.

Now this is the kicker and the interesting part, and I'll use "ls" as an example. When booting 2.4 (specifically, gentoo-sources 2.4.20-r5), I can run /bin/ls just fine. When I boot into 2.6 and go straight to a bash prompt (by appending init=/bin/bash to the kernel line in lilo), I can browse the /bin directory with "echo /bin/*", and ls is not listed! Furthermore, if I explicitly execute /bin/ls, I get an "input/output error". Obviously the file must be there in some form, because if I execute "/bin/asdf", which really does not exist, I get "no such file or directory" instead.
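
The distinction above (a file that is missing versus one that exists but cannot be read) can be checked with shell builtins alone, which matters at an init=/bin/bash prompt where ls itself is unusable. A minimal sketch, assuming bash; probe_path is an illustrative helper name, not a standard tool. It tries to open the file with a redirection and prints whatever reason the shell reports:

```shell
# probe_path FILE: try to open FILE for reading using only shell builtins.
# Prints "ok" if the open succeeds, otherwise the reason reported by the shell.
# Note: an open can succeed even if a later read would fail, so "ok" is not
# proof that the file contents are intact.
probe_path() {
    local msg
    if msg=$( { : < "$1"; } 2>&1 ); then
        echo "ok"
    else
        # Keep only the text after the last ": " in the shell's error message.
        echo "${msg##*: }"
    fi
}

probe_path /bin/asdf   # a path that really does not exist
probe_path /bin/ls     # compare the result under 2.4 and under 2.6
```

If the two kernels print different reasons for the same path, that points at how the filesystem is being read rather than at the files themselves.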

The above behavior seems to be isolated to 2.5 and 2.6 kernels; I've tried this with development-sources, mm-sources (2.6-test something) and also 2.5.75-mm sources. I've compiled all of the beta kernels with the same config. I have the standard options compiled in, such as support for my filesystem (ext3), kernel core format set to ELF, and support for ELF binaries.

This is very bizarre; the beta kernels seem to be mounting my hard drive wrong or something. I can browse it, but certain things seem to be missing (I'm also missing /sbin/lilo under 2.6). Keep in mind, when I boot back into 2.4, everything is present and works well!

I've been asking around everywhere on IRC with not much luck; was wondering if anyone has run into this before or has any suggestions.

Thanks
_________________
-Phil Crosby
handsomepete
Guru


Joined: 21 Apr 2002
Posts: 548
Location: Kansas City, MO

Posted: Mon Aug 11, 2003 11:50 pm

Post your grub.conf and partition layout; that's a good place to start. At one point there was some weirdness with root= naming: it wouldn't accept /dev/hdxX, and it had to be a number sequence such as 0303 (which I think I remember reading has recently changed to a different syntax, like 03:03), but I thought that was fixed. Anyway, post those and maybe we can help.
Reformist
Guru


Joined: 06 Oct 2002
Posts: 323

Posted: Sun Aug 17, 2003 9:28 am

Here are my configs, but I do not think they have any glaring errors in them. Note that my 2.4 and 2.6 boot parameters are exactly the same.
My partition layout is very simple/standard:
Code:

/dev/hda1 - Fat32 windows
/dev/hda2 - ext3 boot
/dev/hda3 - swap
/dev/hda4 - ext3 /


My fstab is as follows:
Code:

...
/dev/hda1     /mnt/win     vfat     noatime 0 0
/dev/hda2     /boot        ext3     noauto,noatime 1 1
/dev/hda3     none         swap     sw 0 0
/dev/hda4     /            ext3     noatime 0 0
none          /proc        proc     defaults
...

I've also tried loading the system with an fstab that defines each drive as /dev/discs/disc0/*, same results.

grub.conf:
Code:

default 1
timeout 30
splashimage=(hd0,1)/boot/grub/splash.xpm.gz

title=Gentoo Linux 2.6
root (hd0,1)
kernel (hd0,1)/bzImage.2.6.x root=/dev/discs/disc0/part4 vga=795

title=Gentoo Linux 2.4
root (hd0,1)
kernel (hd0,1)/bzImage.2.4.x root=/dev/discs/disc0/part4 vga=795

Note that I've also tried root=/dev/hda4, with the same results.

So anyway, I reinstalled onto this troublesome partition from scratch; deleted the partition, recreated it, reformatted it (ext3), and installed from stage1. Installation went fine, as expected, just like the first time; upon rebooting from install to the 2.6 kernel, I now get similar errors at different places. My init scripts die just like they did before, but I am missing different files. Specifically, I'm missing /bin/mount, which of course is a show stopper. Actually, I'm not so much missing it, because here is the output after the 2.6 kernel _successfully_ loads:
Code:

* Mounting proc at /proc
modprobe: FATAL: Module binfmt_0000 not found.
modprobe: FATAL: Module binfmt_0000 not found.
[ oops ]
* The "mount" command failed with error:
line 1: /bin/mount: cannot execute binary file
* Since this is a critical task, startup cannot continue.


At that point I get the standard "give root password for maintenance" prompt.
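
"Cannot execute binary file" means the kernel did not recognise the file's header. As far as I know, the binfmt_XXXX name requested from modprobe is derived from bytes in that header, so binfmt_0000 would suggest the kernel read zeros where the ELF magic should be; treat that reading as an assumption rather than a confirmed diagnosis. A minimal sketch for comparing what each kernel sees: the illustrative is_elf helper below checks whether a file starts with the four ELF magic bytes (0x7f 'E' 'L' 'F'). It relies on head, od and tr, which may themselves be unreadable on the broken boot.

```shell
# is_elf FILE: succeed if FILE begins with the ELF magic bytes 7f 45 4c 46.
is_elf() {
    local magic
    # Dump the first four bytes as hex and strip spaces and newlines.
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Example: report on the binary named in the error output above.
if is_elf /bin/mount; then
    echo "/bin/mount starts with the ELF magic"
else
    echo "/bin/mount does not look like ELF from here"
fi
```

Running this under 2.4 and again under 2.6 would show whether the same file yields different bytes depending on the kernel.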

I have ruled out this being a hard disk problem because a) it is a _brand new_ hard disk (first OS on it), and b) I ran e2fsck -c -C 0 -d -f /dev/hda4 and there was no error output. Also, keep in mind that this install boots perfectly fine under a 2.4 kernel (as did the previous install, before I did this format). So... I'm kind of at a loss here. On the plus side, this is a very interesting problem. No one I've talked to has seen this, and it has to be reproducible on some other system, because if it were a fluke, I would expect 2.4 to misbehave as well.

I do not think it is a kernel config issue either, because I successfully recovered my working 2.6 bzImage from my crashed hard drive, and booted with that on this new hard drive, and encounter the exact same errors on both installations (the previous, and the newly formatted). I have module-init-tools emerged. Also, if I use the repair console after I get this failed boot, and check dmesg, there seem to be no obvious problems logged.

This system should be very similar, if not identical, to my last system on the old hard drive. The only difference I can think of is that my root partition is now 130GB versus 25GB; is there something in the kernel that needs to be configured for that, and could it cause this behavior? I've scoured .config... it seems like the 2.6 kernel is having problems reading my ext3 partition, but that seems impossible, since I get the same errors even with the same kernel that worked on my last ext3 hard drive. Help!
_________________
-Phil Crosby


Last edited by Reformist on Sun Aug 17, 2003 10:04 pm; edited 1 time in total
handsomepete
Guru


Joined: 21 Apr 2002
Posts: 548
Location: Kansas City, MO

Posted: Sun Aug 17, 2003 12:34 pm

First, in grub try changing root=/dev/discs/etc. to the following (if one doesn't work, try the next one)
root=0304
root=03:04
root=/dev/ide(location of disc in /dev structure - can't remember dir. layout off the top of my head, but start from /dev/ide)
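
For reference, the numeric form is the device's major and minor numbers written as two hex digits each. A minimal sketch, assuming the usual IDE numbering (major 3 for the primary channel, minor equal to the partition number on hda); root_devnum is an illustrative helper name:

```shell
# root_devnum MAJOR MINOR: print the 4-digit hex form accepted by root=.
# Both arguments are decimal device numbers.
root_devnum() {
    printf '%02x%02x\n' "$1" "$2"
}

root_devnum 3 4    # /dev/hda4 -> 0304
```

On a running system, ls -l /dev/hda4 shows the two numbers in decimal, so they can be fed straight into the helper.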

What version of 2.6 are you using? If you're using mm-sources, switch to vanilla and vice versa. If you're using -test3, try -test2, etc. If you've tried them all and still have the problems, take a long look at your kernel .config to make sure there aren't any options in that don't belong there (like SMP on a single processor system). Try taking out everything you don't absolutely need (sound support, extra filesystem support, framebuffer graphics, etc.) and boot off that.

Have you tried installing with a smaller root partition (like 5-10GB)? I don't think it should matter, but who knows?

And just out of curiosity, what happens if you take /proc out of your fstab?
Safrax
Guru


Joined: 23 Apr 2002
Posts: 422

Posted: Sun Aug 17, 2003 4:01 pm

It looks like you forgot to compile ELF binary executable format into the kernel.

In menuconfig, go to "Executable File Formats" select "Kernel support for ELF binaries." Save, and recompile, etc and everything should work.
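
A quick way to confirm this without opening menuconfig, as a sketch: the option behind "Kernel support for ELF binaries" is CONFIG_BINFMT_ELF, and it needs to be set to y. The default config path below is an assumption (the usual /usr/src/linux symlink), and has_builtin_elf is an illustrative name.

```shell
# has_builtin_elf CONFIG: succeed if CONFIG enables built-in ELF support.
has_builtin_elf() {
    grep -q '^CONFIG_BINFMT_ELF=y$' "$1"
}

# Use the first argument if given, otherwise the assumed default location.
config=${1:-/usr/src/linux/.config}
if has_builtin_elf "$config" 2>/dev/null; then
    echo "ELF support is built in"
else
    echo "CONFIG_BINFMT_ELF=y not found in $config"
fi
```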
Reformist
Guru


Joined: 06 Oct 2002
Posts: 323

Posted: Sun Aug 17, 2003 5:46 pm

Kernel core format is ELF, and ELF binary support is enabled. That was the first thing I checked. I tried setting the kernel core format to a.out, but I get compile errors with that option.
_________________
-Phil Crosby
Safrax
Guru


Joined: 23 Apr 2002
Posts: 422

Posted: Sun Aug 17, 2003 6:30 pm

It could be some sort of a bug in ext3. I've never had much luck with that filesystem. Reiserfs has always worked perfectly for me.

Have you tried this with a filesystem other than ext3? I think that would be the next logical thing to try in this case as it does appear to be some sort of FS error.
To
Veteran


Joined: 12 Apr 2003
Posts: 1145
Location: Coimbra, Portugal

Posted: Sun Aug 17, 2003 7:14 pm

Safrax wrote:
It could be some sort of a bug in ext3. I've never had much luck with that filesystem. Reiserfs has always worked perfectly for me.

Have you tried this with a filesystem other than ext3? I think that would be the next logical thing to try in this case as it does appear to be some sort of FS error.


I assume that you are talking about ext3 with 2.6?


_________________

------------------------------------------------
Linux Gandalf 3.2.35-grsec
Gentoo Base System version 2.2
------------------------------------------------
Safrax
Guru


Joined: 23 Apr 2002
Posts: 422

Posted: Sun Aug 17, 2003 7:22 pm

Yeah. 2.5/2.6 has had some really nasty bugs when it comes to filesystems. A few weeks ago AKPM released an mm patchset that went nuts on reiserfs systems. I've heard of similar things with ext3 and I've seen comments about the bugginess of ext3 with 2.5/2.6.
arachnotron
n00b


Joined: 03 Jan 2003
Posts: 10

Posted: Sun Aug 17, 2003 9:36 pm

/dev/hda2 /boot ext3 noatuo,noatime 1 1

I may be an idiot, but could this be caused by the misspelled 'noauto' parameter?
Reformist
Guru


Joined: 06 Oct 2002
Posts: 323

Posted: Sun Aug 17, 2003 10:03 pm

Quote:

/dev/hda2 /boot ext3 noatuo,noatime 1 1


Heheh, sorry, I typed that fstab out while looking at another monitor (I'm posting from my laptop). I will fix that in the original post.

Today and tomorrow I'm going to compress my system, put the tarball on a network share, reformat the HD with reiserfs (as the posts above suggested), and uncompress the tarball back onto this system. Then we shall see if it is an ext3 bug in the kernel.
_________________
-Phil Crosby
Reformist
Guru


Joined: 06 Oct 2002
Posts: 323

Posted: Mon Aug 18, 2003 9:09 am

Whew... compressed my entire / partition (minus a few virtual directories) into a tarball, put it on a network share, converted the partition to reiserfs, pulled the tarball back onto /, expanded, rebooted.......... and success. Praise God! I finally have a working 2.6 system... man did I learn a _ton_ of info while trying to work around this problem...
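
The backup and restore steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the exact commands used: the excluded directories (/proc, /sys, /mnt) are a guess at the "virtual directories", the function names are illustrative, and the reformat step in between (creating the reiserfs filesystem from a rescue environment) is deliberately left out because it destroys data.

```shell
# backup_tree SRC TARBALL: archive SRC with permissions preserved, skipping
# the contents of virtual filesystems and mount points (assumed list).
# TARBALL should live outside SRC or under one of the excluded paths.
backup_tree() {
    tar -cpzf "$2" -C "$1" \
        --exclude='./proc/*' --exclude='./sys/*' --exclude='./mnt/*' .
}

# restore_tree TARBALL DEST: unpack the archive into DEST, preserving permissions.
restore_tree() {
    tar -xpzf "$1" -C "$2"
}
```

Something like backup_tree / /mnt/share/root.tar.gz before the reformat and restore_tree /mnt/share/root.tar.gz /mnt/gentoo afterwards would match the procedure described; both paths are hypothetical.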

So yes, it seems like it was an obscure ext3 2.6 problem... I hope that doesn't get shipped with the final release of the 2.6 kernel, because for someone who doesn't have a lot of time or a little bit of prerequisite knowledge, that could have been tragic. Oh, and crap, is reiser3 fast!! I've been using ext3 for stability reasons for the past 2 years, but wow, reiser3 flies!

So currently I'm installing the 134 packages it takes to get from stage2 to gnome =) I had a little problem booting into 2.4 and 2.6, where every single service would spit out "could not locate dependency info for [insert service name here]", and running /sbin/depscan.sh yielded
Code:

gawk: relocation error: /lib/rcscripts/filefuncs.so: undefined symbol r_dupnode

This appeared in different places on the forums, but no one seemed to have an exact solution for it. The problem was that each service was missing something in its dependency information, yet even with all those error messages, all the services started fine. The source of the problem was I had forgotten to add /etc/init.d/net.eth0 to a runlevel with rc-update. My fault entirely, but some more useful error info could have been provided (like, possibly, which dependent service failed?)

But, finally, I can move on from this dreadful fs problem. Thanks to everyone who helped, and especially those who suggested I try reiserfs to see if that remedied the problem: it did!
_________________
-Phil Crosby
All times are GMT