Gentoo Forums
drive died, booting slowed, now can't boot to Gentoo[SOLVED]
dcljr
Tux's lil' helper


Joined: 20 Aug 2005
Posts: 139
Location: Austin, TX

Posted: Thu Jun 08, 2006 9:43 am    Post subject: drive died, booting slowed, now can't boot to Gentoo[SOLVED]

I have a sad story to tell. Bear with me since I don't know how much of this is actually relevant to my problem.

About two weeks ago, one of my hard drives (a 20G Western Digital, c. 6-7 years old) appeared to die -- it started making clicking sounds, as if it was repeatedly shutting down and immediately reviving. I more or less panicked and shut down my system (gracefully). When I rebooted, the drive continued to make weird noises, so I shut down again (this time, I think I just hit the power button mid-boot). I booted to a Knoppix Live CD and saw for the first time which drive was actually having trouble, because it didn't show up on the desktop. So, after lamenting my fate for a while (first time I've experienced a HD failure), I disconnected its power but left the IDE cable attached, then rebooted to Gentoo again.

Now, I had noticed that the booting seemed to take longer the last couple of reboots (since the drive problem started, but, looking back, possibly even before the drive died -- I can't remember), but I figured it was because the system was repeatedly trying to contact the drive, which wasn't responding, or maybe was responding erratically. But now that the drive was disconnected, the boot still seemed to be very slow. What I mean is, when I first boot up the BIOS does a memory check (which comes up OK), then looks for the IDE devices (a "noname" CD-RW drive that came with the computer + Maxtor 6.4G HD on first cable, 80G Western Digital HD + dead 20G Western Digital HD on second cable), then does some stuff I can't remember, then the screen clears and it starts to check the rest of the system configuration (IRQs, etc), and then finally goes through the regular boot-drive order, checking the floppy, CD, then HD, at which point it finally brings up the GRUB menu (I now have 3 different kernels I've created over the 9 or 10 months I've been running Gentoo)... Well, I can't remember exactly what was taking a long time initially (what's happening now I'll get to below), but I think it was "hanging" for a time right before bringing up the GRUB menu. On the last successful reboot, I think I had to press Ctrl-C or something (I was just trying "random" keystrokes) and then it seemed to get over whatever the problem was and give the list of kernels. From there, no problems.

Anyway, like I said, I chalked it up to IDE communication issues and went about my merry computing way (started burning a lot of CDs before my other, even older, Maxtor drive died!). I didn't shut down again until yesterday. My plan was to remove the old 20G HD and install a second CD-RW drive that I had from my previous (Windows) computer. Before installing the CD-RW, though, I tried rebooting just after removing the dead HD.

So now I finally get to what's happening now: the RAM checks OK, then it says "checking for IDE devices...", then there's a long pause of up to around 10 sec., then the list of drives comes up (originally the list I gave above minus the WD 20G HD, but now I've actually installed the second CD-RW and removed the Maxtor drive, as well, so it's: 2 CD-RWs on one cable, WD 80G HD only on the other), then it says "checking for IDE devices..." again, then a long pause, then the other stuff comes up rather quickly (without a second listing of the IDE devices, BTW -- I don't know why it says it's "checking" for them twice), then the screen clears and the IRQs, etc., are listed, then it tries the floppy and CD drives, and then... nothing.

It hangs there, just where it would give the list of kernels. If I don't do anything, a message eventually comes up saying, in effect, "boot failed, enter system disk and press return". Now, of course, I don't have a system disk, so....

Ah, and one more thing: I noticed on the last few shutdowns that I'm getting lots of "mtimes differ" messages, but a search of these forums reveals that that's (apparently) not really a problem.

Well, there it is. I've now booted back to Knoppix, which is how I'm sending this message.

So now my thoughts on what the problem could (or could not) be:
  • It seems extremely unlikely that it's a software issue: my recent portage activity hasn't involved anything out of the ordinary (I just emerge -uD world and emerge -uD system every once in a while), and, being a newbie to Linux, I don't use any "weird" software (i.e., anything that might affect booting) anyway.
  • I haven't changed the kernel and nothing has been modified on /boot (the entire first partition on the 80G WD HD) since January 29th -- and I've rebooted a whopping 30 times since then.
  • Note that there doesn't seem to be any problem with the HD itself, or the IDE cable for that matter, since I can mount the drive in Knoppix, get directory listings, etc. Running fsck on the boot partition doesn't reveal any problems (comes up "clean").
  • Seems very unlikely it's a RAM problem -- again since I'm having no problems in Knoppix. (Although... come to think of it, I have had some strange glitches in the past, in Gentoo, where the screen image suddenly becomes squeezed horizontally with some of the left side of the screen duplicated on the right side -- i.e., it looks like a "tiled" image, as you would see with a too-small background image on a webpage. As I recall, this was coming back from a blank screen after some idle time (I don't use any screensaver, so this must be the default X11 or Gnome screen blanker). Both times this happened I was able to fix the problem by logging out and logging back in again. I suppose this might reflect a RAM/video-RAM problem, but I think it's more likely associated with upgrading to xorg-x11-6.8.2-r7 a month ago. Anyway, I haven't tested my RAM yet.)
  • As for something more sinister, like a virus or trojan horse or whatever, I don't even know how to check for those, but I did come across chkrootkit in Knoppix and it didn't find anything (I really don't know what it's doing, though). My e-mail goes through clamav (on another machine) and I don't open unknown e-mail attachments or (usually) visit strange websites. And I certainly don't install any software I don't get through portage (stable branch, no binaries).
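
One more check that fits the list above is whether the first sector of the boot drive still carries a valid boot signature. What follows is only a minimal sketch: mbr_signature is a hypothetical helper name, the demonstration runs on a synthetic image file rather than the real disk, and it assumes a classic 512-byte MBR whose last two bytes are 0x55 0xAA.

```shell
# Hypothetical helper: print bytes 510-511 of a disk or image file as hex.
# A valid MBR boot sector ends in 55aa.
mbr_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Demonstration on a synthetic 512-byte image (not the real disk).
img=$(mktemp)
dd if=/dev/zero of="$img" bs=510 count=1 2>/dev/null   # 510 zero bytes
printf '\125\252' >> "$img"                             # octal for 0x55 0xAA
mbr_signature "$img"; echo                              # prints 55aa

# From the live CD, as root, the same check against the real drive would be:
#   mbr_signature /dev/hda
```

A correct signature only shows that the sector is marked bootable; it says nothing about whether the BIOS is actually trying to boot from that drive.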

So, anyone have any ideas what the problem could be? What else I could check?

- dcljr

Edit: BTW, I've checked the Master/Slave settings on all the drives and they're correct.

Edit #2: Sadly, I am without my Gentoo Live CD; I reused the CD-RW disc it was on not one week before everything started going haywire. Fortunately, I can burn another one in Knoppix since I have 2 CD drives...


Last edited by dcljr on Fri Jun 09, 2006 8:44 pm; edited 1 time in total
x22
Apprentice


Joined: 24 Apr 2006
Posts: 208

Posted: Thu Jun 08, 2006 10:04 am    Post subject: Re: drive died, booting slowed, now can't boot to Gentoo at

On which disk(s) are your root partition, boot partition and grub?

dcljr wrote:
I disconnected its power but left the IDE cable attached, then rebooted to Gentoo again.

This may cause problems - disconnect it from IDE too.
dcljr
Tux's lil' helper


Joined: 20 Aug 2005
Posts: 139
Location: Austin, TX

Posted: Thu Jun 08, 2006 10:25 am    Post subject: Re: drive died, booting slowed, now can't boot to Gentoo at

Code:
# fdisk -l
Disk /dev/hda: 80.0 GB, 80026361856 bytes
16 heads, 63 sectors/track, 155061 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          63       31720+  83  Linux
/dev/hda2              64        1056      500472   82  Linux swap
/dev/hda3            1057       64549    32000472   83  Linux
/dev/hda4           64550      155061    45618048    5  Extended
/dev/hda5           64550       77249     6400768+  83  Linux
/dev/hda6           77250      119909    21500608+  83  Linux
/dev/hda7          119910      121894     1000408+  83  Linux
/dev/hda8          121895      155061    16716136+  83  Linux

The root partition (/) is on hda3.
/boot and /boot/grub are on hda1 (which is 46% full).
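
As a sanity check, the geometry that fdisk reports above can be reproduced with plain shell arithmetic. This is a sketch only: the variable names are made up for illustration, the numbers are copied from the fdisk output, and the 63-sector offset for the first partition is the usual DOS-style layout, assumed here rather than confirmed by the poster.

```shell
# Values copied from the fdisk output above.
heads=16
sectors=63
cylinders=155061
sector_bytes=512

cyl_bytes=$((heads * sectors * sector_bytes))
disk_bytes=$((cylinders * cyl_bytes))
echo "bytes per cylinder: $cyl_bytes"     # 516096, the fdisk "Units" line
echo "disk size in bytes: $disk_bytes"    # 80026361856, the fdisk "Disk" line

# hda1 spans cylinders 1-63 but, by assumption, starts after the first
# track (63 sectors), which holds the MBR.
hda1_sectors=$((63 * heads * sectors - sectors))
echo "hda1: $((hda1_sectors / 2)) blocks, $((hda1_sectors % 2)) sector left over"
# prints "hda1: 31720 blocks, 1 sector left over", i.e. fdisk's "31720+"
```

The numbers agree with the listing, which is consistent with the partition table on the 80G drive being intact.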

And before you ask:
Code:
# ls -alrt /mnt/hda1
total 12864
drwx------    2 root     root        12288 Aug 17  2005 lost+found
lrwxrwxrwx    1 root     root            1 Aug 18  2005 boot -> .
-rw-r--r--    1 root     root      1589686 Aug 19  2005 kernel-genkernel-x86-2.6.12-gentoo-r9
-rw-r--r--    1 root     root       777517 Aug 19  2005 System.map-genkernel-x86-2.6.12-gentoo-r9
-rw-r--r--    1 root     root      1669370 Aug 19  2005 initramfs-genkernel-x86-2.6.12-gentoo-r9
-rw-r--r--    1 root     root            0 Aug 30  2005 .keep
-rw-r--r--    1 root     root      1541157 Sep  3  2005 kernel-manual-x86-2.6.12-gentoo-r9
-rw-r--r--    1 root     root       824807 Sep  3  2005 System.map-manual-x86-2.6.12-gentoo-r9
-rw-r--r--    1 root     root        39560 Sep  3  2005 config-manual-x86-2.6.12-gentoo-r9
-rw-r--r--    1 root     root      1995921 Sep  7  2005 kernel-genkernel2-x86-2.6.12-gentoo-r9
-rw-r--r--    1 root     root      1121935 Sep  7  2005 System.map-genkernel2-x86-2.6.12-gentoo-r9
-rw-r--r--    1 root     root       768221 Sep  7  2005 initramfs-genkernel2-x86-2.6.12-gentoo-r9
drwxr-xr-x    2 root     root         1024 Sep  7  2005 grub
-rw-r--r--    1 root     root      1899320 Jan 29 05:51 kernel-x86-2.6.15-gentoo-r1
-rw-r--r--    1 root     root       823286 Jan 29 05:52 System.map-x86-2.6.15-gentoo-r1
-rw-r--r--    1 root     root        30901 Jan 29 05:53 config-x86-2.6.15-gentoo-r1
drwxr-xr-x    4 root     root         1024 Jan 29 05:53 .
drwxr-xr-x   12 root     root         1024 Jun  8 04:52 ..

Code:
# ls -alrt /mnt/hda1/boot/grub
total 443
lrwxrwxrwx    1 root     root            9 Aug 20  2005 menu.lst -> grub.conf
-rw-r--r--    1 root     root           30 Aug 20  2005 device.map
-rw-r--r--    1 root     root          197 Aug 20  2005 default
-rw-r--r--    1 root     root       108296 Aug 20  2005 stage2.old
-rw-r--r--    1 root     root         9256 Aug 30  2005 xfs_stage1_5
-rw-r--r--    1 root     root         6432 Aug 30  2005 vstafs_stage1_5
-rw-r--r--    1 root     root         7156 Aug 30  2005 ufs2_stage1_5
-rw-r--r--    1 root     root       108296 Aug 30  2005 stage2_eltorito
-rw-r--r--    1 root     root       108296 Aug 30  2005 stage2
-rw-r--r--    1 root     root          512 Aug 30  2005 stage1
-rw-r--r--    1 root     root        33856 Aug 30  2005 splash.xpm.gz
-rw-r--r--    1 root     root         9216 Aug 30  2005 reiserfs_stage1_5
-rw-r--r--    1 root     root         7008 Aug 30  2005 minix_stage1_5
-rw-r--r--    1 root     root         8320 Aug 30  2005 jfs_stage1_5
-rw-r--r--    1 root     root         6816 Aug 30  2005 iso9660_stage1_5
-rw-r--r--    1 root     root         1624 Aug 30  2005 grub.conf.sample
-rw-r--r--    1 root     root         6816 Aug 30  2005 ffs_stage1_5
-rw-r--r--    1 root     root         7504 Aug 30  2005 fat_stage1_5
-rw-r--r--    1 root     root         7776 Aug 30  2005 e2fs_stage1_5
drwxr-xr-x    2 root     root         1024 Sep  7  2005 .
drwxr-xr-x    4 root     root         1024 Jan 29 05:53 ..
-rw-r--r--    1 root     root          800 Jan 29 06:13 grub.conf


x22 wrote:
This may cause problems - disconnect it from IDE too.

Like what kind of problems? I disconnected it once I found I could no longer boot.

- dcljr

Edit: Changed ls -lrt to ls -alrt.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54831
Location: 56N 3W

Posted: Thu Jun 08, 2006 11:23 am    Post subject:

dcljr,

The clicking noise from the faulty drive is the drive recalibrating (moving the heads to track 0) after a failed seek.
It's usually a bad sign. To learn more about the drive status,
Code:
emerge smartmontools
and read the drive's internal SMART log.
If you want to attempt to recover the data, try dd_rhelp, which is not in portage.
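
Once smartmontools is installed, reading the SMART data might look roughly like the sketch below. The run and smart_report helpers are hypothetical names, /dev/hdb is only an assumed device name for the failing drive, and by default the script just prints the smartctl commands instead of running them, since they need root and a drive that still responds.

```shell
# Hypothetical wrapper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Hypothetical helper: the smartctl queries most relevant to a failing drive.
smart_report() {
    dev="$1"
    run smartctl -i "$dev"          # identity: model, serial number, firmware
    run smartctl -H "$dev"          # overall health self-assessment
    run smartctl -A "$dev"          # vendor attributes (reallocated sectors, etc.)
    run smartctl -l error "$dev"    # the drive's internal error log
}

smart_report /dev/hdb   # assumed device name; adjust to the actual drive
```

Running it as root with DRY_RUN=0 executes the commands for real. A drive that clicks and fails to spin up may not answer any of them.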

Adding/removing drives can make a mess of GRUB's drive numbering, but since your install is on /dev/hda, or GRUB's (hd0), you should be OK.
Leaving the IDE cable connected but the drive powered down will encourage the kernel to try harder to communicate with the drive before giving up. It will extend the boot time and may increase the error rate to the powered drive on the same IDE cable, but is otherwise harmless.
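
GRUB records its idea of the drive numbering in /boot/grub/device.map, so one way to see which Linux device it takes for (hd0) is to look the device up there. The following is only a sketch: the device.map contents are a made-up sample (the real file was not posted) and grub_drive_for is a hypothetical helper name.

```shell
# Hypothetical device.map contents; the real file on hda1 was not posted.
map=$(mktemp)
cat > "$map" <<'EOF'
(fd0)	/dev/fd0
(hd0)	/dev/hda
EOF

# Hypothetical helper: print the GRUB drive name mapped to a Linux device.
grub_drive_for() {
    awk -v dev="$1" '$2 == dev { print $1 }' "$2"
}

grub_drive_for /dev/hda "$map"    # prints (hd0)
```

Note that device.map only reflects the guess made when GRUB was installed. At boot time GRUB uses whatever order the BIOS reports, which can change when drives are added or removed.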
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
dcljr
Tux's lil' helper


Joined: 20 Aug 2005
Posts: 139
Location: Austin, TX

Posted: Fri Jun 09, 2006 8:44 pm    Post subject:

Well, thanks for the responses, even if they didn't actually help me solve the problem....

Turns out I just needed to hook the Maxtor HD back up, since it would take the place on the IDE cable where the old 20G WD HD used to be. I thought I'd booted at least once with that combination, but I guess not.

IOW, even though the 80G WD HD was still at the exact same location on the IDE cable, and was still showing up as hda to Knoppix -- and thus, presumably, (hd0,0) to GRUB -- it still was causing trouble. Actually, I don't think the problem had anything to do with GRUB or Linux at all, since the symptoms started (during the boot process) way before either of those are invoked. In fact, it was when I first rebooted with the Maxtor HD reconnected that I noticed that the boot process was blazingly fast again! "Hey! Wait a minute..."

So I turned off the computer before Knoppix loaded and tried booting to Gentoo again. The only problem now was that the attempt to check the Maxtor drive failed, since it was now hdb instead of hdd. Fortunately, the boot process paused at that point and allowed me to abort to a command line, where I fixed /etc/fstab and rebooted. And everything worked as it had before. Damn! ... Ain't computers fun?
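
The /etc/fstab fix described above amounts to renaming the device in the affected entries. A minimal sketch, run against a temporary sample file: the entries, mount points and filesystem types below are invented for illustration, since the actual fstab was not posted.

```shell
# Hypothetical fstab sample; the real /etc/fstab was not posted.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
/dev/hda1   /boot         ext2   noauto,noatime   1 2
/dev/hda3   /             ext3   noatime          0 1
/dev/hdd1   /mnt/maxtor   ext3   noatime          0 2
EOF

# Rewrite only entries whose device field starts with /dev/hdd,
# keeping a backup copy next to the file.
sed -i.bak 's|^/dev/hdd|/dev/hdb|' "$fstab"

cat "$fstab"
```

On the real system the same sed line would be pointed at /etc/fstab as root; the .bak copy makes it easy to revert if the drive moves again.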

- dcljr
dcljr
Tux's lil' helper


Joined: 20 Aug 2005
Posts: 139
Location: Austin, TX

Posted: Fri Jun 09, 2006 8:47 pm    Post subject:

Of course, this doesn't bode well for that day when the Maxtor drive actually does fail and I have to remove it again. I'm going to have to figure out what I was doing wrong!

- dcljr