dcljr Tux's lil' helper
Joined: 20 Aug 2005 Posts: 139 Location: Austin, TX
Posted: Thu Jun 08, 2006 9:43 am Post subject: drive died, booting slowed, now can't boot to Gentoo[SOLVED] |
I have a sad story to tell. Bear with me since I don't know how much of this is actually relevant to my problem.
About two weeks ago, one of my hard drives (a 20G Western Digital, c. 6-7 years old) appeared to die -- it started making clicking sounds, as if it was repeatedly shutting down and immediately reviving. I more or less panicked and shut down my system (gracefully). When I rebooted, the drive continued to make weird noises, so I shut down again (this time, I think I just hit the power button mid-boot). I booted to a Knoppix Live CD and saw for the first time which drive was actually having trouble, because it didn't show up on the desktop. So, after lamenting my fate for a while (first time I've experienced a HD failure), I disconnected its power but left the IDE cable attached, then rebooted to Gentoo again.
Now, I had noticed that the booting seemed to take longer the last couple of reboots (since the drive problem started, but, looking back, possibly even before the drive died -- I can't remember), but I figured it was because the system was repeatedly trying to contact the drive, which wasn't responding, or maybe was responding erratically. But now that the drive was disconnected, the boot still seemed to be very slow. What I mean is, when I first boot up the BIOS does a memory check (which comes up OK), then looks for the IDE devices (a "noname" CD-RW drive that came with the computer + Maxtor 6.4G HD on first cable, 80G Western Digital HD + dead 20G Western Digital HD on second cable), then does some stuff I can't remember, then the screen clears and it starts to check the rest of the system configuration (IRQs, etc), and then finally goes through the regular boot-drive order, checking the floppy, CD, then HD, at which point it finally brings up the GRUB menu (I now have 3 different kernels I've created over the 9 or 10 months I've been running Gentoo)... Well, I can't remember exactly what was taking a long time initially (what's happening now I'll get to below), but I think it was "hanging" for a time right before bringing up the GRUB menu. On the last successful reboot, I think I had to press Ctrl-C or something (I was just trying "random" keystrokes) and then it seemed to get over whatever the problem was and give the list of kernels. From there, no problems.
Anyway, like I said, I chalked it up to IDE communication issues and went about my merry computing way (started burning a lot of CDs before my other, even older, Maxtor drive died!). I didn't shut down again until yesterday. My plan was to remove the old 20G HD and install a second CD-RW drive that I had from my previous (Windows) computer. Before installing the CD-RW, though, I tried rebooting just after removing the dead HD.
So now I finally get to what's happening now: the RAM checks OK, then it says "checking for IDE devices...", then there's a long pause of up to around 10 sec., then the list of drives comes up (originally the list I gave above minus the WD 20G HD, but now I've actually installed the second CD-RW and removed the Maxtor drive, as well, so it's: 2 CD-RWs on one cable, WD 80G HD only on the other), then it says "checking for IDE devices..." again, then a long pause, then the other stuff comes up rather quickly (without a second listing of the IDE devices, BTW -- I don't know why it says it's "checking" for them twice), then the screen clears and the IRQs, etc., are listed, then it tries the floppy and CD drives, and then... nothing.
It hangs there, just where it would give the list of kernels. If I don't do anything, a message eventually comes up saying, in effect, "boot failed, enter system disk and press return". Now, of course, I don't have a system disk, so....
Ah, and one more thing: I noticed on the last few shutdowns that I'm getting lots of "mtimes differ" messages, but a search of these forums reveals that that's (apparently) not really a problem.
Well, there it is. I've now booted back to Knoppix, which is how I'm sending this message.
So now my thoughts on what the problem could (or could not) be:
- It seems extremely unlikely that it's a software issue, since my recent portage activity hasn't involved anything out of the ordinary (I just emerge -uD world and emerge -uD system every once in a while); I don't use any "weird" software (i.e., that might affect booting), anyway, being a newbie to Linux.
- I haven't changed the kernel and nothing has been modified on /boot (the entire first partition on the 80G WD HD) since January 29th -- and I've rebooted a whopping 30 times since then.
- Note that there doesn't seem to be any problem with the HD itself, or the IDE cable for that matter, since I can mount the drive in Knoppix, get directory listings, etc. Running fsck on the boot partition doesn't reveal any problems (comes up "clean").
- Seems very unlikely it's a RAM problem -- again since I'm having no problems in Knoppix. (Although... come to think of it, I have had some strange glitches in the past, in Gentoo, where the screen image suddenly becomes squeezed horizontally with some of the left side of the screen duplicated on the right side -- i.e., it looks like a "tiled" image, as you would see with a too-small background image on a webpage. As I recall, this was coming back from a blank screen after some idle time (I don't use any screensaver, so this must be the default X11 or Gnome screen blanker). Both times this happened I was able to fix the problem by logging out and logging back in again. I suppose this might reflect a RAM/video-RAM problem, but I think it's more likely associated with upgrading to xorg-x11-6.8.2-r7 a month ago. Anyway, I haven't tested my RAM yet.)
- As for something more sinister, like a virus or trojan horse or whatever, I don't even know how to check for those, but I did come across chkrootkit in Knoppix and it didn't find anything (I really don't know what it's doing, though). My e-mail goes through clamav (on another machine) and I don't open unknown e-mail attachments or (usually) visit strange websites. And I certainly don't install any software I don't get through portage (stable branch, no binaries).
So, anyone have any ideas what the problem could be? What else I could check?
- dcljr
Edit: BTW, I've checked the Master/Slave settings on all the drives and they're correct.
Edit #2: Sadly, I am without my Gentoo Live CD; I reused the CD-RW disc it was on not one week before everything started going haywire. Fortunately, I can burn another one in Knoppix since I have 2 CD drives...
Last edited by dcljr on Fri Jun 09, 2006 8:44 pm; edited 1 time in total |
x22 Apprentice
Joined: 24 Apr 2006 Posts: 208
Posted: Thu Jun 08, 2006 10:04 am Post subject: Re: drive died, booting slowed, now can't boot to Gentoo at |
On which disk(s) are your root partition, boot partition and grub?
dcljr wrote: | I disconnected its power but left the IDE cable attached, then rebooted to Gentoo again.
|
This may cause problems - disconnect it from IDE too. |
dcljr Tux's lil' helper
Joined: 20 Aug 2005 Posts: 139 Location: Austin, TX
Posted: Thu Jun 08, 2006 10:25 am Post subject: Re: drive died, booting slowed, now can't boot to Gentoo at |
Code: | # fdisk -l
Disk /dev/hda: 80.0 GB, 80026361856 bytes
16 heads, 63 sectors/track, 155061 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Device Boot Start End Blocks Id System
/dev/hda1 * 1 63 31720+ 83 Linux
/dev/hda2 64 1056 500472 82 Linux swap
/dev/hda3 1057 64549 32000472 83 Linux
/dev/hda4 64550 155061 45618048 5 Extended
/dev/hda5 64550 77249 6400768+ 83 Linux
/dev/hda6 77250 119909 21500608+ 83 Linux
/dev/hda7 119910 121894 1000408+ 83 Linux
/dev/hda8 121895 155061 16716136+ 83 Linux
|
/ (the root filesystem) is on hda3.
/boot and /boot/grub are on hda1 (which is 46% full).
And before you ask:
Code: | # ls -alrt /mnt/hda1
total 12864
drwx------ 2 root root 12288 Aug 17 2005 lost+found
lrwxrwxrwx 1 root root 1 Aug 18 2005 boot -> .
-rw-r--r-- 1 root root 1589686 Aug 19 2005 kernel-genkernel-x86-2.6.12-gentoo-r9
-rw-r--r-- 1 root root 777517 Aug 19 2005 System.map-genkernel-x86-2.6.12-gentoo-r9
-rw-r--r-- 1 root root 1669370 Aug 19 2005 initramfs-genkernel-x86-2.6.12-gentoo-r9
-rw-r--r-- 1 root root 0 Aug 30 2005 .keep
-rw-r--r-- 1 root root 1541157 Sep 3 2005 kernel-manual-x86-2.6.12-gentoo-r9
-rw-r--r-- 1 root root 824807 Sep 3 2005 System.map-manual-x86-2.6.12-gentoo-r9
-rw-r--r-- 1 root root 39560 Sep 3 2005 config-manual-x86-2.6.12-gentoo-r9
-rw-r--r-- 1 root root 1995921 Sep 7 2005 kernel-genkernel2-x86-2.6.12-gentoo-r9
-rw-r--r-- 1 root root 1121935 Sep 7 2005 System.map-genkernel2-x86-2.6.12-gentoo-r9
-rw-r--r-- 1 root root 768221 Sep 7 2005 initramfs-genkernel2-x86-2.6.12-gentoo-r9
drwxr-xr-x 2 root root 1024 Sep 7 2005 grub
-rw-r--r-- 1 root root 1899320 Jan 29 05:51 kernel-x86-2.6.15-gentoo-r1
-rw-r--r-- 1 root root 823286 Jan 29 05:52 System.map-x86-2.6.15-gentoo-r1
-rw-r--r-- 1 root root 30901 Jan 29 05:53 config-x86-2.6.15-gentoo-r1
drwxr-xr-x 4 root root 1024 Jan 29 05:53 .
drwxr-xr-x 12 root root 1024 Jun 8 04:52 ..
|
Code: | # ls -alrt /mnt/hda1/boot/grub
total 443
lrwxrwxrwx 1 root root 9 Aug 20 2005 menu.lst -> grub.conf
-rw-r--r-- 1 root root 30 Aug 20 2005 device.map
-rw-r--r-- 1 root root 197 Aug 20 2005 default
-rw-r--r-- 1 root root 108296 Aug 20 2005 stage2.old
-rw-r--r-- 1 root root 9256 Aug 30 2005 xfs_stage1_5
-rw-r--r-- 1 root root 6432 Aug 30 2005 vstafs_stage1_5
-rw-r--r-- 1 root root 7156 Aug 30 2005 ufs2_stage1_5
-rw-r--r-- 1 root root 108296 Aug 30 2005 stage2_eltorito
-rw-r--r-- 1 root root 108296 Aug 30 2005 stage2
-rw-r--r-- 1 root root 512 Aug 30 2005 stage1
-rw-r--r-- 1 root root 33856 Aug 30 2005 splash.xpm.gz
-rw-r--r-- 1 root root 9216 Aug 30 2005 reiserfs_stage1_5
-rw-r--r-- 1 root root 7008 Aug 30 2005 minix_stage1_5
-rw-r--r-- 1 root root 8320 Aug 30 2005 jfs_stage1_5
-rw-r--r-- 1 root root 6816 Aug 30 2005 iso9660_stage1_5
-rw-r--r-- 1 root root 1624 Aug 30 2005 grub.conf.sample
-rw-r--r-- 1 root root 6816 Aug 30 2005 ffs_stage1_5
-rw-r--r-- 1 root root 7504 Aug 30 2005 fat_stage1_5
-rw-r--r-- 1 root root 7776 Aug 30 2005 e2fs_stage1_5
drwxr-xr-x 2 root root 1024 Sep 7 2005 .
drwxr-xr-x 4 root root 1024 Jan 29 05:53 ..
-rw-r--r-- 1 root root 800 Jan 29 06:13 grub.conf
|
x22 wrote: | This may cause problems - disconnect it from IDE too. |
Like what kind of problems? I disconnected it once I found I could no longer boot.
- dcljr
Edit: Changed ls -lrt to ls -alrt. |
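For reference, here is a rough sketch of how the Linux names in the partition table above translate to GRUB legacy notation. GRUB counts both drives and partitions from zero, so hda1 (/boot) becomes (hd0,0) and hda3 becomes (hd0,2). This assumes GRUB sees hda as the first BIOS drive, which may not hold if the BIOS orders the drives differently. The function name linux_to_grub is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert an IDE device name such as hda3 or /dev/hda3
# to GRUB legacy notation, ASSUMING hda=(hd0), hdb=(hd1), and so on.
# GRUB really numbers drives in BIOS order, so treat the result as a guess.
linux_to_grub() {
    dev=${1#/dev/}                  # strip a leading /dev/ if present
    case $dev in
        hd[a-z]*) ;;
        *) echo "unsupported device: $1" >&2; return 1 ;;
    esac
    rest=${dev#hd}                  # e.g. "a3"
    part=${rest#?}                  # partition number, empty for a whole drive
    letter=${rest%"$part"}          # drive letter, e.g. "a"
    case $part in
        *[!0-9]*) echo "bad partition number: $1" >&2; return 1 ;;
    esac
    # Drive index = number of letters that come before this one.
    letters=abcdefghijklmnopqrstuvwxyz
    before=${letters%%"$letter"*}
    index=${#before}
    if [ -n "$part" ]; then
        echo "(hd$index,$((part - 1)))"
    else
        echo "(hd$index)"
    fi
}

linux_to_grub /dev/hda1   # the /boot partition
linux_to_grub /dev/hda3   # the root partition
```

The two calls at the end print (hd0,0) and (hd0,2), which is what the root lines in grub.conf would be expected to use under the assumption stated above.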
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54831 Location: 56N 3W
Posted: Thu Jun 08, 2006 11:23 am Post subject: |
dcljr,
The clicking noise from the faulty drive is the drive recalibrating (moving the heads to track 0) after a failed seek.
It's usually a bad sign. To learn more about the drive status, Code: | emerge smartmontools | and read the drive's internal SMART log.
If you want to attempt to recover the data, try dd_rhelp, which is not in portage.
Adding/removing drives can make a mess of GRUB's drive numbering, but since your install is on /dev/hda, or GRUB's (hd0), you should be OK.
Leaving the IDE cable connected but the drive powered down will encourage the kernel to try harder to communicate with the drive before giving up. It will extend the boot time and may increase the error rate to the powered drive on the same IDE cable, but is otherwise harmless. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
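A minimal sketch of reading the SMART data mentioned above, once smartmontools is installed. The device name /dev/hdb is only an assumption for the failing 20G drive, and the wrapper function smart_report is hypothetical. The drive has to be powered and detected for any of this to work.

```shell
#!/bin/sh
# Sketch only: DRIVE is an assumed device name, adjust it before running.
DRIVE=${DRIVE:-/dev/hdb}

smart_report() {
    # Bail out early if smartmontools is not installed.
    command -v smartctl >/dev/null 2>&1 || {
        echo "smartctl not found (emerge smartmontools first)" >&2
        return 127
    }
    smartctl -H "$1"          # overall health self-assessment
    smartctl -A "$1"          # vendor attributes (reallocated sectors, etc.)
    smartctl -l error "$1"    # the drive's internal error log
}

# Run as root, for example:
#   smart_report "$DRIVE"
```

The function is defined but deliberately not called, so pasting the block does not touch any drive until it is invoked by hand.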
dcljr Tux's lil' helper
Joined: 20 Aug 2005 Posts: 139 Location: Austin, TX
Posted: Fri Jun 09, 2006 8:44 pm Post subject: |
Well, thanks for the responses, even if they didn't actually help me solve the problem....
Turns out I just needed to hook the Maxtor HD back up since it would take the place on the IDE cable where the old 20G WD HD used to be. I thought I'd booted at least once with that combination, but I guess not.
IOW, even though the 80G WD HD was still at the exact same location on the IDE cable, and was still showing up as hda to Knoppix -- and thus, presumably, (hd0,0) to GRUB -- it still was causing trouble. Actually, I don't think the problem had anything to do with GRUB or Linux at all, since the symptoms started (during the boot process) way before either of those is invoked. In fact, it was when I first rebooted with the Maxtor HD reconnected that I noticed that the boot process was blazingly fast again! "Hey! Wait a minute..."
So I turned off the computer before Knoppix loaded and tried booting to Gentoo again. The only problem now was that the attempt to check the Maxtor drive failed, since it was now hdb instead of hdd. Fortunately, the boot process paused at that point and allowed me to abort to a command line where I fixed /etc/fstab and rebooted. And everything worked as it had before. Damn! ... Ain't computers fun?
- dcljr |
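For anyone hitting the same fstab failure: under the old IDE driver the device name follows the cable position (hda and hdb are master and slave on the primary channel, hdc and hdd on the secondary), so moving a drive renames it. Below is a sketch of the kind of edit involved, run against a throwaway sample file rather than the real /etc/fstab. The function names, mount point, filesystem type, and options in the sample are invented for illustration and are not taken from the system described in this thread.

```shell
#!/bin/sh
# Map an IDE device name to its cable position (old hdX naming scheme).
ide_position() {
    case ${1#/dev/} in
        hda*) echo "primary master" ;;
        hdb*) echo "primary slave" ;;
        hdc*) echo "secondary master" ;;
        hdd*) echo "secondary slave" ;;
        *)    echo "unknown"; return 1 ;;
    esac
}

# Rewrite every fstab entry for one drive to its new name.
# Usage: rename_drive OLD NEW FILE   (prints the result, FILE is untouched)
rename_drive() {
    sed "s|^/dev/$1\([0-9]\)|/dev/$2\1|" "$3"
}

# Demo on a temporary file; both entries below are hypothetical examples.
sample=$(mktemp)
printf '%s\n' \
    '/dev/hda3  /          ext3  noatime  0 1' \
    '/dev/hdd1  /mnt/data  ext3  noatime  0 2' > "$sample"

ide_position hdd                  # where the drive used to sit
ide_position hdb                  # where it sits now
rename_drive hdd hdb "$sample"    # only the hdd1 line changes
rm -f "$sample"
```

Printing the rewritten file instead of editing in place leaves a chance to review the change before copying it over /etc/fstab.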
dcljr Tux's lil' helper
Joined: 20 Aug 2005 Posts: 139 Location: Austin, TX
Posted: Fri Jun 09, 2006 8:47 pm Post subject: |
Of course, this doesn't bode well for that day when the Maxtor drive actually does fail and I have to remove it again. I'm going to have to figure out what I was doing wrong!
- dcljr |