jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Sat Jun 05, 2021 10:46 pm Post subject: [SOLVED] fails to reboot, but will startup |
|
|
Hi,
I have a strange issue where my machine won't reboot with the 'reboot' command, but if I shut down (either with 'shutdown -h now' or a hard stop with the power button) and then start it up, it starts fine. The bootloader is lilo, and the failure occurs after the lilo boot screen says it's loading but before the first steps of the actual bootup process. This issue came up after a recent kernel update; the kernel is now 5.12.7 from gentoo-sources.
I'd be grateful for any thoughts.
Last edited by jyoung on Thu Jun 17, 2021 4:19 pm; edited 1 time in total |
|
|
|
alamahant Advocate
Joined: 23 Mar 2019 Posts: 3949
|
Posted: Sun Jun 06, 2021 12:49 pm Post subject: |
|
|
I think it's simple:
1. Good kernel
2. Good initrd
3. Grub
Why use lilo? Who uses lilo nowadays?
Only slackware, no?
Also, please don't do hard reboots.
A lot more info is needed to troubleshoot this.
It could be a combination of many things.
dmesg
is your friend.
Also a pic of the failing boot, maybe?
A pastebin of your kernel .config also?
Do you use an initrd?
How is your partition layout?
etc.
|
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Sun Jun 06, 2021 4:54 pm Post subject: |
|
|
Okay, here is my kernel config
http://www.pastebin.com/1G5d0NEi
And dmesg
http://www.pastebin.com/m40DqKAK
My partition layout is pretty simple, here's part of /etc/fstab:
Code: | /dev/sda1 /boot ext4 noauto,noatime 1 2
/dev/sda2 none swap sw 0 0
/dev/sda3 / ext4 noatime 0 1
/dev/sda5 /home ext4 noatime 0 0 |
Agreed, hard stops are to be avoided, but in this case, when the machine hangs before bootup, there's no other option. I'm not using an initrd. I'm using lilo because when I set up this machine five years ago I knew that I wanted something simple and I hadn't tried lilo yet, and while it was old even at that point, the Gentoo handbook still described it as reliable. Honestly, I haven't had problems with it since, although it's always possible that this could be the first instance. Here's my lilo config:
Code: | boot=/dev/sda
prompt
timeout=60
default=gentoo-linux
image=/boot/vmlinuz-5.12.7-gentoo
label=gentoo-linux
read-only
root=/dev/sda3
append="acpi_osi=Linux"
image=/boot/vmlinuz-5.6.11-gentoo
label=gentoo-backup
read-only
root=/dev/sda3
append="acpi_osi=Linux"
|
|
|
|
|
Jaglover Watchman
Joined: 29 May 2005 Posts: 8291 Location: Saint Amant, Acadiana
|
Posted: Sun Jun 06, 2021 5:23 pm Post subject: |
|
|
The acpi_osi=Linux parameter certainly affects how the kernel interacts with the BIOS; have you tried without it? And don't get carried away: this has nothing to do with your bootloader choice. The bootloader works only for a fraction of a second during boot and has nothing to do with reboot or shutdown. _________________ My Gentoo installation notes.
Please learn how to denote units correctly! |
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Sun Jun 06, 2021 8:45 pm Post subject: |
|
|
Hmm, I can't recall why I added acpi_osi=Linux to lilo.conf. It was five years ago that I set it up... However, this post suggests that it's a reasonable thing to do:
https://askubuntu.com/questions/28848/what-does-the-kernel-boot-parameter-set-acpi-osi-linux-do
Still, no harm in trying without. I set up this new lilo.conf file:
Code: | boot=/dev/sda
prompt
timeout=60
default=gentoo-linux
image=/boot/vmlinuz-5.12.7-gentoo
label=gentoo-linux
read-only
root=/dev/sda3
append="acpi_osi=Linux"
image=/boot/vmlinuz-5.12.7-gentoo
label=gentoo-test
read-only
root=/dev/sda3
image=/boot/vmlinuz-5.6.11-gentoo
label=gentoo-backup
read-only
root=/dev/sda3
append="acpi_osi=Linux" |
Both 'gentoo-linux' and 'gentoo-test' seem to behave the same; that is, they start up normally except after a 'reboot' command.
Some more observations: When the failure occurs, the last thing I see is a message from the lilo screen saying "BIOS data check successful". This message is printed regardless of whether or not the startup is about to fail. In the failure case, it then goes to a blank screen. I also know that this is not just a display issue, and the machine is actually not starting up, since I can't login remotely. |
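One way to rule out a mix-up between the lilo entries is to confirm, after booting 'gentoo-test', that the running kernel really was started without acpi_osi. The sketch below is only an illustration: it assumes the usual Linux /proc/cmdline file, and the names parse_cmdline and running_cmdline are made up for this example.

```python
import shlex
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict[str, list[str]]:
    """Split a kernel command line into {parameter: [values...]}.

    Bare flags such as "ro" get an empty-string value. A parameter may
    appear more than once, so the values are kept in a list.
    """
    params: dict[str, list[str]] = {}
    # shlex handles double-quoted values that contain spaces.
    for token in shlex.split(cmdline):
        key, _, value = token.partition("=")
        params.setdefault(key, []).append(value)
    return params


def running_cmdline(path: str = "/proc/cmdline") -> dict[str, list[str]]:
    """Parse the parameters the running kernel was booted with.

    Assumption: /proc is mounted and exposes the usual cmdline file.
    """
    return parse_cmdline(Path(path).read_text())
```

After booting each entry, running_cmdline().get("acpi_osi") should return None for an entry that has no append line, and a list such as ["Linux"] otherwise.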
|
|
|
Logicien Veteran
Joined: 16 Sep 2005 Posts: 1555 Location: Montréal
|
Posted: Mon Jun 07, 2021 11:58 am Post subject: |
|
|
On a reboot, the system software is reinitialized, but the microcode of the devices is not, as it would be after a poweroff. I had a problem like this with an Intel APU, where the reboot did not complete while using the i915 kernel module. After I blacklisted i915, Linux started to use the efifb framebuffer and the reboot then completed successfully.
So this may have to do with the graphics card if you use an Intel integrated video card. With an ATI/AMD PCIe video card and the Linux radeon module, reboot is fine. When the boot process gets stuck, it may be that one device was not reinitialized correctly on reboot. _________________ Paul |
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
|
|
|
Jaglover Watchman
Joined: 29 May 2005 Posts: 8291 Location: Saint Amant, Acadiana
|
Posted: Tue Jun 08, 2021 8:45 pm Post subject: |
|
|
jyoung,
the askubuntu article you linked to also says "Yes, BIOS's usually disable functionality if Windows is not detected", which means that if you tell such a braindead BIOS you are running Linux, it may misbehave. Maybe lying to it that you are running Windows makes it listen to the OS? My 2¢. |
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Tue Jun 08, 2021 9:44 pm Post subject: |
|
|
Alas, no luck. Here's my lilo.conf file:
Code: | boot=/dev/sda
prompt
timeout=60
default=gentoo-linux
image=/boot/vmlinuz-5.12.7-gentoo
label=gentoo-linux
read-only
root=/dev/sda3
append="acpi_osi=Linux"
image=/boot/vmlinuz-5.12.7-gentoo
label=gentoo-test
read-only
root=/dev/sda3
append="acpi_osi=Windows"
image=/boot/vmlinuz-5.6.11-gentoo
label=gentoo-backup
read-only
root=/dev/sda3
append="acpi_osi=Linux" |
Even with append="acpi_osi=Windows" under gentoo-test, the problem persists. |
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Wed Jun 09, 2021 12:34 am Post subject: |
|
|
The Gentoo wiki page on Intel microcode also states
Quote: | If the initramfs USE flag is active the intel-microcode ebuild will automatically install a cpio archive of all microcode into /boot/intel-uc.img. |
With equery uses intel-microcode I get
Code: | - - hostonly : only install ucode(s) supported by currently available (=online) processor(s)
- - initramfs : install a small initramfs for use with CONFIG_MICROCODE_EARLY
+ + split-ucode : install the split binary ucode files (used by the kernel directly)
- - vanilla : install only microcode updates from Intel's official microcode tarball |
So, maybe I should focus on early microcode loading? But, there's no CONFIG_MICROCODE_EARLY in the .config file, and in menuconfig I can't find any reference to MICROCODE_EARLY with the '/' search. |
|
|
|
Jaglover Watchman
Joined: 29 May 2005 Posts: 8291 Location: Saint Amant, Acadiana
|
|
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 23070
|
Posted: Wed Jun 09, 2021 1:45 am Post subject: |
|
|
It appears that MICROCODE_EARLY was removed in fe055896c040df571e4ff56fb196d6845130057b in 2015. However, as Jaglover says, the functionality still exists. There is just no symbol for excluding it, because early microcode was deemed to be a better approach than supporting late microcode. |
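To see which microcode revision the running kernel actually reports, one option is to read the "microcode" field from /proc/cpuinfo and compare the value after a cold boot with the value after a warm reboot (on a kernel where the reboot completes). This is a rough sketch, not a definitive check: it assumes an x86 system whose /proc/cpuinfo carries that field, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: each processor entry in /proc/cpuinfo has a line that looks
# like "microcode<TAB>: 0xde". The exact layout may differ between systems.
_MICROCODE_RE = re.compile(r"^microcode\s*:\s*(\S+)\s*$", re.MULTILINE)


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Return the distinct microcode revisions found in cpuinfo-style text."""
    return set(_MICROCODE_RE.findall(cpuinfo_text))


def running_microcode(path: str = "/proc/cpuinfo") -> set[str]:
    """Read the revisions reported for the running system."""
    return microcode_revisions(Path(path).read_text())
```

If the set differs between a cold boot and a warm reboot, that would lend some support to the idea that the microcode state is not the same in both cases. An identical set does not rule out other devices being left in a bad state.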
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Wed Jun 09, 2021 1:50 am Post subject: |
|
|
Okay, so the microcode is in the kernel and loaded early... but even on a reboot? Logicien, you suggested that the microcode may not be reinitialized on a reboot, and that seems to fit the symptoms here. |
|
|
|
Logicien Veteran
Joined: 16 Sep 2005 Posts: 1555 Location: Montréal
|
Posted: Wed Jun 09, 2021 5:59 am Post subject: |
|
|
With i915 the backlight is always on, and I have not found a way to disable it after trying everything I could. In addition, the reboot is slow with it. On a Dell Optiplex 7100 the Dell EFI/BIOS logo does not reappear, the computer stays in an idle state and the screen goes into power-save mode. Replacing i915 with efifb resolves all the problems: the backlight is off and the reboot is good. But efifb does not perform as well as i915 in FPS.
Now I use an AMD/ATI PCIe extension card and the radeon module works well. But the integrated Intel APU performs best in terms of frames per second (FPS). Anyway, I think that a cold poweroff is better than a reboot for testing an upgrade, for example.
If you use Grub2 and it displays properly, you can try setting the parameter GRUB_GFXPAYLOAD_LINUX=keep in the /etc/default/grub file and see whether Linux uses it and boots properly too. Or use the Linux kernel parameter video= to tell Linux which resolution to use. |
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Fri Jun 11, 2021 3:49 am Post subject: |
|
|
Okay, I can set up grub and try out the GRUB_GFXPAYLOAD_LINUX=keep option. lilo is nice and simple, but perhaps we've hit its limits. I should be able to report back on that sometime tomorrow.
With the video= kernel option, would that be the resolution of the monitor? I have to admit that it seems kind of weird to need to put the monitor resolution into the bootloader, but it's easy enough to try.
Agreed, it does seem that a cold restart is preferable! But it would be great to get the reboot ability working. It's sometimes necessary for me to reboot this machine remotely. |
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Fri Jun 11, 2021 6:32 pm Post subject: |
|
|
This afternoon I switched to grub2 and added GRUB_GFXPAYLOAD_LINUX=keep to /etc/default/grub. Booting through grub works as normal, but rebooting through grub hangs right after the grub menu, when it prints 'Loading Linux 5.12.7-gentoo ...'. It seems like the issue is the same as with lilo. |
|
|
|
Jaglover Watchman
Joined: 29 May 2005 Posts: 8291 Location: Saint Amant, Acadiana
|
Posted: Fri Jun 11, 2021 6:51 pm Post subject: |
|
|
I think I have a minor version of this bug. When I reboot, my 2560x1440 display is not detected and comes up at 1920x1080; furthermore, the 2560x1440 resolution is not available in X either. I haven't worked on this as I reboot very seldom.
Have you played with the EDID loading option in the kernel?
Edit: I have to retract that. Having had a closer look at the kernel options, I do not see anything that would affect my Intel HD 630 reboot. |
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Sun Jun 13, 2021 12:21 am Post subject: |
|
|
When I reboot off the old kernel (5.6.11), the bug does not occur. So either something was messed up in the migration, or there's a bug in the new (5.12.7) source. Or, there was something messed up in the 5.6.11 .config file that, by pure luck, remained asymptomatic until the migration.
When I migrated from 5.6.11 to 5.12.7, I used make olddefconfig. I'm going to try rebuilding 5.12.7 from scratch, and see if that works any better. |
|
|
|
Jaglover Watchman
Joined: 29 May 2005 Posts: 8291 Location: Saint Amant, Acadiana
|
Posted: Sun Jun 13, 2021 1:18 am Post subject: |
|
|
I can't imagine a case where I would use olddefconfig; it modifies the kernel configuration without even reporting what was changed. I certainly do not want that kind of disaster in my kernels, considering how many default options I have to change every time I run oldconfig. |
|
|
|
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
|
Posted: Sun Jun 13, 2021 2:35 pm Post subject: |
|
|
I agree with Jaglover. I recently posted a build script in "Tips and Tricks". It uses "make oldconfig", not "make olddefconfig".
You should have a /boot/config-<something or other> from your working kernel. Just eselect the new kernel and then pass the location of that config as a parameter to that script.
Better yet, boot the working kernel, then eselect the new kernel and just run it. This assumes that you have the config built into the kernel.
See https://www.xaprb.com/blog/2006/05/23/how-to-use-linuxs-proc-config-feature/
If not, just run it, passing the config I referenced above.
Last edited by Tony0945 on Sun Jun 13, 2021 5:15 pm; edited 1 time in total |
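As a rough illustration of the "config built into the kernel" route, the sketch below reads the running kernel's configuration and writes it out as a starting .config, after which "make oldconfig" would be run by hand in the new source tree. It is not Tony0945's script. It assumes the working kernel was built with CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC so that /proc/config.gz exists, and the function names are made up for this example.

```python
import gzip
from pathlib import Path


def read_kernel_config(path: str = "/proc/config.gz") -> str:
    """Return kernel config text from a gzip-compressed or plain file.

    This covers both /proc/config.gz and a plain /boot/config-* file.
    """
    raw = Path(path).read_bytes()
    # gzip streams start with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")


def seed_config(source: str, dest: str) -> int:
    """Copy the config text from source to dest; return the line count."""
    text = read_kernel_config(source)
    Path(dest).write_text(text)
    return len(text.splitlines())
```

For example, seed_config("/proc/config.gz", "/usr/src/linux/.config") would seed the selected tree, assuming /usr/src/linux points at the new sources. Passing a /boot/config-* path as the source covers the case where the config is not built into the kernel.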
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Sun Jun 13, 2021 4:36 pm Post subject: |
|
|
Indeed, it appears that this thread will read as a cautionary tale for those who might opt for make olddefconfig. I just set up a new .config file from scratch, compiled and installed the kernel, and I was able to reboot without issue.
Tony0945, today or tomorrow I'm going to try some of the tips you suggested to make a clean migration from the old kernel to the new one. I'll report back, but it certainly looks like we're close to solving this issue. |
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Thu Jun 17, 2021 4:19 pm Post subject: |
|
|
Okay, I'm marking this thread as 'solved'. The root of the problem was one of the default options pulled in by 'make olddefconfig'. Thanks a lot to everyone for troubleshooting this with me! |
|
|
|
GDH-gentoo Veteran
Joined: 20 Jul 2019 Posts: 1802 Location: South America
|
Posted: Thu Jun 17, 2021 6:28 pm Post subject: |
|
|
jyoung wrote: | The root of the problem was with one of the default options pulled in by 'make olddefconfig'. |
For the benefit of future readers, why don't you tell us which option that was and which setting fixed the problem? |
|
|
|
jyoung Guru
Joined: 20 Mar 2007 Posts: 471
|
Posted: Mon Jul 26, 2021 12:37 am Post subject: |
|
|
That's a good point GDH-gentoo. I just ran diff on the two config files, and the differences are quite numerous. I'd be happy to post the entire list, but I wonder if there's a good way to determine the key differences. |
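One possible way to make a long diff more manageable is to compare the two files symbol by symbol rather than line by line, so that reordering and comment changes drop out. The sketch below is an assumption-laden illustration, not a tool from this thread: it only understands "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines, and the function names are hypothetical. The kernel source tree also ships a scripts/diffconfig helper that serves a similar purpose.

```python
import re

# "CONFIG_FOO=value" and "# CONFIG_FOO is not set" are the two line forms
# this sketch recognizes; everything else is ignored.
_SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict[str, str]:
    """Map each symbol to its value; explicitly unset symbols map to 'n'."""
    symbols: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if (m := _SET_RE.match(line)):
            symbols[m.group(1)] = m.group(2)
        elif (m := _UNSET_RE.match(line)):
            symbols[m.group(1)] = "n"
    return symbols


def diff_configs(old_text: str, new_text: str) -> dict[str, list[tuple]]:
    """Group differences into changed values and symbols unique to one file."""
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        "changed": sorted(
            (k, old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]
        ),
        "only_old": sorted((k, old[k]) for k in old.keys() - new.keys()),
        "only_new": sorted((k, new[k]) for k in new.keys() - old.keys()),
    }
```

With the failing olddefconfig file as old_text and the working from-scratch file as new_text, the "changed" list is probably the place to start. Flipping a handful of those symbols at a time in the failing config and retesting the reboot would narrow down the culprit, at the cost of several kernel builds.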
|
|
|
|