Gentoo Forums
Kernel Crash Very Early in Boot Sequence (GPD MicroPC)

tempuser
n00b

Joined: 24 Aug 2011
Posts: 68

Posted: Thu Jul 11, 2024 12:25 pm    Post subject: Kernel Crash Very Early in Boot Sequence (GPD MicroPC)

Context

I've been trying to create a very small (~16GB) console-only Gentoo installation on a GPD MicroPC in order to have rapid access to certain tools. I've been using the usual Gentoo documentation plus this page as a guide. The trouble is, it won't boot. The kernel doesn't even draw to the screen. In spite of the title of this topic, I don't even know whether the handover to the kernel is taking place correctly, and I certainly don't know how to debug it, which is why I'm humbly asking for a little assistance.

What I have done

I have successfully installed the Gentoo system to a newly created ext4 partition using a live USB. I can arch-chroot into it and it appears to work just fine. I have also used efibootmgr to create the appropriate boot entry and set it as the first active one. I created the kernel by emerging sys-kernel/gentoo-sources (6.6.38), downloading the 5.2.1 kernel config from the handheld.computer page I linked above, and then running make menuconfig, and verifying everything looked sane (to my untrained eye anyway).

This might be the cause of the problem if make menuconfig didn't gracefully handle the jump from 5.2.1 to 6.6.38, or if the manufacturer changed some of the hardware during the intervening almost five years without making it obvious, but even then I would have expected the kernel to be able to write something to the screen using legacy BIOS operations (please correct me if I'm wrong).

[UPDATE] I have updated the config using the correct procedure of executing make oldconfig and answering the questions; the issue persists.

Things I know

I can confirm that CONFIG_FB_EFI=y is set in the kernel config. The kernel's file name also has the correct .efi extension.

I can be confident that the UEFI firmware can see the kernel in the ESP's filesystem as earlier during installation I accidentally put the kernel in the wrong directory (oops) and the boot failed but in a different way.

There is no initrd or initramfs in use.

The Symptoms

If I try to boot the kernel the system will just hang indefinitely. I have tried booting the kernel by:

  • Turning the machine on and waiting.
  • Going into the BIOS menu and selecting the appropriate boot entry.
  • Manually booting the kernel via GRUB2 on the live USB.
  • Going into the "built-in EFI shell", navigating to the kernel through the ESP's filesystem, and instructing to execute.

In all four cases the state of the screen prior to booting is different, and in all four cases whatever was previously on the screen persists indefinitely. Whenever this happens, journalctl --list-boots fails to show that any boot attempt was made at all, suggesting that the kernel really is failing early, and it's not just a screen drawing problem (although as I mentioned above the kernel really should be able to draw to the screen using legacy BIOS operations).

Then again, I can only run journalctl when arch-chrooted into the Gentoo system from the live USB, so I don't know whether this test is valid or not. This may just be telling me when the live USB was booted.

[UPDATE] I now think that my boot attempts were showing up in journalctl, and not the live USB boot attempts, and that I just confused myself.

Other information

The built-in kernel command string is currently set to "root=/dev/sda2 fbcon=rotate:1 i915.lvds_downclock=1 i915.i915_enable_fb=1 i915.i915_enable_rc6=1 i915.enable_psr=1".

Running fdisk -l /dev/sda yields:

Code:

Disk /dev/sda: 238.47 GiB, 256060514304 bytes, 500118192 sectors
Disk model: BIWIN CNF42V51M0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: (I assume I don't want to share this?)

Device      Start      End  Sectors  Size Type
/dev/sda1    2048   206847   204800  100M EFI System
/dev/sda2  206848 33554432 33347585 15.9G Linux filesystem


I doubt this matters, but this is a systemd installation, and I have updated the kernel config to reflect this.


Last edited by tempuser on Wed Jul 17, 2024 3:32 pm; edited 3 times in total
pietinger
Moderator

Joined: 17 Oct 2006
Posts: 4645
Location: Bavaria

Posted: Thu Jul 11, 2024 2:31 pm    Post subject: Re: Kernel Crash Very Early in Boot Sequence (GPD MicroPC)

tempuser wrote:
[...] downloading the 5.2.1 kernel config from the handheld.computer page I linked above, and then running make menuconfig, and verifying everything looked sane (to my untrained eye anyway).

The correct procedure is:
- Copying the old .config into the new /usr/src/... AND THEN
- "make oldconfig" and answering all questions.

tempuser wrote:
[..] I can confirm that CONFIG_FB_EFI=y is set in the kernel config. [...]

Maybe you are missing framebuffer console and/or "Enable legacy fbdev support for your modesetting driver" ... see here:
https://wiki.gentoo.org/wiki/User:Pietinger/Tutorials/Manual_Configuring_Kernel_Version_6.6#Part_3_-_Must_Haves
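
A minimal sketch of how those options could be checked from the shell, assuming the kernel sources live in /usr/src/linux. The helper name kconfig_state is made up for illustration; CONFIG_FRAMEBUFFER_CONSOLE and CONFIG_DRM_FBDEV_EMULATION are the symbols behind the framebuffer console and the "Enable legacy fbdev support for your modesetting driver" prompt:

```shell
# Print the state of one option in a kernel .config file: "y", "m" or "unset".
# kconfig_state is a hypothetical helper name, not an existing tool.
kconfig_state() {
    config_file=$1
    option=$2
    if grep -q "^${option}=y\$" "$config_file"; then
        echo y
    elif grep -q "^${option}=m\$" "$config_file"; then
        echo m
    else
        echo unset
    fi
}

# Assumed location of the kernel sources; adjust as needed.
cfg=/usr/src/linux/.config
if [ -r "$cfg" ]; then
    for opt in CONFIG_FB_EFI CONFIG_FRAMEBUFFER_CONSOLE CONFIG_DRM_FBDEV_EMULATION; do
        printf '%s: %s\n' "$opt" "$(kconfig_state "$cfg" "$opt")"
    done
fi
```

Anything reported as "unset" here would be worth a second look in make menuconfig before rebuilding.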
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
tempuser
n00b

Joined: 24 Aug 2011
Posts: 68

Posted: Thu Jul 11, 2024 4:00 pm

Thank you for the feedback pietinger. Sadly, no luck. I checked, and all the framebuffer options that should be enabled are. Recreating the kernel config from the 5.2.1 config using make oldconfig, answering the myriad questions (goodness the Linux kernel team have been busy these last five years), then adjusting with make menuconfig did not help either (after recompiling and installing the new kernel, naturally).
kimchi_sg
Advocate

Joined: 26 Nov 2004
Posts: 3010

Posted: Thu Jul 11, 2024 4:45 pm

tempuser wrote:
Thank you for the feedback pietinger. Sadly, no luck. I checked, and all the framebuffer options that should be enabled are. Recreating the kernel config from the 5.2.1 config using make oldconfig, answering the myriad questions (goodness the Linux kernel team have been busy these last five years), then adjusting with make menuconfig did not help either (after recompiling and installing the new kernel, naturally).

Hmm... why not post your config and lspci -nnk for us to double check?

You will need to use wgetpaste: https://wiki.gentoo.org/wiki/Wgetpaste
tempuser
n00b

Joined: 24 Aug 2011
Posts: 68

Posted: Thu Jul 11, 2024 5:14 pm

Of course, here you go:

Fingers crossed you can spot whatever blunder I've made.
pietinger
Moderator

Joined: 17 Oct 2006
Posts: 4645
Location: Bavaria

Posted: Thu Jul 11, 2024 8:37 pm

You really have a very clean kernel configuration ... a monolithic kernel without initramfs (like me):
Code:
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_EXPERT=y
CONFIG_X86_INTEL_LPSS=y
# CONFIG_MODULES is not set

The only option I miss is:
Code:
# CONFIG_EFI_HANDOVER_PROTOCOL is not set

I am unsure whether this is the cause of your problem. If the problem persists, I would like to check two things (in this order):

1. The UEFI boot entry (we had some buggy UEFI): Boot again with our GentooCD and check the output of "efibootmgr". Does it point to your kernel ?

2. Maybe a problem with the kernel command line parameters: Remove all, except "root=... ro" ... Does it boot now ?

If all of this does not help ... we have some special kernel command line parameters:
https://wiki.gentoo.org/wiki/User:Pietinger/Tutorials/Kernel_Commandline_Parameter#Parameter:_earlycon.3Defifb_and_others
Use ALL: "earlycon=efifb efi=debug initcall_debug ignore_loglevel keep_bootcon"
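
With a built-in command line like the one described in the first post, these parameters would presumably end up in the kernel configuration roughly as follows. This is only a sketch: it assumes the built-in string is set through CONFIG_CMDLINE (with CONFIG_CMDLINE_BOOL enabled) and reuses root=/dev/sda2 from the first post:

```shell
# .config fragment (assumed to match how the built-in command line is set)
CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE="root=/dev/sda2 ro earlycon=efifb efi=debug initcall_debug ignore_loglevel keep_bootcon"
```

The kernel has to be rebuilt and copied to the ESP again before a changed built-in command line takes effect.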

BTW:

1. I don't know your machine and which touchpad it has ... Maybe you will need some more options for it later (e.g. # CONFIG_I2C_HID is not set; # CONFIG_HID_MULTITOUCH is not set; but there is a description in my wiki article for that)

2. I really would disable this: CONFIG_PSTORE=y (do you know what it does?; you could kill your UEFI)

3. Does it have USB3 ? Because you have CONFIG_TYPEC=y ... but you miss: # CONFIG_HOTPLUG_PCI is not set (thunderbolt and USB3 need PCI hotplugging)
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
tempuser
n00b

Joined: 24 Aug 2011
Posts: 68

Posted: Tue Jul 16, 2024 2:35 pm

Quote:
You really have a very clean kernel configuration ... a monolithic kernel without initramfs (like me):


Thank you, but I can't take the credit. The credit should go to the person that created the 5.2.1 kernel config I've adapted for 6.6.38; Vitaly Minko.

Quote:
The only option I miss is: ...


It's listed as deprecated in make menuconfig. I've enabled it, but no luck, I'm afraid.

Quote:
1. The UEFI boot entry (we had some buggy UEFI): Boot again with our GentooCD and check the output of "efibootmgr". Does it point to your kernel ?


The relevant entry states:

Code:
Boot0000* Gentoo   HD(1,GPT,bd09dbf5-44c8-4bd0-874b-01708de8aeae,0x800,0x32000)/File(\EFI\GENTOO\BZIMAGE-6.6.38.EFI)


So far as I'm aware this appears to be correct. I have tried booting the kernel in other ways too though, with the same result.
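
One quick sanity check on that entry is to compare the HD(...) numbers with the fdisk listing from the first post: the last two fields of the HD(...) node are the partition's start sector and its size in sectors, written in hexadecimal. A small sketch of the arithmetic, with all values copied from this thread:

```shell
# ESP geometry as printed by fdisk (/dev/sda1).
esp_start=2048
esp_end=206847
esp_sectors=$((esp_end - esp_start + 1))

# Start and size from the efibootmgr HD(...) node, converted from hex.
hd_start=$((0x800))
hd_sectors=$((0x32000))

echo "fdisk:      start=$esp_start sectors=$esp_sectors"
echo "efibootmgr: start=$hd_start sectors=$hd_sectors"

if [ "$esp_start" -eq "$hd_start" ] && [ "$esp_sectors" -eq "$hd_sectors" ]; then
    echo "boot entry matches the ESP"
else
    echo "boot entry does NOT match the ESP"
fi
```

Both sides come out as start 2048 and 204800 sectors, so the entry does point at /dev/sda1, which fits the earlier observation that the firmware can find the kernel.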

Quote:
2. Maybe a problem with the kernel command line parameters: Remove all, except "root=... ro" ... Does it boot now ?


No luck :(

Quote:
If all of this does not help ... we have some special kernel command line parameters:


It's drawing to the screen now, albeit the wrong way up, but still :lol:

After waiting what felt like an eternity it started printing errors and call traces. Unfortunately, the salient information isn't sticking out to me. Something about device initialization failed (-19), please file a bug on drm/i915. However, a lot more than that was printed, and so far as I'm aware there's no way for me to get a copy I could upload.

Update 1: I've waited longer still and it has stopped printing. I can see a message stating: "System cannot boot: Missing /etc/machine-id and /etc is mounted read-only."

Update 2: I think I've found the boot information. Despite what it says about root@livecd at the top (presumably as I'm viewing the log on a live USB), I think this is the relevant boot journal. Either that, or the errors that I saw when booting the kernel with those special kernel arguments had nothing to do with the problem I'm experiencing.

Quote:
1. I don't know your machine and which touchpad it has ... Maybe you will need some more options for it later (e.g. # CONFIG_I2C_HID is not set; # CONFIG_HID_MULTITOUCH is not set; but there is a description in my wiki article for that)


Good catch, I'll wait to look into that until it becomes an issue though.

Quote:
2. I really would disable this: CONFIG_PSTORE=y (do you know what it does?; you could kill your UEFI)


Nope, no idea what it does. I can't turn it off though, make menuconfig states that it has been selected by ACPI_APEI [=y] && ACPI [=y] && HAVE_ACPI_APEI [=y].

Quote:
3. Does it have USB3 ? Because you have CONFIG_TYPEC=y ... but you miss: # CONFIG_HOTPLUG_PCI is not set (thunderbolt and USB3 need PCI hotplugging)


It does have USB3. I've enabled hotplugging now.
pietinger
Moderator

Joined: 17 Oct 2006
Posts: 4645
Location: Bavaria

Posted: Tue Jul 16, 2024 5:38 pm

tempuser wrote:
It's drawing to the screen now, albeit the wrong way up, but still :lol:

So we now know that UEFI starts the kernel. Good !

So, my next question would be:

Was the output rotated wrongly from the beginning? (Or correct at first, then wrong?) Did it change later?

tempuser wrote:
Update 2: I think I've found the boot information. Despite what it says about root@livecd at the top (presumably as I'm viewing the log on a live USB), I think this is the relevant boot journal. Either that, or the errors that I saw when booting the kernel with those special kernel arguments had nothing to do with the problem I'm experiencing.

This I see as a problem:
Code:
Jul 16 14:45:56 localhost kernel: i915 0000:00:02.0: [drm] Unknown revid 0x06

Jul 16 14:45:56 localhost kernel: [drm:0xffffffff814bb7f4] *ERROR* connector DSI-1 leaked!

I guess that, in addition to our debug parameters, you will need the parameter "fbcon=rotate:1" ... and THEN I would need the complete output of the screen ... maybe take some photos? Maybe you are able to boot again afterwards with our GentooCD and check /var/log/messages again?

(just a quick note: If our GentooCD can boot without problems, you should have no problem also if you install our gentoo-kernel-bin)

A correct boot with our GentooCD is also proof that something is wrong with the kernel configuration. You had to answer some questions during “make oldconfig” ... maybe something was overlooked. What you should configure in any case: NUMA (must be enabled first) and then CONFIG_ACPI_NUMA.

I am sorry I cannot help with systemd (I am an OpenRC man) ... but I am sure some experts will jump in once we have solved the problem with your monitor output.

BTW:
tempuser wrote:
Nope, no idea what it does. I can't turn it off though, make menuconfig states that it has been selected by ACPI_APEI [=y] && ACPI [=y] && HAVE_ACPI_APEI [=y].

Yes ... I also don't recommend ACPI_APEI ... instead you could enable: CONFIG_ACPI_PROCESSOR_AGGREGATOR=y (to save power; but it is not related to this problem).
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
pietinger
Moderator

Joined: 17 Oct 2006
Posts: 4645
Location: Bavaria

Posted: Tue Jul 16, 2024 5:48 pm

P.S.:

tempuser wrote:
It's listed as deprecated in make menuconfig. [...]

Yes, it is marked as “deprecated” because they want to get rid of it, but unfortunately it is still needed (hence the recommendation in the help to activate it).
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
tempuser
n00b

Joined: 24 Aug 2011
Posts: 68

Posted: Wed Jul 17, 2024 3:27 pm

I've enabled the NUMA settings and CONFIG_ACPI_PROCESSOR_AGGREGATOR, I've disabled APEI and PSTORE, and I've updated the built-in kernel command line to include fbcon=rotate:1. Sadly, no change (the screen hasn't even rotated). The relevant boot log just reads:

Code:
-- No entries --


Which isn't much help. It was more informative before so I don't know what has changed. I don't think screenshotting is a good idea; screenfuls of information slowly go past over the course of an hour or so until it finally settles.

Quote:
Maybe you are able to boot again afterwards with our GentooCD and check /var/log/messages again?


No such file in the installation filesystem. It exists on the live USB's filesystem though.

Quote:
Was the output rotated wrongly from the beginning? (Or correct at first, then wrong?) Did it change later?


This is a known problem with this laptop.

Quote:
The native orientation of the integrated display is portrait. Therefore the framebuffer console needs to be rotated. -https://handheld.computer/?page_id=620


One interesting thing I did note is that when I boot the Gentoo live USB, GRUB2 is the wrong way up, then the Linux framebuffer is the right way up, then when X11 starts it's the wrong way up again. Unfortunately, half the KDE applications seem to segfault when they should start, meaning I can't even get to the display settings to correct it. So far, I've had to do the entire installation with my head tilted.
pietinger
Moderator

Joined: 17 Oct 2006
Posts: 4645
Location: Bavaria

Posted: Wed Jul 17, 2024 5:15 pm

Perhaps I have misunderstood something: I thought the system log (dmesg) was from a boot of your kernel - was it from the GentooCD boot ?

What exactly is the status when you boot your kernel with the options:

root=... ro earlycon=efifb efi=debug initcall_debug ignore_loglevel fbcon=rotate:1

?

What happens if you boot an UbuntuLiveCD ?
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
tempuser
n00b

Joined: 24 Aug 2011
Posts: 68

Posted: Thu Jul 18, 2024 11:23 am

Quote:
Perhaps I have misunderstood something: I thought the system log (dmesg) was from a boot of your kernel - was it from the GentooCD boot ?


My apologies, please allow me to clarify the situation. The log came from running journalctl -b when arch-chrooted into the installation environment. The information stored there appears to correspond to what I see on screen when I boot the installation environment now (after adding the debug kernel arguments). However, for some reason no journal information was written for the last boot, the one after I made the latest batch of changes to the kernel, and I don't know why. I can see other boot logs using journalctl --list-boots, and can then select the entry I want using journalctl -b -N, where N is the number of the log I'm interested in.
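
For anyone following along, the same logs can probably be read from the live USB without a chroot by pointing journalctl at the installed system's journal directory with -D. A minimal sketch, assuming the installation's root filesystem is mounted at /mnt/gentoo (an assumption, as the mount point isn't stated in this thread) and that a persistent journal exists under var/log/journal; installed_journal is a made-up helper name:

```shell
# Run journalctl against the journal files of a mounted installation.
# $1 is the mount point of the installed root filesystem; the remaining
# arguments are passed through to journalctl unchanged.
# installed_journal is a hypothetical helper name, not an existing tool.
installed_journal() {
    root=$1
    shift
    journalctl -D "$root/var/log/journal" "$@"
}

# Examples (the mount point is an assumption):
#   installed_journal /mnt/gentoo --list-boots     # list recorded boots
#   installed_journal /mnt/gentoo -b -1 -k -p err  # kernel errors, previous boot
```

Reading the directory directly at least removes any doubt about whether the entries belong to the live USB or to the installation.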

Quote:
What exactly is the status when you boot your kernel with the options: ...


Screenfuls of information go past at a snail's pace, and after an hour or so it stops. At this point I cannot type anything, or interact in any way that I have found, besides pressing the power button which then triggers more information to be printed to the screen until the system finally shuts down.

Quote:
What happens if you boot an UbuntuLiveCD ?


I've not tried Ubuntu, but I can say that the Garuda Linux installation media appears to work flawlessly. Let me know if you'd like me to try Ubuntu specifically.
pietinger
Moderator

Joined: 17 Oct 2006
Posts: 4645
Location: Bavaria

Posted: Thu Jul 18, 2024 11:35 am

tempuser wrote:
[...] Let me know if you'd like me to try Ubuntu specifically.

Because you said that it works with Garuda Linux, I am even more convinced that it is a problem with the kernel; whether it is Ubuntu or Garuda doesn't matter. Next, I would like to reduce the output on the screen again, and then I really need the errors that come up (take photos). Please boot with these parameters:

root=... ro earlycon=efifb ignore_loglevel keep_bootcon fbcon=rotate:1

OR (if the orientation is wrong)

root=... ro earlycon=efifb ignore_loglevel keep_bootcon
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
tempuser
n00b

Joined: 24 Aug 2011
Posts: 68

Posted: Thu Jul 18, 2024 4:01 pm

So here's an odd one. If I try to view the most recent boot log (#0) using journalctl -b, I get the "no entries" message. If I try to view the second most recent boot log (#-1) using journalctl -b -1, it does display, even though it wouldn't display when it was the most recent boot log (#0). In other words, to view the most recent boot log I need to boot the machine one more time so that it becomes the second most recent boot log.

Anyway, here's the most recent boot log I can access (with the new kernel arguments you specified).

Quote:
OR (if the orientation is wrong)


It's always the wrong way up. Unfortunately, adding fbcon=rotate:1 makes no difference.
pietinger
Moderator

Joined: 17 Oct 2006
Posts: 4645
Location: Bavaria

Posted: Thu Jul 18, 2024 7:03 pm

Okay, I can imagine why you tried to switch it off:
Code:
Jul 18 13:47:46 localhost kernel: x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.

But you should really reactivate CONFIG_X86_PAT. :!:

Number 2: This is where the problems start:
Code:
Jul 18 13:47:46 localhost kernel: i915 0000:00:02.0: [drm] Unknown revid 0x06

Even if your site recommends something else, the first thing I would like you to try is these parameters ADDITIONALLY:
i915.enable_psr=0 i915.enable_dc=0
Leave out our debug parameters but keep "root=... ro", which gives:
root=... ro i915.enable_psr=0 i915.enable_dc=0

Number 3: If we can solve this problem, the next one is already waiting for us (but could also just be the consequence of the missing X86_PAT):
Code:
Jul 18 13:47:46 localhost kernel: iwlwifi 0000:01:00.0: Failed to load firmware chunk!
Jul 18 13:47:46 localhost kernel: iwlwifi 0000:01:00.0: iwlwifi transaction failed, dumping registers


If we don't make any progress, my last attempt would be to go to kernel version 6.9.10.
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
All times are GMT