Hupf Tux's lil' helper


Joined: 11 Sep 2005 Posts: 112 Location: Germany
|
Posted: Tue May 05, 2020 6:36 am Post subject: gentoo-sources-5.6 fail to boot without kernel panic |
|
|
Hi,
I'm trying to debug why after upgrading from gentoo-sources-5.5.x to 5.6.x, my machine fails to boot. Rollback is easy at the moment, but at some point I'd like to be able to make the switch.
On 5.5.x, my working setup is as detailed below. My toolchain should be properly configured: I recently upgraded to gentoo-sources-5.5.19 without issues.
Boot stack (working with 5.5):
- Kernel is manually configured* with embedded firmware (AMD Picasso GPU + CPU microcode), EFI stub, embedded command line and embedded initramfs
- UEFI loads EFI stub from NVMe
- CONFIG_CMDLINE omits the root=, init= and initrd= options, nor are they passed via UEFI/efibootmgr (i.e. defaults are used)
- EFI stub loads embedded initramfs
- initramfs built using custom initramfs.list including custom /init script, which invokes cryptsetup on the actual root partition (on NVMe)
- custom /init hands over to systemd
What I see with 5.5.19:
- printk's relating to various hardware, e.g. USB keyboard, NVMe partitions, unused SATA slots etc.
- "Freeing unused kernel image [...] memory: xxxK"
- "Run /init as init process"
- <output from initramfs's /init script>
Config-wise,
What I see with 5.6.10:
- printk's relating to various hardware, e.g. USB keyboard, NVMe partitions, unused SATA slots etc.
- no further boot progress, only a blinking cursor after the last printk
So, with 5.6.10, there is no panic or error message (as far as the scrollback goes; right now I don't have hardware for a serial console over USB available). The boot process stops before the steps "Freeing unused kernel image memory" and "Run /init as init process".
The NVMe device seems to be recognized properly and its partitions are listed. The embedded initramfs seems to be there just fine: it is built during make and the resulting bzImage is of very similar size to the 5.5.19 one. I'm uncertain whether the handover to the initramfs is the problem at all, since then I would expect complaints about the kernel being unable to find/call /init (or the root device). I also wonder why I'm not seeing the "Freeing unused kernel image memory" message.
I tried scanning the Documentation/, kernel.org and gentoo bugs as well as changelogs/commit histories for anything related to e.g. initramfs, EFI stub and NVMe but couldn't make out changes specific to my problem. I also tried explicitly specifying root=/dev/ram0 init=/init without seeing changes in the output.
* .config for 5.6.10 (not booting) |
dr_wulsen Tux's lil' helper


Joined: 21 Aug 2013 Posts: 146 Location: Austria
|
Posted: Tue May 05, 2020 7:12 pm Post subject: |
|
|
Have you tried setting the kernel config option CONSOLE_LOGLEVEL_DEFAULT to 15?
With some luck, the extra output will let you narrow down the issue.
Of course, you could try to enable any other relevant VERBOSE option in your kernel for fun and see if output becomes more meaningful. _________________ There's no stupid questions, only stupid answers. |
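Since the command line is embedded in this setup, extra verbosity can also come from boot parameters rather than only from the default loglevel. As a rough sketch (not something taken from the posts above): `ignore_loglevel` and `initcall_debug` are standard kernel parameters, and `initcall_debug` in particular logs each built-in initcall, which may help when the hang happens before "Run /init as init process". The helper names `embedded_cmdline` and `missing_debug_params` are made up for illustration, and the script assumes CONFIG_CMDLINE sits on a single line of the .config.

```shell
#!/bin/sh
# Sketch: list boot-debugging parameters missing from an embedded command line.
# Assumption: CONFIG_CMDLINE sits on one line as CONFIG_CMDLINE="...".

# Print the embedded command line, one parameter per line.
embedded_cmdline() {
    # $1 = path to the kernel .config
    sed -n 's/^CONFIG_CMDLINE="\(.*\)"$/\1/p' "$1" | tr -s ' ' '\n'
}

# Print each wanted parameter that the embedded command line lacks.
missing_debug_params() {
    # $1 = path to the kernel .config
    for param in ignore_loglevel initcall_debug; do
        embedded_cmdline "$1" | grep -Fqx "$param" || printf '%s\n' "$param"
    done
}
```

For example, `missing_debug_params /usr/src/linux/.config` would print whichever of the two parameters still needs adding to CONFIG_CMDLINE; the path is an assumption about where the kernel tree lives.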
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55013 Location: 56N 3W
|
Posted: Tue May 05, 2020 7:33 pm Post subject: |
|
|
Hupf,
Wild thought. It's all working except the console.
If you can ssh in and read dmesg, you may find that you need a new (extra) firmware file for your GPU.
I have a Polaris11 card and that's bitten me twice since the 4.16 kernel _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
Hupf Tux's lil' helper


Joined: 11 Sep 2005 Posts: 112 Location: Germany
|
Posted: Tue Jun 02, 2020 3:38 pm Post subject: |
|
|
Some updates.
After a session of https://wiki.gentoo.org/wiki/Kernel_git-bisect I identified https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=33960acccfbd7f24d443cb3d0312ac28abe62bae as the offending commit.
So I started playing around (on the most recent gentoo-sources:5.7.0) with the kernel config options related to crypto devices and trusted execution environment.
The culprit seems to be CONFIG_CRYPTO_DEV_CCP_CRYPTO.
Before 33960acccfbd7f24d443cb3d0312ac28abe62bae, my system boots with CONFIG_CRYPTO_DEV_CCP_CRYPTO=y. After the commit (up to and including 5.7.0), the system boots with CONFIG_CRYPTO_DEV_CCP_CRYPTO=n or CONFIG_CRYPTO_DEV_CCP_CRYPTO=m, but not with CONFIG_CRYPTO_DEV_CCP_CRYPTO=y. Also, with CONFIG_CRYPTO_DEV_CCP_CRYPTO=m, the module loads just fine after boot (i.e. without halting or crashing the system; I haven't tried actually using the functionality).
Here is the working config-5.7.0-gentoo for reference.
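When juggling several kernel configs like this, a quick check of how the symbol is set in each one can save a reboot. The following is only a sketch: `kconfig_state` is a hypothetical helper name, and it assumes the standard .config format, in which "=n" shows up as "# CONFIG_... is not set" or as no line at all.

```shell
#!/bin/sh
# Sketch: report how one tristate symbol is set in each given kernel config.
# Assumption: files follow the standard .config format, where a disabled
# symbol appears as "# CONFIG_FOO is not set" or is missing entirely.

kconfig_state() {
    # $1 = symbol without the CONFIG_ prefix, remaining args = config files
    symbol=$1
    shift
    for file in "$@"; do
        if grep -Fqx "CONFIG_${symbol}=y" "$file"; then
            state=built-in
        elif grep -Fqx "CONFIG_${symbol}=m" "$file"; then
            state=module
        else
            state=disabled
        fi
        printf '%s: %s\n' "$file" "$state"
    done
}
```

Called as `kconfig_state CRYPTO_DEV_CCP_CRYPTO old.config new.config` (both file names are placeholders), it prints one line per file, so the booting and non-booting configs can be compared at a glance.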
Furthermore, as hinted by the commit's diff, I indeed have a
Code: | ~ # lspci -nn | grep -i 15df
0c:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df] |
on my
Code: | ~ # cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 24
model name : AMD Ryzen 5 3400G with Radeon Vega Graphics
stepping : 1
microcode : 0x8108109
cpu MHz : 1295.399
cache size : 512 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass |
I believe the appropriate next step would be to file a bug report upstream, which will be something new for me to learn - or is there anything else I should do/try before that? |
Goverp Advocate


Joined: 07 Mar 2007 Posts: 2216
|
Posted: Wed Jun 03, 2020 8:08 am Post subject: |
|
|
Don't assume filing a bug on bugs.kernel.org will achieve anything. You need to look at the MAINTAINERS file in the kernel source tree (e.g. /usr/src/linux/MAINTAINERS) to find the people who look after the bit of code you have a problem with. When I had a problem with the AMDGPU driver, I needed to send a polite explanation of the bug, a git bisect result and context info to the mailing list. A patch would be even more successful! They were very quick to fix it then. Before that, I and others had mistakenly put a bug on bugs.kernel.org and waited, and waited.... _________________ Greybeard |
Ant P. Watchman

Joined: 18 Apr 2009 Posts: 6920
|
Posted: Wed Jun 03, 2020 8:17 am Post subject: |
|
|
On my system the ccp driver outright refuses to load, complaining that the BIOS is playing tricks. It's possible yours is similar and the kernel doesn't detect it. |
toralf Developer


Joined: 01 Feb 2004 Posts: 3943 Location: Hamburg
|
Posted: Wed Jun 03, 2020 8:35 am Post subject: |
|
|
You bisected it - good work!
Now just identify the To: and Cc: recipients by doing something like
Code: | $ ./scripts/get_maintainer.pl -f drivers/crypto/ccp/Makefile
Tom Lendacky <thomas.lendacky@amd.com> (supporter:AMD CRYPTOGRAPHIC COPROCESSOR (CCP) DRIVER)
Herbert Xu <herbert@gondor.apana.org.au> (maintainer:CRYPTO API)
"David S. Miller" <davem@davemloft.net> (maintainer:CRYPTO API)
linux-crypto@vger.kernel.org (open list:AMD CRYPTOGRAPHIC COPROCESSOR (CCP) DRIVER)
linux-kernel@vger.kernel.org (open list) |
for the affected files and send an email with the details.
FWIW, the kernel bugzilla isn't used by kernel folks; they still tend to communicate via email. |
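To turn that output into something that can be pasted into a mail client, the role annotations can be stripped off. A minimal sketch, assuming the output format quoted above; `recipients` is a hypothetical helper name, not part of the kernel tree.

```shell
#!/bin/sh
# Sketch: turn get_maintainer.pl output into a comma-separated address list.
# Assumption: each line looks like "Name <addr> (role)" or "list@host (role)",
# as in the output quoted above, and no name itself contains " (".

recipients() {
    # Reads get_maintainer.pl output on stdin, writes one line to stdout.
    # The sed expression drops everything from the first " (" to end of line.
    sed 's/ (.*$//' | paste -sd, -
}
```

Run from the top of the kernel tree, `./scripts/get_maintainer.pl -f drivers/crypto/ccp/Makefile | recipients` should then yield a single line of addresses. The mail itself would carry what was gathered in this thread: the bisected commit, the config option that triggers the hang, and the lspci and cpuinfo output.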