Gentoo Forums
Crashing in tty

prestige787
n00b

Joined: 18 Sep 2021
Posts: 40

Posted: Mon Feb 14, 2022 9:32 pm    Post subject: Crashing in tty

I have installed Gentoo a few times now, and on my PC I always hit the same issue: while in a tty, the screen eventually goes black and the system stops responding. The POST code display on the motherboard makes it look like the machine is rebooting, but it is not; a long press of the power button is required. This always happens, sometimes after a few seconds (before I can even log in), sometimes after a few minutes. It doesn't happen once I get into my window manager, nor does it happen on any other Linux distro I've used. EDIT: not true, I have had it happen in Debian as well, but still not on the dist-kernel. It also doesn't happen on my laptop, which is a very similar install. I'm assuming I'm missing something in my kernel config: everything except the network configs should be identical to my laptop, which doesn't have the issue but runs a different kernel.

Hardware:

AMD Radeon RX 6800 XT

AMD Ryzen 9 5900X

I followed the kernel configs on the pages for my gpu and cpu.

Kernel configs:

https://pastebin.com/zim5hC2N

For zen kernel (I know it's unsupported, should be pretty much the same config):

https://pastebin.com/ttB8U4yT


Last edited by prestige787 on Mon Feb 21, 2022 4:13 pm; edited 1 time in total

mike155
Advocate

Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Tue Feb 15, 2022 12:58 am

1) Which CPU does your PC have? The Ryzen 5900X?

2) How much RAM does it have?

3) Which graphics card do you have?

4) Do I understand correctly that you run the kernel "zim5hC2N" on your PC? What's the matter with the 'Zen' kernel? Where do you use that?

5) Please post the output of
Code:
lspci -knn

6) Please post the output of
Code:
dmesg

prestige787
n00b

Joined: 18 Sep 2021
Posts: 40

Posted: Tue Feb 15, 2022 3:14 am

1) Yes, it's the Ryzen 9 5900X

2) 64 GB; I had the same issue when running with 32 GB

3) AMD Radeon RX 6800 XT

4) Yes, this is the kernel I'm using. Just to make sure I got the right one, here is the config again:

https://pastebin.com/7rF3cwmC

5)

lspci -knn: https://pastebin.com/E4qBv2kN

6)

dmesg: https://pastebin.com/87azYSZL

mike155
Advocate

Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Tue Feb 15, 2022 11:03 am

Thanks for the data.

There are a couple of error messages in dmesg:
Code:
[    5.601603] iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-cc-a0-67.ucode failed with error -2
[    5.604198] iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-cc-a0-66.ucode failed with error -2
[    5.607753] iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-cc-a0-65.ucode failed with error -2
[    5.611462] iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-cc-a0-64.ucode failed with error -2
[    5.864582] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
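Error -2 is ENOENT, i.e. the kernel could not find the file. To get a plain list of the file names involved, something like the following could be used. This is only a sketch: `missing_firmware` is a made-up helper name, and it assumes the message wording shown above.

```shell
#!/bin/sh
# List the firmware files the kernel tried and failed to load.
# Reads kernel log text on stdin (hypothetical helper, not a standard tool).
missing_firmware() {
    # Keep only the file name between "Direct firmware load for " and
    # " failed", then drop duplicates.
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed.*/\1/p' | sort -u
}
```

Typical use would be `dmesg | missing_firmware`. Several iwlwifi entries for the same device are expected, since the driver tries a range of firmware versions in turn.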

The output of lspci shows some devices without drivers:
Code:
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
Subsystem: Gigabyte Technology Co., Ltd FCH SMBus Controller [1458:5001]

07:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
Subsystem: Gigabyte Technology Co., Ltd I211 Gigabit Network Connection [1458:e000]

10:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
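A quick way to spot such devices is to look for entries that have no "Kernel driver in use:" line. A rough sketch, assuming the usual `lspci -k` layout where each device starts with a bus address like 07:00.0 (`no_driver_devices` is a hypothetical name):

```shell
#!/bin/sh
# Print the lspci device lines that have no driver bound.
# Reads `lspci -k` or `lspci -knn` output on stdin.
no_driver_devices() {
    awk '
        # A new device record starts with a bus address such as "07:00.0".
        /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-9a-f] / {
            if (dev != "" && !bound) print dev
            dev = $0; bound = 0
            next
        }
        # Any following line naming a bound driver marks the device as fine.
        /Kernel driver in use:/ { bound = 1 }
        END { if (dev != "" && !bound) print dev }
    '
}
```

Typical use would be `lspci -knn | no_driver_devices`. Not every device in that list is a problem; some simply have no driver or do not need one.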

Initialization of the random number generator
Code:
[   79.124734] random: crng init done

is too late. Either enable the hardware random number generator or install haveged.
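To compare this value between boots, the timestamp can be pulled out of the log directly. A minimal sketch, assuming the default dmesg timestamp format shown above (`crng_init_time` is a made-up helper name):

```shell
#!/bin/sh
# Print the time in seconds at which the kernel reported "crng init done".
# Reads kernel log text on stdin; prints nothing if the message is absent.
crng_init_time() {
    sed -n 's/^\[ *\([0-9.]*\)] random: crng init done.*/\1/p' | head -n 1
}
```

Typical use would be `dmesg | crng_init_time`.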

Here is what I would do:
  1. switch the default CPU frequency governor to schedutil
  2. add missing firmware (see above) to CONFIG_EXTRA_FIRMWARE
  3. enable CONFIG_HW_RANDOM_AMD
  4. enable CONFIG_I2C_PIIX4 (to get the SMBus kernel driver)
  5. enable CONFIG_IGB (to get the Intel Ethernet driver)
  6. enable CONFIG_DRM_VGEM
  7. enable CONFIG_HIGH_RES_TIMERS
  8. enable CONFIG_CRYPTO_DEV_CCP and sub-options (to get support for the Cryptographic Coprocessor PSPCPP)

prestige787
n00b

Joined: 18 Sep 2021
Posts: 40

Posted: Tue Feb 15, 2022 5:17 pm

OK, so I made all of those changes except the second one. I don't need wifi, so I just disabled wireless LAN instead, and those errors don't show up anymore. Unfortunately it's still crashing. I'm still getting the regulatory.db error; I think this is also only related to wireless, but I'll try to get rid of it. I think all of the drivers are now being loaded. The random number generator message is still at the very end of dmesg, but it's at about 12s instead of 79s. Is this still too late?

New config: https://pastebin.com/5n4Ut8AZ

dmesg: https://pastebin.com/Sgd0mTAv

lspci -knn: https://pastebin.com/TXpyf7f7

mike155
Advocate

Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Tue Feb 15, 2022 9:39 pm

I compared your .config with my .config. Below are the most important differences:
  1. enable CONFIG_RANDOM_TRUST_CPU (will push crng init time to < 1 sec)
  2. enable CONFIG_POSIX_MQUEUE (recommended)
  3. enable CONFIG_BSD_PROCESS_ACCT (recommended)
  4. enable CONFIG_IKCONFIG_PROC (recommended)
  5. enable CONFIG_SCHED_AUTOGROUP
  6. I'm unsure about CONFIG_MK8=y. I have CONFIG_MCORE2=y
  7. enable CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS
  8. enable CONFIG_AMD_MEM_ENCRYPT (I think your CPU supports it. unsure)
  9. CONFIG_NUMA=y: is that the right choice? Does your CPU have a NUMA architecture? Probably not. unsure.
  10. enable CONFIG_X86_PMEM_LEGACY
  11. CONFIG_PM: unsure, this option is enabled on my machine
  12. disable CONFIG_AGP
  13. enable CONFIG_FB_MODE_HELPERS
  14. enable CONFIG_AMD_PTDMA (unsure what it does, but it is enabled on my machine)
  15. enable CONFIG_ACPI_WMI
  16. enable CONFIG_FANOTIFY
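
To check a list like this against a .config without going through menuconfig, a small helper along these lines might do. It is only a sketch: `check_config` is a made-up name, and the path in the usage line is the usual Gentoo location, which may differ on your system.

```shell
#!/bin/sh
# Report the state of kernel options in a .config file.
# Usage: check_config /usr/src/linux/.config RANDOM_TRUST_CPU POSIX_MQUEUE ...
check_config() {
    config=$1
    shift
    for opt in "$@"; do
        # Enabled options appear as CONFIG_FOO=y (built in) or =m (module);
        # disabled ones appear as "# CONFIG_FOO is not set" or not at all.
        if line=$(grep -E "^CONFIG_${opt}=" "$config"); then
            echo "$line"
        else
            echo "CONFIG_${opt} is not set"
        fi
    done
}
```

With CONFIG_IKCONFIG_PROC enabled, the config of the running kernel is also available as /proc/config.gz, so the same check can be run on an unpacked copy of that file.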

prestige787
n00b

Joined: 18 Sep 2021
Posts: 40

Posted: Wed Feb 16, 2022 1:07 am

I made all the changes; unfortunately it's still crashing. Just to make 100% sure it's the kernel, I installed the distribution kernel, and it does not have any problems, so it must be something in the config. I did notice that on the custom kernel I get an 'FF' POST code on my motherboard, as opposed to the 00 code on the distribution kernel. POST codes can be pretty useful, so I checked what they are after it crashes. The codes change very quickly, but the ones that it stops on for some time are:

FF->62->d2->Ad_ac->AA->00

FF is a reserved error code, though some people say it means "fault free". This code is also present while in the Gentoo installer, which never gave me any issues. Most people with this code had issues with Windows software.

62 is a regular boot code meaning 'Installation of the PCH runtime services'. It stays on this code for a long time. It has to do with the chipset; some people had this with graphics card issues.

D2 is a straight error code for 'PCH initialization error' and should never show up during a normal boot. Some people solved this by removing unneeded M.2 drives. I have 3 NVMe M.2 drives; I might try to remove them later, though it's an annoying thing to do. I'm starting to think that might be the root of the issue, but I'm not sure what options there are for NVMe settings in the kernel.

Ad is 'issue Ready To Boot event for OS Boot'. I wasn't able to find much info on this one, but it does stay on it for a long time. The rest are reserved, except 00, which is what it should usually be in the OS.

EDIT: I enabled NVMe multipath support, which unfortunately didn't help.

mike155
Advocate

Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Wed Feb 16, 2022 1:37 am

That's strange. I've never heard of such a problem before. We frequently have users that get a black screen immediately after booting. But that's because they failed to configure their graphics card.
A machine that freezes after a few minutes - and that stops freezing if you enter the graphics mode... That's highly unusual.

It's not necessary to use the Gentoo kernel. I use the vanilla kernel, for example. You could also use a kernel from a different distribution - if it works for you.

I think I would let memtest86-bin run for a couple of hours. Just to make sure it's not a memory error.

prestige787
n00b

Joined: 18 Sep 2021
Posts: 40

Posted: Thu Feb 17, 2022 4:33 am

I ran memtest for about 2 hours. With 64 GB it would probably need around 10 hours to be sure, so I might do that overnight, since memory errors can lead to some nasty problems. I also tried removing the two NVMe drives connected to the X570 chipset, which did not help. Loading amdgpu as a module seems to make it take longer to crash, but that's hard to say given how random it is. I'll play around with some other kernels like you suggested, and might try a fresh config if I have time. Thanks for your time!

prestige787
n00b

Joined: 18 Sep 2021
Posts: 40

Posted: Sun Feb 20, 2022 12:47 am

OK, so I decided to start from a working kernel (the Gentoo dist-kernel) and trim it down until I could replicate the issue. I believe it is one of the options in the "Networking support" menu in menuconfig. This seems a bit odd, but the two configs:

https://pastebin.com/p6L9cejg

https://pastebin.com/1U3Bq6jA

differ only in options from Networking support. The first one is the regular dist-kernel and it does not crash (there is a very occasional flicker); the second is the modified one and it does crash. Unfortunately there are far too many options to try one by one, but at least I have it narrowed down a little.
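
For anyone who wants to see exactly which options differ between two such configs, a comparison along these lines should work. It is a rough sketch with a made-up function name (`config_diff`); the kernel source tree also ships scripts/diffconfig for the same purpose.

```shell
#!/bin/sh
# Print options whose value differs between two kernel configs.
# Lines starting with "-" come from the first config, "+" from the second.
config_diff() {
    a=$(mktemp)
    b=$(mktemp)
    # Keep only option lines (set and explicitly unset) and sort them so
    # that ordering differences do not show up as changes.
    grep -E '^(CONFIG_|# CONFIG_)' "$1" | LC_ALL=C sort > "$a"
    grep -E '^(CONFIG_|# CONFIG_)' "$2" | LC_ALL=C sort > "$b"
    # diff exits non-zero when the files differ; that is expected here.
    diff -U0 "$a" "$b" | grep -E '^[+-](CONFIG_|# CONFIG_)' || true
    rm -f "$a" "$b"
}
```

Typical use would be `config_diff working.config crashing.config`. With a long list of differences, one option is to apply half of them to the working config, rebuild and test, then repeat on whichever half still crashes, rather than trying the options one by one.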

The only other difference is that when you emerge the dist-kernel, it automatically creates an initramfs, whereas when I did it 'manually' I used dracut, but I don't think this would make any difference. (Also, I'm pretty sure Portage just uses dracut too, but I wasn't paying attention to the output.)