vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Fri Sep 13, 2024 10:55 am Post subject: Troubles with Intel GPU on sys-kernel/gentoo-sources-6.10.* |
|
|
I'm trying the sys-kernel/gentoo-sources-6.10 branch.
On headless machines, this seems to work fine.
On an Intel i7-11800H mini-PC desktop, the machine reboots abruptly from time to time (with no panic message, as far as I know) or slowly chokes and freezes. I am not 100% sure of the cause, but I suspect the GPU, as this happened more often when I was running Steam games.
Has anybody had similar problems? How can I debug this issue?
cpuinfo & lspci :
model name : 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1 [UHD Graphics] (rev 01) |
fkobi n00b
Joined: 21 Jul 2024 Posts: 3 Location: Poland
|
Posted: Mon Sep 16, 2024 5:05 pm Post subject: |
|
|
Does this happen with gentoo-kernel? |
vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Wed Sep 18, 2024 8:31 am Post subject: |
|
|
fkobi wrote: | Does this happen with gentoo-kernel? |
I did not try it. |
vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Wed Sep 18, 2024 8:50 am Post subject: |
|
|
Sorry for the late answer. I'm now running 6.10.10, which seems more stable.
It includes fixes for AMD GPUs but not Intel AFAIK. Odd...
pietinger wrote: | Why do you suspect the GPU ? |
Because I had several other issues:
severe GUI slowdown while running a game (Civ6, if that matters), odd dmesg messages (which I unfortunately did not copy) ...
emerge --info: https://bpa.st/AIODO
Why can't I upload the config files with wgetpaste?
Last edited by vm666 on Wed Sep 18, 2024 9:58 am; edited 1 time in total |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5145 Location: Bavaria
|
Posted: Wed Sep 18, 2024 9:44 am Post subject: |
|
|
Hmm ... you have a nice and fast system .... but your swap partition is REALLY too big ...
vm666 wrote: | Why can't I upload the config files with wgetpaste? |
Try another service:
Code: | $ wgetpaste -v --service 0x0 /usr/src/linux/.config
Your paste can be seen here: http://0x0.st/X3eU.txt |
(my old config for my i9) _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Wed Sep 18, 2024 9:50 am Post subject: |
|
|
pietinger wrote: | Hmm ... you have a nice and fast system .... but your swap partition is REALLY too big ... |
Actually during some experiments I had to add swap.
By the way, I'm looking for a simple way to limit RAM usage for some processes (I mean resident memory, not virtual memory). I could not do it with ulimit; it seems I have to use cgroups.
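For reference, a memory cap with cgroup v2 can be sketched roughly as below. This assumes the unified hierarchy is mounted at /sys/fs/cgroup and that the commands run as root. The function name, the group name and the 2G figure in the usage line are placeholders, not something from this thread. Note that memory.max also counts page cache charged to the group, so it is not a pure resident-set limit.

```shell
# limit_mem NAME LIMIT PID [ROOT]
# Creates a cgroup, caps its memory with memory.max and moves PID into it.
# ROOT defaults to the usual cgroup v2 mount point (an assumption).
limit_mem() {
    cg_name=$1
    cg_limit=$2
    cg_pid=$3
    cg_root=${4:-/sys/fs/cgroup}

    # Make the memory controller available to child groups.
    printf '+memory\n' > "$cg_root/cgroup.subtree_control" || return 1
    mkdir -p "$cg_root/$cg_name" || return 1
    # Hard limit: above this the kernel reclaims, then OOM-kills inside the group.
    printf '%s\n' "$cg_limit" > "$cg_root/$cg_name/memory.max" || return 1
    # Move the process; children it starts afterwards inherit the group.
    printf '%s\n' "$cg_pid" > "$cg_root/$cg_name/cgroup.procs" || return 1
}

# Usage (as root): cap the current shell and everything started from it.
# limit_mem limited 2G $$
```

The optional fourth argument only exists so the function can be tried against a scratch directory before pointing it at the real hierarchy.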
.config for 6.10.7 http://0x0.st/X3ek.txt
.config for 6.10.10 http://0x0.st/X3en.txt
Last edited by vm666 on Wed Sep 18, 2024 4:27 pm; edited 1 time in total |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5145 Location: Bavaria
|
Posted: Wed Sep 18, 2024 10:08 am Post subject: |
|
|
I see you have a lot of experience with kernel configuration, but I would change these:
Code: | 1.
# CONFIG_X86_X2APIC is not set
2.
CONFIG_I2C_I801=m
3.
CONFIG_DRM_XE=m
4.
CONFIG_DRM_SIMPLEDRM=m
5.
CONFIG_FB=m
6.
CONFIG_FB_UVESA=m
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
CONFIG_FB_NVIDIA_BACKLIGHT=y
CONFIG_FB_RADEON=m
CONFIG_FB_RADEON_I2C=y
CONFIG_FB_RADEON_BACKLIGHT=y
7.
# CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set |
1. Enable it; you have an i7-11800H
2. This is the only one you need (you can disable the others)
3. Disable it
4. Disable it
5. You must enable it statically to get EFI-FB. See: https://wiki.gentoo.org/wiki/User:Pietinger/Experimental/Manual_Configuring_Current_Kernel#Framebuffer_Device_and_Console
6. Disable them (after you have enabled VESA and EFI)
7. Enable it _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
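One way to check a .config against the list above is a small shell sketch like the following. The function names are made up for illustration, and the default path is the /usr/src/linux/.config location used earlier in this thread. The expected values are only one reading of the suggestions: item 2 needs no change, and the VESA/EFI framebuffer options are left out because the thread does not name their symbols.

```shell
# kconfig_state FILE SYMBOL
# Prints the value of CONFIG_SYMBOL (y, m, ...) or "n" if it is unset or absent.
kconfig_state() {
    kc_value=$(sed -n "s/^CONFIG_$2=//p" "$1")
    echo "${kc_value:-n}"
}

# check_recommendations [FILE]
# Reports every option that differs from the suggested value; returns 1 if any do.
check_recommendations() {
    kc_file=${1:-/usr/src/linux/.config}
    kc_mismatch=0
    for kc_want in X86_X2APIC=y DRM_XE=n DRM_SIMPLEDRM=n FB=y FB_UVESA=n \
                   FB_NVIDIA=n FB_RADEON=n INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON=y; do
        kc_symbol=${kc_want%%=*}     # part before the "="
        kc_expected=${kc_want#*=}    # part after the "="
        kc_actual=$(kconfig_state "$kc_file" "$kc_symbol")
        if [ "$kc_actual" != "$kc_expected" ]; then
            echo "CONFIG_$kc_symbol is $kc_actual, suggested $kc_expected"
            kc_mismatch=1
        fi
    done
    return $kc_mismatch
}

# Usage: check_recommendations /usr/src/linux/.config
```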
vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Wed Sep 18, 2024 4:26 pm Post subject: |
|
|
pietinger wrote: | I see you have a lot of experience with kernel configuration, but I would change these:
Code: | 1.
# CONFIG_X86_X2APIC is not set
7.
# CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set |
1. Enable it; you have an i7-11800H
|
Actually, I should enable it on all my machines :-/
(there are at least 3 others where it is not enabled, for whatever stupid reason) |
vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Fri Sep 20, 2024 8:04 am Post subject: |
|
|
vm666 wrote: | Sorry for the late answer. I'm now running 6.10.10, which seems more stable. |
More stable but not entirely stable. The machine rebooted during the night.
$ uptime -s
2024-09-20 01:59:39
Nothing significant in the logs I'm afraid.
Code: |
Sep 20 01:30:00 grillepain CROND[233439]: (root) CMD (/usr/lib/sa/sa1 1 1)
Sep 20 01:30:00 grillepain CROND[233438]: (root) CMDEND (/usr/lib/sa/sa1 1 1)
Sep 20 01:40:00 grillepain CROND[236093]: (root) CMD (/usr/lib/sa/sa1 1 1)
Sep 20 01:40:00 grillepain CROND[236092]: (root) CMDEND (/usr/lib/sa/sa1 1 1)
Sep 20 01:41:19 grillepain root[236471]: ACPI event unhandled: button/up UP 00000080 00000000 K
Sep 20 01:50:00 grillepain CROND[238862]: (root) CMD (/usr/lib/sa/sa1 1 1)
Sep 20 01:50:00 grillepain CROND[238861]: (root) CMDEND (/usr/lib/sa/sa1 1 1)
Sep 20 01:59:49 grillepain syslog-ng[2076]: syslog-ng starting up; version='4.6.0'
Sep 20 01:59:49 grillepain acpid[2107]: starting up with netlink and the input layer
Sep 20 01:59:49 grillepain acpid[2107]: 1 rule loaded
Sep 20 01:59:49 grillepain acpid[2107]: waiting for events: event logging is off
Sep 20 01:59:49 grillepain dhcpcd[2278]: dhcpcd-10.0.8 starting
Sep 20 01:59:49 grillepain dhcpcd[2284]: dev: loaded udev
Sep 20 01:59:49 grillepain dhcpcd[2284]: DUID 00:01:00:01:2c:c3:07:49:68:1d:ef:35:cd:59
Sep 20 01:59:49 grillepain kernel: 8021q: 802.1Q VLAN Support v1.8
Sep 20 01:59:49 grillepain dhcpcd[2284]: no interfaces have a carrier
Sep 20 01:59:49 grillepain kernel: Loading firmware: rtl_nic/rtl8168h-2.fw
|
Moderation note: Fixed code block formatting. -- Banana
EDIT: It crashed again, I was not in front of the machine unfortunately.
$ uptime -s
2024-09-20 13:59:30
$ |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5145 Location: Bavaria
|
Posted: Fri Sep 20, 2024 2:53 pm Post subject: |
|
|
vm666 wrote: | Nothing significant in the logs I'm afraid. |
A reboot without any error ... hmm ... what is that -> ?
vm666 wrote: | Code: | Sep 20 01:30:00 grillepain CROND[233439]: (root) CMD (/usr/lib/sa/sa1 1 1)
Sep 20 01:30:00 grillepain CROND[233438]: (root) CMDEND (/usr/lib/sa/sa1 1 1) |
|
(maybe clear your crontab?) _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Fri Sep 20, 2024 5:37 pm Post subject: |
|
|
pietinger wrote: | vm666 wrote: | Nothing significant in the logs I'm afraid. |
A reboot without any error ... hmm ... what is that -> ?
|
Or there is an error but it is not saved on the file system.
Quote: | (maybe clear your crontab?) |
I suspected that it could be triggered by some cron job, but they all look innocuous.
I had problems with scrub a while ago, but this is not that, I tried a full scrub and it worked fine.
https://forums.gentoo.org/viewtopic-t-1165800-highlight-scrub+balance.html |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5145 Location: Bavaria
|
Posted: Fri Sep 20, 2024 7:25 pm Post subject: |
|
|
You said you think there is no kernel panic; are you sure? Maybe it would make sense to go from 6 seconds to 0 (wait forever) to be sure? ->
Code: | CONFIG_PANIC_TIMEOUT=6 |
(also make sure that there are no settings in sysctl.conf ... like a -1, which reboots immediately) _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
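The value actually in effect can be read back from /proc/sys/kernel/panic, which is the same knob as the kernel.panic sysctl. A small sketch of that check follows; the function name is made up for illustration, and the grep in the trailing comment assumes the usual sysctl file locations.

```shell
# describe_panic_timeout [FILE]
# Explains the panic timeout currently in effect.
describe_panic_timeout() {
    pt_file=${1:-/proc/sys/kernel/panic}
    pt_value=$(cat "$pt_file") || return 1
    if [ "$pt_value" -eq 0 ]; then
        echo "wait forever"                         # panic message stays on screen
    elif [ "$pt_value" -gt 0 ]; then
        echo "reboot after $pt_value seconds"
    else
        echo "reboot immediately"                   # negative values
    fi
}

# To look for overrides set at boot (paths may differ per system):
# grep -rs 'kernel\.panic' /etc/sysctl.conf /etc/sysctl.d/
```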
Nowa Developer
Joined: 25 Jun 2014 Posts: 432 Location: Nijmegen
|
Posted: Fri Sep 20, 2024 7:57 pm Post subject: |
|
|
Quote: | Or there is an error but it is not saved on the file system. |
It could be that whatever triggers it happens at a very low level. Have you checked whether the firmware was updated when the kernel was updated?
Possibly a silly suggestion, but have you already verified that the machine is not simply overheating? _________________ OS: Gentoo 6.10.12-gentoo-dist, ~amd64, 23.0/desktop/plasma/systemd
MB: MSI Z370-A PRO
CPU: Intel Core i9-9900KS
GPU: Intel Arc A770 16GB & Intel UHD Graphics 630
SSD: Samsung 970 EVO Plus 2 TB
RAM: Crucial Ballistix 32GB DDR4-2400 |
vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Tue Sep 24, 2024 7:29 pm Post subject: |
|
|
I don't use KDE.
I had another crash 2 days ago, just before 2 pm. I have a cron job that starts at *:59.
It just copies ~/Dropbox to an NFS share through rsync, but I don't believe this is a coincidence. The job runs every hour and there are often new files, so it is not just the copy that triggers it.
Could I have an issue with soft or hard lockup detection, or with my watchdog?
I disabled the NMI watchdog, just in case. I'm not sure I need it. |
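The current state of the lockup detectors can be read from /proc/sys/kernel. Here is a sketch with a made-up function name; the three knob names are the standard sysctl entries, and any that the running kernel does not provide are reported as unavailable.

```shell
# show_lockup_detectors [DIR]
# Prints the hard (NMI) and soft lockup detector settings and their threshold.
show_lockup_detectors() {
    ld_dir=${1:-/proc/sys/kernel}
    for ld_knob in nmi_watchdog soft_watchdog watchdog_thresh; do
        if [ -r "$ld_dir/$ld_knob" ]; then
            printf '%s=%s\n' "$ld_knob" "$(cat "$ld_dir/$ld_knob")"
        else
            printf '%s=unavailable\n' "$ld_knob"
        fi
    done
}

# Usage: show_lockup_detectors
```

For the first two knobs, 0 means the detector is off and 1 means it is on; watchdog_thresh is in seconds.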
vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Wed Sep 25, 2024 4:37 pm Post subject: |
|
|
vm666 wrote: | I disabled the NMI watchdog, just in case. I'm not sure I need it. |
It froze again during the night. The X11 GUI was frozen and the machine did not respond to ping; I could only reboot it.
How can I preserve the last kernel messages after a crash?
One detail:
After some investigation, I discovered that the iTCO_wdt watchdog did not work on this mini PC. I have another machine in the same situation.
AFAIK, iTCO_wdt works on all my other (old) machines.
If I understand correctly, iTCO_wdt is provided by the chipset, and the motherboard manufacturer has to wire it correctly. |
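On the question of keeping the last kernel messages: one common approach is the kernel's pstore facility, which can save the tail of dmesg across a reboot when the kernel oopses or panics. It needs pstore support and a storage backend (for example EFI variables or ramoops) enabled in the kernel config, and it will not capture anything if the machine hangs hard without a panic. Assuming pstore is mounted at its usual location, /sys/fs/pstore, a sketch for listing what was saved after the next boot (the function name is hypothetical):

```shell
# list_saved_dumps [DIR]
# Lists the records that pstore kept from before the last reboot.
list_saved_dumps() {
    ps_dir=${1:-/sys/fs/pstore}
    if [ ! -d "$ps_dir" ]; then
        echo "no pstore directory at $ps_dir" >&2
        return 1
    fi
    # One file per saved record; an empty listing means nothing was captured.
    ls -1 "$ps_dir"
}

# If the directory exists but is not mounted (as root):
# mount -t pstore pstore /sys/fs/pstore
```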
vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Mon Oct 28, 2024 12:37 pm Post subject: |
|
|
I am pretty sure now that it only freezes when I am playing Civ6 through Steam. Maybe this is related to Proton (Steam version of Wine) and not the Intel GPU driver.
I could not make any progress debugging this. |
vm666 n00b
Joined: 24 Oct 2003 Posts: 67
|
Posted: Mon Oct 28, 2024 3:08 pm Post subject: |
|
|
pietinger wrote: | I think I cannot help here any further ... sorry (I have no experience with steam games) ... |
If the cause is the GPU driver, it must be some rarely used 3D function. I would be surprised if it were only used by Proton. |