Gentoo Forums
Troubles with Intel GPU on sys-kernel/gentoo-sources-6.10.*
Gentoo Forums Forum Index » Kernel & Hardware
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Fri Sep 13, 2024 10:55 am    Post subject: Troubles with Intel GPU on sys-kernel/gentoo-sources-6.10.*

I'm trying the sys-kernel/gentoo-sources-6.10 branch.
On headless machines, this seems to work fine.
On an Intel i7-11800H mini-PC desktop, the machine reboots violently from time to time (with no panic message, as far as I know) or slowly chokes and freezes. I am not 100% sure of the cause, but I suspect the GPU, as this happened more often when I was running Steam games.

Has anybody had similar problems? What can I do to debug this issue?

cpuinfo & lspci:
model name : 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1 [UHD Graphics] (rev 01)
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5219
Location: Bavaria

Posted: Fri Sep 13, 2024 11:13 am

Why do you suspect the GPU? It could be anything, such as a program with a memory leak, or a kernel problem. Since you are using gentoo-sources, my first question would be how you have configured your kernel. So we would need your kernel .config file and all three files mentioned here:
https://wiki.gentoo.org/wiki/User:Pietinger/Overview_of_System_Information
(Note the reference to wgetpaste).

Alternatively, you can work through this guide yourself:
https://wiki.gentoo.org/wiki/User:Pietinger/Experimental/Manual_Configuring_Current_Kernel
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
fkobi
n00b


Joined: 21 Jul 2024
Posts: 3
Location: Poland

Posted: Mon Sep 16, 2024 5:05 pm

Does this happen with gentoo-kernel?
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Wed Sep 18, 2024 8:31 am

fkobi wrote:
Does this happen with gentoo-kernel?


I did not try it.
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Wed Sep 18, 2024 8:50 am

Sorry for the late answer. I'm now running 6.10.10, which seems more stable.
It includes fixes for AMD GPUs but, AFAIK, not for Intel. Odd...

pietinger wrote:
Why do you suspect the GPU ?


Because I had several other issues:
severe GUI slowdowns while running a game (Civ6, if that matters), odd dmesg messages (which I unfortunately did not copy)...

Quote:
Since you are using the gentoo-sources, my first question would be how you have configured your kernel. So we would need your kernel .config file and all 3 files mentioned here:
https://wiki.gentoo.org/wiki/User:Pietinger/Overview_of_System_Information



emerge --info: https://bpa.st/AIODO

Why can't I upload the config files with wgetpaste?


Last edited by vm666 on Wed Sep 18, 2024 9:58 am; edited 1 time in total
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5219
Location: Bavaria

Posted: Wed Sep 18, 2024 9:44 am

Hmm ... you have a nice and fast system ... but your swap partition is REALLY too big ... :lol:
vm666 wrote:
Why can't I upload the config files with wgetpaste?

Try another service:
Code:
$ wgetpaste -v --service 0x0 /usr/src/linux/.config
Your paste can be seen here: http://0x0.st/X3eU.txt

(my old config for my i9)
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Wed Sep 18, 2024 9:50 am

pietinger wrote:
Hmm ... you have a nice and fast system .... but your swap partition is REALLY too big ...


Actually during some experiments I had to add swap.
By the way, I'm looking for a simple way to limit RAM usage (resident, not virtual memory) for some processes. I could not do it with ulimit; I have to use cgroups.
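A minimal sketch of the cgroups approach, assuming a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup, root privileges, and Python 3. The function name `limit_memory`, the group name `civ6`, and the 4 GiB figure are hypothetical. Note that `memory.max` caps the group's total memory charge, which includes page cache as well as anonymous memory, so it approximates a resident-memory limit rather than being an exact RSS cap.

```python
from pathlib import Path

CGROUP_ROOT = "/sys/fs/cgroup"  # assumption: cgroup v2 (unified) mount point


def limit_memory(pid: int, name: str, limit_bytes: int,
                 root: str = CGROUP_ROOT) -> Path:
    """Move `pid` into child cgroup `name` and cap it with memory.max.

    Sketch only: needs root on a real system.
    """
    base = Path(root)
    # Make the memory controller available to child groups of `root`.
    (base / "cgroup.subtree_control").write_text("+memory\n")
    group = base / name
    group.mkdir(exist_ok=True)  # on cgroupfs, mkdir creates the control files
    # Hard limit: above it the kernel reclaims memory, then invokes the OOM killer.
    (group / "memory.max").write_text(f"{limit_bytes}\n")
    # Migrate the process; children it forks later inherit the group.
    (group / "cgroup.procs").write_text(f"{pid}\n")
    return group
```

For example, `limit_memory(12345, "civ6", 4 * 1024**3)`, where 12345 stands for the PID to confine.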

.config for 6.10.7 http://0x0.st/X3ek.txt
.config for 6.10.10 http://0x0.st/X3en.txt


Last edited by vm666 on Wed Sep 18, 2024 4:27 pm; edited 1 time in total
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5219
Location: Bavaria

Posted: Wed Sep 18, 2024 10:08 am

I see you have a lot of experience with kernel configuration, but I would change these:
Code:
1.
# CONFIG_X86_X2APIC is not set
2.
CONFIG_I2C_I801=m
3.
CONFIG_DRM_XE=m
4.
CONFIG_DRM_SIMPLEDRM=m
5.
CONFIG_FB=m
6.
CONFIG_FB_UVESA=m
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
CONFIG_FB_NVIDIA_BACKLIGHT=y
CONFIG_FB_RADEON=m
CONFIG_FB_RADEON_I2C=y
CONFIG_FB_RADEON_BACKLIGHT=y
7.
# CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set

1. Enable it; you have an i7-11800
2. This is the only one you need (you can disable the others)
3. Disable it
4. Disable it
5. You must enable it statically to get EFI-FB. See: https://wiki.gentoo.org/wiki/User:Pietinger/Experimental/Manual_Configuring_Current_Kernel#Framebuffer_Device_and_Console
6. Disable them (after you have enabled VESA and EFI)
7. Enable it
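To check a .config against the list above without reading it by eye, a small parser is enough. This is a sketch, not an official tool: the names `parse_config` and `mismatches` are hypothetical, options missing from the file are treated as disabled, and "enable VESA and EFI" is read as CONFIG_FB_VESA=y and CONFIG_FB_EFI=y (an assumption).

```python
import re

# Target state per option, transcribed from the advice above.
# "n" means disabled ("# CONFIG_FOO is not set" or absent from the file).
# Assumption: "enable VESA and EFI" means CONFIG_FB_VESA=y and CONFIG_FB_EFI=y.
RECOMMENDED = {
    "CONFIG_X86_X2APIC": "y",
    "CONFIG_DRM_XE": "n",
    "CONFIG_DRM_SIMPLEDRM": "n",
    "CONFIG_FB": "y",
    "CONFIG_FB_VESA": "y",
    "CONFIG_FB_EFI": "y",
    "CONFIG_FB_UVESA": "n",
    "CONFIG_FB_NVIDIA": "n",
    "CONFIG_FB_RADEON": "n",
    "CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON": "y",
}

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict:
    """Map each option to its value; explicitly unset options map to "n"."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if m := _SET.match(line):
            opts[m.group(1)] = m.group(2)
        elif m := _UNSET.match(line):
            opts[m.group(1)] = "n"
    return opts


def mismatches(text: str, wanted: dict = RECOMMENDED) -> list:
    """Return (option, current, wanted) for every option not in the wanted state."""
    opts = parse_config(text)
    return [(name, opts.get(name, "n"), want)
            for name, want in wanted.items()
            if opts.get(name, "n") != want]
```

For example: `print(mismatches(open("/usr/src/linux/.config").read()))`. Item 2 (CONFIG_I2C_I801) is left out on purpose, since the advice there is to keep it and drop whichever other I2C bus drivers the machine does not need.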
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Wed Sep 18, 2024 4:26 pm

pietinger wrote:
I see you have much experience with a kernel configuration, but I would change these:
Code:
1.
# CONFIG_X86_X2APIC is not set

7.
# CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set

1. Enable it you have an i7-11800


Actually, I should enable it on all my machines :-/
(at least 3 others where it is not enabled, for whatever stupid reason)
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Fri Sep 20, 2024 8:04 am

vm666 wrote:
Sorry for the later answer. I'm now running 6.10.10 which seems more stable.


More stable but not entirely stable. The machine rebooted during the night.
$ uptime -s
2024-09-20 01:59:39

Nothing significant in the logs, I'm afraid.
Code:

Sep 20 01:30:00 grillepain CROND[233439]: (root) CMD (/usr/lib/sa/sa1 1 1)
Sep 20 01:30:00 grillepain CROND[233438]: (root) CMDEND (/usr/lib/sa/sa1 1 1)
Sep 20 01:40:00 grillepain CROND[236093]: (root) CMD (/usr/lib/sa/sa1 1 1)
Sep 20 01:40:00 grillepain CROND[236092]: (root) CMDEND (/usr/lib/sa/sa1 1 1)
Sep 20 01:41:19 grillepain root[236471]: ACPI event unhandled: button/up UP 00000080 00000000 K
Sep 20 01:50:00 grillepain CROND[238862]: (root) CMD (/usr/lib/sa/sa1 1 1)
Sep 20 01:50:00 grillepain CROND[238861]: (root) CMDEND (/usr/lib/sa/sa1 1 1)
Sep 20 01:59:49 grillepain syslog-ng[2076]: syslog-ng starting up; version='4.6.0'
Sep 20 01:59:49 grillepain acpid[2107]: starting up with netlink and the input layer
Sep 20 01:59:49 grillepain acpid[2107]: 1 rule loaded
Sep 20 01:59:49 grillepain acpid[2107]: waiting for events: event logging is off
Sep 20 01:59:49 grillepain dhcpcd[2278]: dhcpcd-10.0.8 starting
Sep 20 01:59:49 grillepain dhcpcd[2284]: dev: loaded udev
Sep 20 01:59:49 grillepain dhcpcd[2284]: DUID 00:01:00:01:2c:c3:07:49:68:1d:ef:35:cd:59
Sep 20 01:59:49 grillepain kernel: 8021q: 802.1Q VLAN Support v1.8
Sep 20 01:59:49 grillepain dhcpcd[2284]: no interfaces have a carrier
Sep 20 01:59:49 grillepain kernel: Loading firmware: rtl_nic/rtl8168h-2.fw


Moderation note: Fixed code block formatting. -- Banana

EDIT: It crashed again; unfortunately, I was not in front of the machine.
$ uptime -s
2024-09-20 13:59:30
$
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5219
Location: Bavaria

Posted: Fri Sep 20, 2024 2:53 pm

vm666 wrote:
Nothing significant in the logs I'm afraid.

A reboot without any error ... hmm ... what is that -> ?
vm666 wrote:
Code:
Sep 20 01:30:00 grillepain CROND[233439]: (root) CMD (/usr/lib/sa/sa1 1 1)
Sep 20 01:30:00 grillepain CROND[233438]: (root) CMDEND (/usr/lib/sa/sa1 1 1)

(maybe clear your crontab?)
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Fri Sep 20, 2024 5:37 pm

pietinger wrote:
vm666 wrote:
Nothing significant in the logs I'm afraid.

A reboot without any error ... hmm ... what is that -> ?

Or there is an error but it is not saved on the file system.

Quote:
(maybe clear your crontab?)

I suspected that it could be triggered by some cron job, but they all look innocuous.

I had problems with scrub a while ago, but this is not the same thing: I tried a full scrub and it worked fine.
https://forums.gentoo.org/viewtopic-t-1165800-highlight-scrub+balance.html
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5219
Location: Bavaria

Posted: Fri Sep 20, 2024 7:25 pm

You said you think there is no kernel panic; are you sure? Maybe it would make sense to change this from 6 seconds to 0 (wait forever) to be sure ->
Code:
CONFIG_PANIC_TIMEOUT=6

(also make sure that there are no settings in sysctl.conf, such as a -1, which triggers an immediate reboot)
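For reference, the kernel's sysctl documentation defines the runtime value in /proc/sys/kernel/panic (sysctl `kernel.panic`, whose default comes from CONFIG_PANIC_TIMEOUT unless the `panic=` boot parameter overrides it) as follows: 0 waits forever, a positive value reboots after that many seconds, and a negative value reboots immediately. A minimal sketch to interpret the value and to spot overrides in sysctl.conf-style files; the helper names are hypothetical.

```python
from pathlib import Path


def describe_panic(seconds: int) -> str:
    """Interpret kernel.panic as documented for /proc/sys/kernel/panic."""
    if seconds == 0:
        return "wait forever"
    if seconds < 0:
        return "reboot immediately"
    return f"reboot after {seconds} s"


def sysctl_values(text: str, key: str = "kernel.panic") -> list:
    """Values assigned to `key` in sysctl.conf syntax, in file order."""
    values = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        name, sep, value = line.partition("=")
        # Accept both "kernel.panic" and "kernel/panic" spellings of the key.
        if sep and name.strip().replace("/", ".") == key:
            values.append(int(value.strip()))
    return values


def current_panic(path: str = "/proc/sys/kernel/panic") -> int:
    """Read the live value (Linux only)."""
    return int(Path(path).read_text())
```

Running `sysctl_values` over /etc/sysctl.conf and the files in /etc/sysctl.d/ shows every assignment; the last one applied is the one that takes effect, and `describe_panic(current_panic())` shows what the running kernel will actually do.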
Nowa
Developer


Joined: 25 Jun 2014
Posts: 447
Location: Nijmegen

Posted: Fri Sep 20, 2024 7:57 pm

Quote:
Or there is an error but it is not saved on the file system.


It could be that whatever triggers it happens at a very low level. Have you checked whether the firmware was updated when the kernel was updated?

Possibly a silly suggestion, but have you already verified that the machine is not simply overheating?
_________________
OS: Gentoo 6.10.12-gentoo-dist, ~amd64, 23.0/desktop/plasma/systemd
MB: MSI Z370-A PRO
CPU: Intel Core i9-9900KS
GPU: Intel Arc A770 16GB & Intel UHD Graphics 630
SSD: Samsung 970 EVO Plus 2 TB
RAM: Crucial Ballistix 32GB DDR4-2400
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5219
Location: Bavaria

Posted: Sat Sep 21, 2024 1:56 pm

vm666,

have you seen this thread? -> https://forums.gentoo.org/viewtopic-t-1170943-highlight-.html

(Do you have the same installation?)
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Tue Sep 24, 2024 7:29 pm

pietinger wrote:
vm666,
have you seen this thread ? -> https://forums.gentoo.org/viewtopic-t-1170943-highlight-.html


I don't use KDE.

I had another crash 2 days ago, just before 2 pm. I have a cron job that starts at *:59.
It just copies ~/Dropbox to an NFS share through rsync, but I don't think this is a coincidence. The job runs every hour and there are often new files, so it is not just the copy that triggers it.

Could I have an issue with soft or hard lockup detection, or with my watchdog?

I disabled the NMI watchdog, just in case. I'm not sure I need it.
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Wed Sep 25, 2024 4:37 pm

vm666 wrote:
I disabled the NMI watchdog, just in case. I'm not sure I need it.


It froze again during the night. The X11 GUI was frozen and the machine did not answer pings; I could only reboot it.
How can I preserve the last kernel messages after a crash?

One detail:
After some investigation, I discovered that the iTCO_wdt watchdog does not work on this mini PC. I have another machine in the same situation.
AFAIK, iTCO_wdt works on all my other (old) machines.
If I understand correctly, iTCO_wdt is provided by the chipset, and the motherboard manufacturer has to wire it correctly.
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Mon Oct 28, 2024 12:37 pm

I am now pretty sure that it only freezes when I am playing Civ6 through Steam. Maybe this is related to Proton (Steam's version of Wine) and not to the Intel GPU driver.
I could not make any progress to debug this :-(
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5219
Location: Bavaria

Posted: Mon Oct 28, 2024 2:38 pm

I think I cannot help here any further ... sorry (I have no experience with steam games) ... :(
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Mon Oct 28, 2024 3:08 pm

pietinger wrote:
I think I cannot help here any further ... sorry (I have no experience with steam games) ... :(


If the cause is the GPU driver, it must be some rarely used 3D function. I would be surprised if it were only used by Proton.
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Sat Dec 14, 2024 11:24 pm

The problem still exists in 6.11.x and 6.12.x kernels.
So I recompiled a 6.12.4 kernel with CONFIG_DRM_I915_DEBUG and got this when Civ6 froze:

Code:
[Sun Dec 15 00:06:43 2024] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[Sun Dec 15 00:06:43 2024] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:849ffefc, in Civ6 (WinID 2) [181554]
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5c6!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5ca!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5c8!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5cc!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5ce!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5d0!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5d2!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5d4!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5d6!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5d8!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5da!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5dc!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5de!
[Sun Dec 15 00:06:54 2024] Fence expiration time out i915-0000:00:02.0:Civ6 (WinID 2)[181554]:1ac5e0!
[Sun Dec 15 00:06:57 2024] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:849ffefc, in Civ6 (WinID 2) [181554]
[Sun Dec 15 00:06:57 2024] i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
[Sun Dec 15 00:06:57 2024] i915 0000:00:02.0: [drm] GT0: Resetting chip for stopped heartbeat on rcs0
[Sun Dec 15 00:06:58 2024] usb 1-1.2: USB disconnect, device number 51
[Sun Dec 15 00:06:58 2024] i915 0000:00:02.0: [drm] *ERROR* GT0: Failed to reset chip
[Sun Dec 15 00:06:58 2024] i915 0000:00:02.0: [drm] CI tainted: 0x9 by intel_gt_reset+0x31f/0x350 [i915]
[Sun Dec 15 00:06:58 2024] i915 0000:00:02.0: [drm] Civ6 (WinID 2)[181554] context reset due to GPU hang


The GUI froze. I connected from another PC and was able to get this dmesg output, and then the machine froze entirely.
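The log above follows a recognizable pattern: a `GPU HANG` line with an error code, a burst of fence expiration timeouts, then `Failed to reset chip`. When collecting logs from several freezes (for example with `dmesg -T` over SSH, as here), a small filter makes them easier to compare. A sketch: the function name `summarize` and its output layout are hypothetical, and the patterns only match the message formats shown in this log.

```python
import re

# Matches e.g. "GPU HANG: ecode 12:1:849ffefc, in Civ6 (WinID 2) [181554]"
HANG = re.compile(r"GPU HANG: ecode ([0-9a-f:]+), in (.+?) \[(\d+)\]")


def summarize(lines):
    """Collect i915 hang evidence from dmesg lines."""
    hangs = []
    fence_timeouts = 0
    reset_failed = False
    for line in lines:
        if m := HANG.search(line):
            hangs.append({"ecode": m.group(1),
                          "process": m.group(2),
                          "pid": int(m.group(3))})
        elif "Fence expiration time out" in line:
            fence_timeouts += 1
        elif "Failed to reset chip" in line:
            # The GPU could not be brought back after the hang.
            reset_failed = True
    return {"hangs": hangs,
            "fence_timeouts": fence_timeouts,
            "reset_failed": reset_failed}
```

For example, after `dmesg -T > dmesg.txt`: `summarize(open("dmesg.txt").read().splitlines())`. Comparing the `ecode` values across several freezes shows whether they look like the same hang each time.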
Nowa
Developer


Joined: 25 Jun 2014
Posts: 447
Location: Nijmegen

Posted: Sun Dec 15, 2024 9:19 am

Bugs such as this one are best reported directly to the upstream i915 driver developers.
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5219
Location: Bavaria

Posted: Sun Dec 15, 2024 11:30 am

vm666 wrote:
The problem still exists in 6.11.x and 6.12.x kernels.
So I recompiled a 6.12.4 with CONFIG_DRM_I915_DEBUG and I got this when Civ6 froze:

Code:
[Sun Dec 15 00:06:43 2024] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[Sun Dec 15 00:06:43 2024] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:849ffefc, in Civ6 (WinID 2) [181554]
...
[Sun Dec 15 00:06:57 2024] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:849ffefc, in Civ6 (WinID 2) [181554]
[Sun Dec 15 00:06:57 2024] i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
[Sun Dec 15 00:06:57 2024] i915 0000:00:02.0: [drm] GT0: Resetting chip for stopped heartbeat on rcs0
[Sun Dec 15 00:06:58 2024] usb 1-1.2: USB disconnect, device number 51
[Sun Dec 15 00:06:58 2024] i915 0000:00:02.0: [drm] *ERROR* GT0: Failed to reset chip
[Sun Dec 15 00:06:58 2024] i915 0000:00:02.0: [drm] CI tainted: 0x9 by intel_gt_reset+0x31f/0x350 [i915]
[Sun Dec 15 00:06:58 2024] i915 0000:00:02.0: [drm] Civ6 (WinID 2)[181554] context reset due to GPU hang

Hmmm ... I had the same problem with older kernel versions, with my browser running in full screen and playing a 4K movie from YouTube ... at that time, these kernel command line parameters helped me:
Code:
i915.enable_guc=2 i915.enable_psr=0

Might be worth a try?
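To confirm after a reboot that both parameters actually reached the kernel, /proc/cmdline can be checked. A minimal sketch with hypothetical helper names; it normalizes dashes to underscores in parameter names because the kernel treats them as equivalent. With i915 built as a module, the live values should also be visible under /sys/module/i915/parameters/.

```python
import shlex

# The two parameters suggested above.
WANTED = {"i915.enable_guc": "2", "i915.enable_psr": "0"}


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):  # honours quoted values
        key, sep, value = token.partition("=")
        # Dashes and underscores are equivalent in kernel parameter names.
        params[key.replace("-", "_")] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: dict = WANTED) -> dict:
    """Return the wanted parameters that are absent or set to another value."""
    params = parse_cmdline(cmdline)
    return {k: v for k, v in wanted.items() if params.get(k) != v}
```

`missing_params(open("/proc/cmdline").read())` returns an empty dict when both parameters are active.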
vm666
n00b


Joined: 24 Oct 2003
Posts: 69

Posted: Sun Dec 15, 2024 4:29 pm

Nowa wrote:
Bugs such as this one you can best report directly to the i915 driver upstream.


According to the bug tracker https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/, this has been reported many times, to no avail :(


I'll first try setting the GPU frequency to the maximum, then retry with a vanilla kernel, and then report it again if nothing works:
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/4858
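One possible way to pin the Intel GPU to its maximum frequency is through the i915 sysfs files gt_RP0_freq_mhz (hardware maximum), gt_max_freq_mhz and gt_min_freq_mhz. A sketch under these assumptions: root privileges, the iGPU being /sys/class/drm/card0 (it may be card1 on some systems), and that this is the kind of tuning the linked issue has in mind, which is not confirmed here. The function name is hypothetical; the change lasts until reboot, and the returned old limits can be written back to undo it.

```python
from pathlib import Path


def pin_gpu_to_max(card: str = "/sys/class/drm/card0") -> tuple:
    """Raise both i915 frequency limits to RP0; return the old (min, max) in MHz."""
    base = Path(card)
    rp0 = int((base / "gt_RP0_freq_mhz").read_text())
    old = (int((base / "gt_min_freq_mhz").read_text()),
           int((base / "gt_max_freq_mhz").read_text()))
    # Raise the ceiling first, so the floor is never set above the current
    # maximum (a value the driver may reject).
    (base / "gt_max_freq_mhz").write_text(f"{rp0}\n")
    (base / "gt_min_freq_mhz").write_text(f"{rp0}\n")
    return old
```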


EDIT: It froze again with gentoo-6.12.4-1.
Nowa
Developer


Joined: 25 Jun 2014
Posts: 447
Location: Nijmegen

Posted: Sun Dec 15, 2024 7:41 pm

vm666 wrote:
Nowa wrote:
Bugs such as this one you can best report directly to the i915 driver upstream.


According to the bug tracker https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/, this has been reported many times, to no avail :(


Not every GPU hang is the same issue; the hang is just a symptom of some bug triggered by some application. I do not see a bug report for Civ6 yet, so it is likely a new issue. Note also that your log reveals a second issue: normally the system should be able to recover, at least partially, from a GPU hang, but on your system we see "Failed to reset chip", which indicates that the system was somehow not able to recover from the hang and get the GPU back into a working state.
All times are GMT