Gentoo Forums
[SOLVED] X freezes, NVIDIA(GPU-0): WAIT
Nowa
Developer
Joined: 25 Jun 2014
Posts: 429
Location: Nijmegen

Posted: Thu Feb 07, 2019 5:51 pm    Post subject: [SOLVED] X freezes, NVIDIA(GPU-0): WAIT

X has been freezing a lot lately, almost every day at least once. The screen freezes completely; the cursor can move, but the system does not respond to keyboard input (apart from SysRq: I can force it to reboot, but it won't switch tty even after SysRq+r).
The system does not seem to recover either; simply waiting doesn't help.
Xorg.log shows me:
Code:

[  6270.576] (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x0000fc50, 0x0000fc6c)
[  6277.576] (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x0000fc50, 0x0000fc6c)

(Full xorg.log.old: https://paste.pound-python.org/show/mngWuMPLALEJ0XoYmEer/)
Neither dmesg nor syslog seems to report anything out of the ordinary, as far as I can tell. Today's syslog: https://paste.pound-python.org/show/cBhNiWAMSQwcau1XIQ77/
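
To see how often these WAIT errors show up, the X log can be searched for them. The following is only a minimal sketch: the function name count_nvidia_waits and the default log path /var/log/Xorg.0.log.old are assumptions, so adjust them to wherever your X server writes its log.

```shell
#!/bin/sh
# Hypothetical helper: count the "(EE) NVIDIA(GPU-n): WAIT" lines in an X log.
# The default path is an assumption; pass another file as the first argument.
count_nvidia_waits() {
    log="${1:-/var/log/Xorg.0.log.old}"
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c '(EE) NVIDIA(GPU-[0-9]*): WAIT' "$log"
}
```

Calling count_nvidia_waits /var/log/Xorg.0.log.old after a freeze and comparing the number with a log from a clean session gives a rough idea of whether the errors coincide with the hang.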

Anyone got any idea what's going on here?
Is this maybe a driver bug, or is my GPU dying?

(Emerge --info: https://paste.pound-python.org/show/3cxt0UERiLWimv3QqYYS/)
x11-drivers/nvidia-drivers-418.30
x11-base/xorg-server-1.20.3
media-libs/mesa-19.0.0_rc2
_________________
OS: Gentoo 6.10.12-gentoo-dist, ~amd64, 23.0/desktop/plasma/systemd
MB: MSI Z370-A PRO
CPU: Intel Core i9-9900KS
GPU: Intel Arc A770 16GB & Intel UHD Graphics 630
SSD: Samsung 970 EVO Plus 2 TB
RAM: Crucial Ballistix 32GB DDR4-2400


Last edited by Nowa on Sat Feb 16, 2019 1:06 pm; edited 1 time in total

dmpogo
Advocate
Joined: 02 Sep 2004
Posts: 3417
Location: Canada

Posted: Thu Feb 07, 2019 7:55 pm

It does look like a hardware/driver hang.

Can you ssh into the machine when it hangs and reload the nvidia module? (rmmod nvidia, modprobe nvidia)

Frautoincnam
Guru
Joined: 19 May 2017
Posts: 324

Posted: Fri Feb 08, 2019 5:40 am

I have a similar problem with a very similar machine, but I have no mouse cursor. X is totally dead.
This problem is recent (about 10 days). Everything worked fine with the same configuration for about a month.
I have nothing at all in the logs.
I have 3 monitors: 2 (DisplayPort & DVI) on the Nvidia card, 1 (HDMI) on the CPU's integrated graphics.

I found that X totally freezes when (and only when, not each time but very often) I quit WoW (World of Warcraft started by wine).

I can ssh to the machine, but can't kill Wow.exe, nor X.
I always have to reboot.

BUT, if I "chvt 1" before leaving WoW, kill Wow.exe, then chvt 7, all works fine.
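
The workaround above can be written down as a small wrapper. This is a rough sketch under assumptions: the function name quit_wow_safely and the RUN switch are made up for illustration, VT 7 is taken to be the X session as in the post, and pkill -f Wow.exe is assumed to match the game process.

```shell
#!/bin/sh
# Sketch of the "chvt 1, kill Wow.exe, chvt 7" sequence.
# RUN is a hypothetical switch: when unset, commands are only printed (echo);
# set RUN to an empty string to actually execute them (needs root for chvt).
quit_wow_safely() {
    run="${RUN-echo}"
    $run chvt 1 || return 1   # switch away from X first
    $run pkill -f Wow.exe     # then end the game process
    $run chvt 7               # switch back to the X session (VT 7 assumed)
}
```

Running quit_wow_safely as-is prints the three commands so the order can be checked before executing anything.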

I've upgraded from the stable versions in Portage to:
- app-emulation/wine-staging-4.0
- sys-kernel/gentoo-sources-4.19.19
- x11-drivers/nvidia-drivers-418.30
without any change.
_________________
OS: Gentoo KDE x86_64 (4.19.97-gentoo)
MB: Z370 GAMING PLUS (MS-7B61)
CPU: Coffeelake Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
GPU: NVIDIA MSI GeForce GTX 1050 TI 4GT LP & i915 CPU Integrated
SSD: Samsung SSD 860 250GB
RAM: G.Skill 32GB DDR4-2133

Nowa

Posted: Fri Feb 08, 2019 9:12 am

dmpogo wrote:
It does look like a hardware/driver hang.

Can you ssh into the machine when it hangs and reload the nvidia module? (rmmod nvidia, modprobe nvidia)


I have added sshd to the default runlevel and written down my local IP address. Next time it happens I should be able to ssh into my PC from my phone.

Frautoincnam wrote:
I have a similar problem with a very similar machine, but I have no mouse cursor. X is totally dead.
This problem is recent (about 10 days). Everything worked fine with the same configuration for about a month.
I have nothing at all in the logs.
I have 3 monitors: 2 (DisplayPort & DVI) on the Nvidia card, 1 (HDMI) on the CPU's integrated graphics.

I found that X totally freezes when (and only when, not each time but very often) I quit WoW (World of Warcraft started by wine).

I can ssh to the machine, but can't kill Wow.exe, nor X.
I always have to reboot.

BUT, if I "chvt 1" before leaving WoW, kill Wow.exe, then chvt 7, all works fine.

I've upgraded from the stable versions in Portage to:
- app-emulation/wine-staging-4.0
- sys-kernel/gentoo-sources-4.19.19
- x11-drivers/nvidia-drivers-418.30
without any change.


Yeah, this is very recent for me too; it worked fine for months before. I also have 3 monitors (1 DVI-D on the Nvidia GPU, 1 VGA and 1 DVI-D on the Intel GPU). I usually can still use the cursor, but sometimes the cursor freezes as well after some time.
For me, however, this seems to be completely random. I have had this happen while still in SDDM, and also while the system had been running for ~30 minutes, doing basically nothing.

The GPU is maybe 9 months old, so it shouldn't be having problems already. I have opened up the PC and confirmed that the GPU is not loose in its PCIe slot.

Nowa

Posted: Mon Feb 11, 2019 12:29 pm

Today it happened again, while I was closing a full-screen video.

I ssh'd into it and tried to rmmod nvidia, but nvidia is in use by nvidia_modeset, which in turn is in use by nvidia_drm.
rmmod nvidia_drm failed because the 'module was in use', but when I forced it with the -f option it didn't complain. Nothing changed though, so I ran rmmod -f nvidia_modeset as well. This caused the monitor connected to the nvidia GPU to turn black, but the other 2 monitors remained unresponsive (though I could still move the cursor). Next I ran rmmod -f nvidia; this command never completed (it did not return to the command prompt).

In a new ssh session I tried chvt 1 as Frautoincnam suggested, but that did not complete either. So I opened another ssh session and ran shutdown -h now, but the system did not shut down. The two remaining monitors kept their content, but the system stopped responding to ssh altogether, so it must have started the shutdown process anyway and frozen somewhere in the middle.
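
The module dependency described above implies an unload order: nvidia_drm first, then nvidia_modeset, then nvidia. A minimal sketch of that order follows. The function name unload_nvidia_stack and the RUN switch are hypothetical, and a plain rmmod is still expected to fail with "module is in use" as long as X holds the device, so this is not a fix for the hang itself.

```shell
#!/bin/sh
# Sketch: unload the nvidia modules in dependency order (users before providers).
# RUN is a hypothetical switch: when unset, commands are only printed (echo);
# set RUN to an empty string to actually run rmmod (as root, with X stopped).
unload_nvidia_stack() {
    run="${RUN-echo}"
    for mod in nvidia_drm nvidia_modeset nvidia; do
        # Stop at the first module that refuses to unload.
        $run rmmod "$mod" || return 1
    done
}
```

As the rest of this post shows, forcing the unload with -f while X is still running made things worse here, so the sketch deliberately does not use it.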

The rc.log shows me that:
Code:
* Stopping sddm ...
* start-stop-daemon: 1 process refused to stop
* Error stopping sddm
[ !! ]
* ERROR: xdm failed to stop

And a little later:
Code:
* Unmounting loop devices
* Unmounting filesystems
*   Unmounting /run/user/1000 ...
*   in use but fuser finds nothing
[ !! ]
*   Unmounting /var/tmp/portage ...
[ ok ]
*   Unmounting /tmp ...
*   in use but fuser finds nothing
[ !! ]
*   Unmounting /home/andrew/Storage ...
[ ok ]
*   Unmounting /boot ...
[ ok ]
(/tmp is mounted as tmpfs)
The log continues until just after udev is stopped and ends there, but it always does that.

Sddm.log shows:
Code:
[10:04:03.092] (WW) DAEMON: Signal received: SIGTERM
[10:04:03.092] (II) DAEMON: Socket server stopping...
[10:04:03.092] (II) DAEMON: Socket server stopped.
[10:04:03.092] (II) DAEMON: Display server stopping...
[10:04:08.097] (WW) DAEMON: QProcess: Destroyed while process ("/usr/libexec/sddm-helper") is still running.
[10:04:08.098] (II) DAEMON: Display server stopping...
[10:04:13.103] (WW) DAEMON: QProcess: Destroyed while process ("/usr/bin/X") is still running.

After which the log stops, so sddm stops but it fails to kill X.

syslog is where it gets interesting, here are some lines:
Code:
Feb 11 09:56:10 andrew-gentoo-pc kernel: [ 2151.026289] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Code:
Feb 11 09:56:20 andrew-gentoo-pc kernel: [ 2161.122295] WARNING: CPU: 11 PID: 5556 at /var/tmp/portage/x11-drivers/nvidia-drivers-418.30/work/kernel/nvidia/nv-rsync.c:44 nv_destroy_rsync_info+0x25/0x30 [nvidia]


I'm not an expert here, but would I be correct in assuming that this means that it fails to stop the CPU because the nvidia-drivers process is still running?
Full output of grep "Feb 11 09" /var/log/syslog -a : https://paste.pound-python.org/show/iWv1yZjShTSXp8nqHWON/

I also see this in the log:
Code:
Feb 11 09:56:34 andrew-gentoo-pc pulseaudio[3875]: [pulseaudio] module-loopback.c: Too many underruns, increasing latency to 12.00 ms
Feb 11 09:56:44 andrew-gentoo-pc pulseaudio[3875]: [pulseaudio] module-loopback.c: Too many underruns, increasing latency to 17.00 ms


I have been autoloading the loopback module to forward the MB's Line-In input (which is connected to my TV-receiver). If I remember correctly I did this at approximately the same time as when these issues started to appear, maybe nvidia-drivers does not like the loopback module? Shouldn't pulseaudio have been killed already?

Also, maybe I should add that I am using the boot parameter "nvidia-drm.modeset=1" because nvidia claims that this eliminates/reduces tearing (and enables frame synchronization between the GPUs, I think). However, I still see tearing sometimes, but only on the monitor connected to the nvidia GPU.
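
Whether the modeset parameter actually took effect can be read back from sysfs once nvidia_drm is loaded. A minimal sketch, assuming the usual module parameter location /sys/module/nvidia_drm/parameters/modeset; the function name nvidia_modeset_state is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the current nvidia_drm modeset setting.
# Prints the file's content (typically Y or N), or "unknown" when the
# parameter file is not readable, e.g. because nvidia_drm is not loaded.
nvidia_modeset_state() {
    param="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo "unknown"
    fi
}
```

The optional argument only exists so the function can be pointed at another file for testing.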

I also have the following in /usr/share/sddm/scripts/Xsetup:
Code:
xrandr --setprovideroutputsource modesetting NVIDIA-0
xrandr --auto --output HDMI-1-2 --mode 1600x900 --pos 3360x90 --output DVI-D-0 --mode 1920x1080 --pos 1440x0 --output DP-1-2 --mode 1440x900 --pos 0x90

The first line enables PRIME, nvidia's documentation has "xrandr --auto" as the second line.
However, when I use the --auto option KDE completely messes up the monitor configuration (even though sddm detects it just fine), all monitors are put over each other in a configuration that resembles the duplication configuration but is not quite the same because the resolutions do match. It used to work fine with 2 monitors, however ever since I added the third I need to manually specify the correct configuration.

I do not have efifb enabled because when I enable it, I get a low resolution framebuffer on the monitor connected to the nvidia GPU, and no framebuffer on the monitors connected to the intel GPU.
Instead I have it disabled which gives me no framebuffer on the monitor connected to the nvidia GPU, but it does give me a high-resolution framebuffer on the monitors connected to the intel GPU.

I have also had problems with the HDMI output of the nvidia GPU (see my other thread here: [SOLVED]Problems with nvidia-drivers and 2nd monitor on IGPU)
The monitor would slowly become completely white when: I logged in from sddm, I switched from tty to X, or whenever the monitor configuration changed.
I have not had this problem ever since I have been using the DVI-D output instead, indicating that this was not a problem with the monitor, but with the GPU.

How can I discern between a driver issue and a hardware problem?
Would I be able to get my money back on the nvidia card, and just buy an AMD one instead?

I used to have AMD in my old laptop, however it melted/burned through (idk, point is it's broken), and when I bought this PC I remembered how much fglrx sucked, so I bought nvidia instead. Maybe this was a mistake :/.
I hear AMD now has (semi-)open source drivers, so I would expect it to be less problematic than the old fglrx drivers. (It certainly can't be worse than this :/ )


[EDIT]
It happened again :/
This time I was able to successfully do shutdown -r now, probably because I did not mess with the nvidia modules first this time.
Log still showed that it failed to stop sddm and unmount /run/user/1000.

Frautoincnam

Posted: Mon Feb 11, 2019 5:22 pm

AndrewAmmerlaan wrote:
Today it happened again, while I was closing a full-screen video.

It happened to me too, while NOT leaving WoW and not doing anything special, just opening a new tab in Firefox.

Nowa

Posted: Mon Feb 11, 2019 6:00 pm

Frautoincnam wrote:
AndrewAmmerlaan wrote:
Today it happened again, while I was closing a full-screen video.

It happened to me too, while NOT leaving WoW and not doing anything special, just opening a new tab in Firefox.


The fact that this happens to multiple users makes me fairly confident that it is not a hardware issue. I have posted this on the Nvidia forums as well.
Also, because Gentoo configurations usually differ a lot from each other, this is probably a driver bug. (You are running a different kernel version, and it used to work, so I don't think it is a kernel bug.)
I did upgrade nvidia-drivers last week, but this happened before that upgrade as well, so if it is due to the driver then the previous version was affected too. (I can't find anything in the changelogs that might be related, though.)

Frautoincnam

Posted: Mon Feb 11, 2019 6:32 pm

But it could be due to i915 too, which is in the kernel.

EDIT :
No, you're surely right, it's a nvidia module bug.

Nowa

Posted: Mon Feb 11, 2019 9:09 pm

As generix from the nvidia forums suggested, I have switched the monitors around. The middle one (DVI-D) is now connected to intel instead of nvidia, and the right one (DVI-D) is connected to nvidia instead of intel.
I am now testing this setup (so far so good, but who knows).

Also, I don't know if you've been getting the annoying OSD popup asking you to select a monitor configuration at every boot as well, but if so, you can get rid of it by disabling the kscreen2 service in KDE settings.
Quote:
BTW, since you're using KDE and are setting the monitor placement using xrandr, it's often better to disable the kscreen2 service because this has strange side-effects.


Nowa

Posted: Sat Feb 16, 2019 1:05 pm

X hasn't crashed for nearly four days, so I guess the changes I made to the monitor configuration fixed this. I found this solution on the nvidia forums; I'll quote it here for future readers:
Quote:

The most recent crashes included XID 62 and 56, something was failing in the display engine. Taking the odd problem with your monitor into account, this might as well be an electrical problem, the monitor kind of overloading the output circuitry. Please disconnect the monitor you currently have connected to the nvidia gpu to test if you get a stable system. If no crashes occur, connect one of the other monitors to it.
BTW, since you're using KDE and are setting the monitor placement using xrandr, it's often better to disable the kscreen2 service because this has strange side-effects.
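
For anyone wanting to check their own logs for the Xid codes mentioned in the quote, here is a minimal sketch. It assumes the kernel log lines have the form "NVRM: Xid (PCI:...): <number>, ...", which may vary between driver versions; the function name list_xids is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct NVRM Xid codes found in a kernel log.
# Reads the files given as arguments, or standard input when none are given.
# Assumes lines of the form "NVRM: Xid (<bus id>): <number>, <details>".
list_xids() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' "$@" | sort -n -u
}
```

It can be fed from dmesg (dmesg | list_xids) or pointed at a log file such as /var/log/syslog.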


Frautoincnam wrote:
AndrewAmmerlaan wrote:
Today it happened again, while I was closing a full-screen video.

It happened to me too, while NOT leaving WoW and not doing anything special, just opening a new tab in Firefox.

Does switching monitors work for you too?

Frautoincnam

Posted: Sat Feb 16, 2019 2:31 pm

AndrewAmmerlaan wrote:
Does switching monitors work for you too?

I can't do that, because I have only 1 monitor with DVI and the 2 others only have VGA. And I need the DVI one as my main monitor, on my nvidia card.
But for the moment I haven't had any more freezes. I only had one that was not linked to leaving WoW. And now I always "chvt 1" before leaving WoW, so no freeze.

All times are GMT