Gentoo Forums
Irregular complete system freeze
Chopstix
n00b


Joined: 20 Oct 2012
Posts: 20

Posted: Sun Nov 10, 2013 4:31 pm    Post subject: Irregular complete system freeze

I've been experiencing this issue probably ever since I got this high-end laptop, at least for the last two years.

Every now and then, especially when running something that uses the GPU, such as Xonotic (a game) or Autopano Giga (a panorama stitcher similar to Hugin that uses the GPU for preview and rendering), the image on screen freezes along with the audio. Sometimes it simply stays frozen (rebooting via the Magic SysRq key works). If I Alt+Tab, the system occasionally comes back to life after a few seconds, but it usually freezes completely about 20 seconds later, so I use that window to reboot cleanly. Other times, instead of the image freezing, the whole screen turns grey and flickers or flashes in an alarming way, reminiscent of an epileptic seizure. This is accompanied by two lights above my keyboard flashing (they have padlock signs; I have no idea what they mean, and I will try to get a picture of them next time it happens). In that case the Magic SysRq keys don't work and the only option is to power off.

Rarely, it has also happened while I was just working in KDE4, though I guess compositing was using the GPU then. It happens very often when I use 3D programs or games. I could say the system has a half-life of an hour: half the time, the freeze occurs within the first hour.

My GPU is an nVidia GTX 285M using the proprietary nvidia-drivers, currently 331.20, but this has been happening since the 2xx versions.

Yesterday I tried using my tablet to SSH into my machine and monitor some /var/log files while playing, hoping to see some relevant messages appear. Unusually, I played for about 2 hours and nothing happened.
I tried again today without the tablet, and it happened after half an hour. I got these in /var/log/messages:
Code:
Nov 10 13:40:01 overkill cron[22043]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons)
Nov 10 13:46:14 overkill kernel: [10973.436884] perf samples too long (5105 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
Nov 10 13:50:01 overkill cron[22065]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons)
Nov 10 13:51:13 overkill kernel: [11272.575627] [sched_delayed] sched: RT throttling activated
Nov 10 13:51:19 overkill kernel: [11278.583775] ehci-pci 0000:00:1d.0: iso underrun ffff8800c07f9a00 (7316+56 < 14617)
Nov 10 13:51:19 overkill kernel: [11278.583782] delay: estimated 384, actual 0
Nov 10 13:51:19 overkill kernel: [11278.583789] ehci-pci 0000:00:1d.0: iso underrun ffff8800c07fa000 (7380+56 < 14618)
Nov 10 13:51:19 overkill kernel: [11278.583793] delay: estimated 384, actual 96
(...)
Nov 10 13:51:19 overkill kernel: [11278.584055] ehci-pci 0000:00:1d.0: iso underrun ffff8800c07fa000 (724+56 < 6428)
Nov 10 13:51:19 overkill kernel: [11278.584059] delay: estimated 480, actual 96
Nov 10 13:51:19 overkill kernel: [11278.584066] ehci-pci 0000:00:1d.0: iso underrun ffff8800c07f9400 (788+56 < 6428)
Nov 10 13:51:19 overkill kernel: [11278.584071] delay: estimated 480, actual 96

I found this in Xorg.log.old. I'm not sure whether these lines appeared during the freeze because there are no wall-clock timestamps, but I'll provide them just in case:
Code:
[ 11265.627] (WW) NVIDIA(0): WAIT (2, 4, 0x8000, 0x0000d6ac, 0x00001580)
[ 11273.663] (WW) NVIDIA(0): WAIT (1, 4, 0x8000, 0x0000d6ac, 0x00001580)
[ 11276.664] (WW) NVIDIA(0): WAIT (2, 4, 0x8000, 0x0000d6ac, 0x000026ac)
[ 11281.667] (WW) NVIDIA(0): WAIT (0, 4, 0x0000, 0x000026ac, 0x000026ac)


I keep my Gentoo machine up to date, running mostly stable amd64 with ~amd64 for just a few userland programs like GIMP. This has been a constant problem for at least two years, since back in the gentoo-sources-2.* days.

lspci -v
lsusb -v
emerge --info

Please help.


Last edited by Chopstix on Sun Nov 10, 2013 5:58 pm; edited 1 time in total
Hu
Administrator


Joined: 06 Mar 2007
Posts: 23062

Posted: Sun Nov 10, 2013 5:53 pm

Please post the output of emerge --info.

The flashing LEDs usually indicate a kernel panic.

I see you are using an nVidia card. Are you using the open nVidia driver or the proprietary nVidia driver? If the latter, please try to reproduce the problem with an untainted kernel.
Chopstix
n00b


Joined: 20 Oct 2012
Posts: 20

Posted: Sun Nov 10, 2013 6:00 pm

I've added a link to the emerge --info output to my first post, along with GPU info.
I use nvidia-drivers, currently 331.20.
What does "untainted kernel" mean? If you mean nouveau, I can't test with that, as I get 2 fps in Xonotic and had some compatibility issues in Autopano Giga too.
Hu
Administrator


Joined: 06 Mar 2007
Posts: 23062

Posted: Sun Nov 10, 2013 9:42 pm

If you load a proprietary kernel module, your kernel is tainted. Proprietary modules have a history of causing weird problems and are difficult to support, so upstream generally does not support a kernel if the problem manifests only when the kernel is tainted. If you cannot reproduce the problem with an untainted kernel, you will probably need to seek support from the provider of the proprietary module, nVidia.
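
As a rough sketch of how to check this on a running system: the kernel exposes its taint state as a bitmask in /proc/sys/kernel/tainted. The Python below decodes a handful of those bits. The bit meanings are taken from the kernel's tainted-kernels documentation; the names decode_taint and read_taint are made up for this example, and the table is deliberately incomplete.

```python
from pathlib import Path

# A few taint bits only (assumption: this subset is enough for a quick check).
# Bit number -> flag letter and meaning, per the kernel's tainted-kernels doc.
TAINT_FLAGS = {
    0: "P: proprietary module was loaded",
    7: "D: kernel has oopsed or hit a BUG",
    9: "W: kernel issued a warning",
    12: "O: out-of-tree module was loaded",
}


def decode_taint(value):
    """Return descriptions of the known taint bits that are set in value."""
    return [desc for bit, desc in sorted(TAINT_FLAGS.items()) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask from procfs (Linux only)."""
    return int(Path(path).read_text().strip())
```

Calling decode_taint(read_taint()) on the affected machine should list at least the P entry while the proprietary nVidia module is loaded; an empty list from a value of 0 means the kernel is untainted.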
shazeal
Apprentice


Joined: 03 May 2006
Posts: 208
Location: New Zealand

Posted: Mon Nov 11, 2013 3:56 am

Random errors generally indicate a hardware problem, not a software one. The GPU could be overheating, or the memory on the card could be bad, though it could also be some other component of the card. I have had several cards that exhibited similar behaviour; exchanging them for a different card fixed it every time.
The easiest way to test this is to boot the machine into a different OS such as Windows; if you can crash it there too, it's definitely hardware. Otherwise, monitor the temperatures: if the heatsink was never seated correctly or the thermal paste has dried out, it could cause something like this.

This was a popular card and is mature, so if Google does not turn up anything, it's generally your card at fault.
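
One possible way to keep an eye on temperatures over SSH while a game runs is to poll the kernel's thermal zones in sysfs, which report millidegrees Celsius. This is only a sketch: the function names and the 90 °C threshold are arbitrary choices for the example, and the GPU's own sensor may not show up here at all with the proprietary driver, in which case this only reflects CPU/chassis heat.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {"zone_name:type": degrees_celsius} for each readable thermal zone."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone is missing a file or reports garbage; skip it
        temps[f"{zone.name}:{kind}"] = millideg / 1000.0
    return temps


def over_limit(temps, limit_c=90.0):
    """Return only the zones at or above limit_c (an arbitrary example threshold)."""
    return {name: t for name, t in temps.items() if t >= limit_c}
```

Printing over_limit(read_thermal_zones()) every few seconds from a second machine would show whether anything is running hot in the minutes before a freeze.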
_________________
CFLAGS="-OmgWTFR1CE --fun-lol-loops --march=asmx86go"
Chopstix
n00b


Joined: 20 Oct 2012
Posts: 20

Posted: Mon Nov 11, 2013 2:22 pm

I was worried that could be the case.

What about this error message, what does it mean?
ehci-pci 0000:00:1d.0: iso underrun
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9883
Location: almost Mile High in the USA

Posted: Mon Nov 18, 2013 8:05 pm

It almost seems that video cards only last a few years and are guaranteed to die after that...
All of my discrete GPUs so far have developed problems after about 2 years of use. The onboard ones tend to survive, and for the on-CPU ones I don't have enough data yet...

A USB iso underrun likely means that you have an isochronous stream on a USB device but it didn't get enough data in time for whatever reason; it might be a timing problem caused by another device...
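
To see whether the underruns come in bursts around a freeze, the messages could be counted per controller and per second of uptime. The sketch below assumes the log line layout shown in the first post; the regular expression and the function name count_underruns are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   kernel: [11278.583775] ehci-pci 0000:00:1d.0: iso underrun ffff8800c07f9a00 (7316+56 < 14617)
# Group 1 is the whole-second uptime, group 2 is "driver device".
UNDERRUN = re.compile(r"\[\s*(\d+)\.\d+\]\s+(\S+ \S+): iso underrun")


def count_underruns(lines):
    """Count 'iso underrun' messages per (driver and device, uptime second)."""
    counts = Counter()
    for line in lines:
        m = UNDERRUN.search(line)
        if m:
            counts[(m.group(2), int(m.group(1)))] += 1
    return counts
```

Feeding it the lines of /var/log/messages, for example count_underruns(open("/var/log/messages")), gives a quick picture of whether the underruns cluster in the same second as other symptoms, such as the "RT throttling activated" message.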
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?