mutiny n00b
Joined: 06 Aug 2014 Posts: 16
|
Posted: Tue Mar 31, 2015 4:52 am Post subject: How to troubleshoot hanging Gentoo system? |
|
|
Hello all.
I have been experiencing a system hang for a few days now, and I have no idea how to diagnose the cause or where to even begin troubleshooting.
Basically, I have been waking up to a complete system hang every morning, after the system has been idle overnight. The keyboard does not respond (the Num Lock LED is on, but pressing Num Lock, Caps Lock, etc. produces no response; the keyboard seems dead), the mouse seems to have no power, and the monitors are stuck in an off/idle state. However, I am able to use Magic SysRq key combinations to recover keyboard input and reboot the system (REISUB works). I cannot get any video output after Alt+SysRq+R, regardless of what I try, such as Ctrl+Alt+F1. The system also appears to go offline in this state, as other machines cannot ping or ssh into it.
I'm not sure if this is a video driver issue, a kernel issue, etc. I have run a long memtest to test the RAM, as well as some intensive CPU/system tests like mprime, to try to rule out hardware issues. The hang only happens after some period of idle and has never happened during active use of the system. Are there logs I can check? Additional tests I can perform? How do I go about figuring out what is going on and fixing this issue?
System is ~amd64 with a 3.19.3 kernel and systemd (because I'm using Gnome 3)
Video card is Nvidia GTX 650 with nouveau driver
CPU is Core i7-4790K
Thanks for any ideas and assistance! |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
|
Posted: Tue Mar 31, 2015 1:54 pm Post subject: |
|
|
Try disabling power saving and see if it still does this?
If it had panicked, the keyboard LEDs (Caps Lock/Scroll Lock) should be blinking, so this is weird.
Is there any information in the journal (journalctl) that could be interesting at the estimated time of the crash?
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
mutiny n00b
Joined: 06 Aug 2014 Posts: 16
|
Posted: Tue Mar 31, 2015 7:01 pm Post subject: |
|
|
I'm not sure I have any power saving features enabled, except for blanking the screen after 15 minutes in Gnome's settings.
How do I view the journal for a particular time? |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
|
Posted: Tue Mar 31, 2015 7:15 pm Post subject: |
|
|
You could do something like:
# journalctl --since "2015-03-25 00:00:00" --until "2015-03-25 01:00:00"
See if there is anything interesting: oopses, etc. Then again, if it crashed and couldn't sync, likely nothing will be recorded there either.
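If the time of the hang is only an estimate, it may be easier to compute the window around it than to type both timestamps by hand. A minimal sketch, assuming GNU date (the Gentoo default); `journal_window` is a made-up helper name, not part of systemd, and `-k` restricts the output to kernel messages:

```shell
#!/bin/sh
# Hypothetical helper: print a journalctl command that covers
# +/- N minutes around an estimated hang time (default 30 minutes).
# Assumes GNU date for the -d option.
journal_window() {
    center=$1                 # e.g. "2015-04-01 07:35:00"
    minutes=${2:-30}          # half-width of the window, in minutes
    epoch=$(date -d "$center" +%s) || return 1
    start=$(date -d "@$((epoch - minutes * 60))" '+%Y-%m-%d %H:%M:%S')
    end=$(date -d "@$((epoch + minutes * 60))" '+%Y-%m-%d %H:%M:%S')
    # -k shows kernel messages only; drop it to see everything.
    printf 'journalctl -k --since "%s" --until "%s"\n' "$start" "$end"
}
```

Running `journal_window "2015-04-01 07:35:00" 30` prints the command so it can be checked first; wrapping the call in `eval "$( ... )"` as root would run it directly.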
For power saving, try disabling screen blanking for one test (actually, it's best to keep it disabled so you can see any problems if they show up), and for another test disable any CPU throttling and let it run at "performance".
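One way to script the "performance" part is sketched below. It assumes the cpufreq sysfs interface is available under /sys/devices/system/cpu and that it is run as root; `set_governor` is a made-up helper name, and the optional second argument exists only so the function can be pointed at a different directory:

```shell
#!/bin/sh
# Hypothetical helper: write the given governor (default "performance")
# into every cpuN/cpufreq/scaling_governor file under the cpufreq sysfs tree.
set_governor() {
    governor=${1:-performance}
    cpu_root=${2:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue      # no cpufreq support: nothing to do
        printf '%s\n' "$governor" > "$f"
    done
}
```

Call it as `set_governor performance`. For the screen blanking test under X, `xset s off -dpms` turns off the screensaver and DPMS for the running session; under Gnome, `gsettings set org.gnome.desktop.session idle-delay 0` should disable the idle blanking, though the exact key may depend on the Gnome version. Neither setting is guaranteed to survive a reboot.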
Is this repeatable? Does it fail after every idle period?
Will it fail if Ethernet is disconnected? |
|
Ant P. Watchman
Joined: 18 Apr 2009 Posts: 6920
|
Posted: Tue Mar 31, 2015 7:53 pm Post subject: |
|
|
Enable pstore and efivars pstore backend in the kernel, then next time it hangs you can reboot, mount /sys/fs/pstore and get the dmesg log from when it crashed. |
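A rough way to check whether the running kernel's config already has those options is sketched here. The option names CONFIG_PSTORE and CONFIG_EFI_VARS_PSTORE are what the 3.19-era kernel tree uses for these features; `check_pstore_config` is a made-up helper name, and the default path assumes the usual Gentoo /usr/src/linux symlink:

```shell
#!/bin/sh
# Hypothetical helper: report whether the pstore options are enabled
# (built in or as a module) in a kernel .config file.
# Returns 0 if both are enabled, 1 otherwise.
check_pstore_config() {
    config=${1:-/usr/src/linux/.config}
    missing=0
    for opt in CONFIG_PSTORE CONFIG_EFI_VARS_PSTORE; do
        if grep -Eq "^${opt}=[ym]\$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            missing=1
        fi
    done
    return "$missing"
}
```

After the next hang and reboot, `mount -t pstore pstore /sys/fs/pstore` (as root) mounts the filesystem if it is not mounted already, and any saved logs should appear there as files with names starting with `dmesg-`. This only helps if the machine boots via UEFI and the kernel actually panicked or oopsed; a plain hang may leave nothing behind.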
|
mutiny n00b
Joined: 06 Aug 2014 Posts: 16
|
Posted: Wed Apr 01, 2015 7:09 pm Post subject: |
|
|
Thanks for the information.
After last night's hang, I tried something this morning. After Alt+SysRq+R got the keyboard back, I tried Alt+SysRq+K, and after a few minutes the monitors came back on with what looked like some error lines on the console. I wasn't able to catch or record what was there... the monitors went blank again and the system "hung" again.
Checked journalctl from last night to this morning:
Code: | -- Logs begin at Mon 2015-03-30 17:45:32 HST, end at Wed 2015-04-01 09:04:16 HST. --
Apr 01 00:00:00 renoir gnome-session[964]: (evolution-alarm-notify:1127): evolution-alarm-notify-WARNING **: alarm.c:253: Reques
Apr 01 07:35:23 renoir kernel: nouveau E[ PFIFO][0000:01:00.0] read fault at 0x0004326000 [PTE] from GR/GPC0/T1_0 on channel 0
Apr 01 07:35:23 renoir kernel: nouveau E[ PFIFO][0000:01:00.0] PGRAPH engine fault on channel 5, recovering...
Apr 01 07:35:23 renoir kernel: nouveau E[ PGRAPH][0000:01:00.0] TRAP ch 5 [0x003f955000 gnome-shell[1055]]
Apr 01 07:35:23 renoir kernel: nouveau E[ PGRAPH][0000:01:00.0] GPC0/TPC0/TEX: 0x80000049
Apr 01 07:35:23 renoir kernel: nouveau E[ PGRAPH][0000:01:00.0] GPC0/TPC1/TEX: 0x80000049
Apr 01 07:36:14 renoir synergys[969]: Synergy 1.7.0: NOTE: client "nemesis" is dead
Apr 01 07:40:36 renoir dbus[733]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.f
Apr 01 07:40:36 renoir systemd[1]: Starting Network Service...
Apr 01 07:40:36 renoir systemd[1]: Starting Network Manager Script Dispatcher Service...
Apr 01 07:40:36 renoir dbus[733]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Apr 01 07:40:36 renoir systemd[1]: Started Network Manager Script Dispatcher Service.
Apr 01 07:40:36 renoir nm-dispatcher[6900]: Dispatching action 'dhcp4-change' for eno1
Apr 01 07:40:36 renoir dhclient[930]: bound to 192.168.67.12 -- renewal in 35102 seconds.
Apr 01 07:40:36 renoir systemd-timesyncd[728]: Network configuration changed, trying to establish connection.
Apr 01 07:40:36 renoir systemd-networkd[6899]: Enumeration completed
Apr 01 07:40:36 renoir systemd[1]: Started Network Service.
Apr 01 07:40:36 renoir systemd-timesyncd[728]: Network configuration changed, trying to establish connection. |
It seems to be fairly repeatable: after every long idle session, the system will very likely be hung. I'll try tonight with ethernet disconnected and the monitors left on. Judging from the journal entries, nouveau seems to have a problem?
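To compare against the time of each hang, the nouveau error lines can be pulled out of the journal and summarized. A minimal sketch, assuming the journal lines look like the excerpt above (timestamp in the first three fields, then `kernel: nouveau E[`); `nouveau_fault_summary` is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: read journal text on stdin and report how many
# nouveau error lines it contains, plus the first and last timestamps.
nouveau_fault_summary() {
    awk '/kernel: nouveau +E\[/ {
             stamp = $1 " " $2 " " $3      # e.g. "Apr 01 07:35:23"
             if (!n++) first = stamp
             last = stamp
         }
         END { printf "faults=%d first=\"%s\" last=\"%s\"\n", n, first, last }'
}
```

Usage would be something like `journalctl -k -b -1 | nouveau_fault_summary`, where `-b -1` selects the previous boot (this needs a persistent journal). A first timestamp that matches when the machine went dark would point toward the driver or the card; it would not prove it.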
I think I may also try replacing the Nvidia card with an older Radeon card I have lying around and switching to the radeon driver, to see if it is a driver/GPU hardware issue. |
|