dfelicia Apprentice
Joined: 11 May 2005 Posts: 281 Location: Southwestern Connecticut
Posted: Thu Dec 31, 2009 1:00 am Post subject: kernel: BUG: soft lockup

OK, I don't have a lot of info here, but I wonder if what little I have might make sense to someone.
I'm getting kernel soft lockups every so often. They started after one of the more recent kernel updates. I just updated to the latest gentoo-sources, and it's still happening.
Screenshot of what little was in the DRAC console buffer: http://www.donsbox.com/~dfelicia/snap1.jpg

Hu Administrator
Joined: 06 Mar 2007 Posts: 23064
Posted: Thu Dec 31, 2009 2:02 am

What was the last good kernel? On which versions have you seen the problem? Do you use any out-of-tree modules? Do you use any proprietary modules, such as the nVidia or ATI closed-source graphics drivers? If you have any out-of-tree modules, what versions of those modules are you using? Have you noticed any pattern between your usage of the system and the occurrence of the soft lockup?

dfelicia Apprentice
Joined: 11 May 2005 Posts: 281 Location: Southwestern Connecticut
Posted: Thu Dec 31, 2009 2:45 pm

The last good kernel was either 2.6.29 or 2.6.28; I can't recall which, and I've pruned the old sources.
The problem is present on 2.6.30 and 2.6.31.
No out-of-tree modules are in use, and no proprietary ones either.
The machine is a file server and gets a lot of NFS, FTP, and CIFS traffic. AFAIK it's just serving files when the problem occurs.

Hu Administrator
Joined: 06 Mar 2007 Posts: 23064
Posted: Thu Dec 31, 2009 4:34 pm

Can you try 2.6.32? What network card and driver does it use? Is there any correlation between the amount of data in/out and the timing of the soft lockup?

dfelicia Apprentice
Joined: 11 May 2005 Posts: 281 Location: Southwestern Connecticut
Posted: Thu Dec 31, 2009 4:51 pm

The driver is e1000. I found that the lockup actually was logged by syslog. When it occurred, the two processes running on the CPUs were smbd (Samba) and sshd.
smbd process:
Dec 30 18:59:22 ogre kernel: Call Trace:
Dec 30 18:59:22 ogre kernel: [<c12ab955>] ? sk_stream_alloc_skb+0x2c/0xc5
Dec 30 18:59:22 ogre kernel: [<c12abccc>] ? tcp_sendmsg+0x2de/0xa33
Dec 30 18:59:22 ogre kernel: [<c1272c99>] ? sock_aio_write+0xff/0x116
Dec 30 18:59:22 ogre kernel: [<c10b1493>] ? do_sync_write+0xd2/0x10e
Dec 30 18:59:22 ogre kernel: [<c10dfc09>] ? __posix_lock_file+0x83/0x5c1
Dec 30 18:59:22 ogre kernel: [<c1042f3b>] ? autoremove_wake_function+0x0/0x37
Dec 30 18:59:22 ogre kernel: [<c10e025d>] ? do_lock_file_wait+0x3b/0xbe
Dec 30 19:15:23 ogre kernel: [<c1155b21>] ? security_task_setgroups+0xc/0xd
Dec 30 19:15:23 ogre kernel: [<c1048e9e>] ? set_groups+0x16/0x1b0
Dec 30 19:15:23 ogre kernel: [<c1155993>] ? security_file_permission+0xc/0xd
Dec 30 19:15:23 ogre kernel: [<c10b162b>] ? rw_verify_area+0x4e/0xaf
Dec 30 19:15:23 ogre kernel: [<c10b2001>] ? vfs_write+0x15f/0x166
Dec 30 19:15:23 ogre kernel: [<c1155a71>] ? security_prepare_creds+0xd/0xf
Dec 30 19:15:23 ogre kernel: [<c10b20b7>] ? sys_write+0x41/0x70
Dec 30 19:15:23 ogre kernel: [<c1002bc4>] ? sysenter_do_call+0x12/0x22
It looks like it might be a race condition (?), since nearly the same calls appear on the other CPU (which is running sshd):
Dec 30 19:33:14 ogre kernel: [<c1042f3b>] ? autoremove_wake_function+0x0/0x37
Dec 30 19:33:14 ogre kernel: [<c12abfe8>] ? tcp_sendmsg+0x5fa/0xa33
Dec 30 19:33:14 ogre kernel: [<c1179fea>] ? _atomic_dec_and_lock+0x42/0x5c
Dec 30 19:33:14 ogre kernel: [<c1272c99>] ? sock_aio_write+0xff/0x116
Dec 30 19:33:14 ogre kernel: [<c10b1493>] ? do_sync_write+0xd2/0x10e
Dec 30 19:33:14 ogre kernel: [<c1042f3b>] ? autoremove_wake_function+0x0/0x37
Dec 30 19:33:14 ogre kernel: [<c1046317>] ? hrtimer_start+0x20/0x25
Dec 30 19:33:14 ogre kernel: [<c1155993>] ? security_file_permission+0xc/0xd
Dec 30 19:33:14 ogre kernel: [<c10b162b>] ? rw_verify_area+0x4e/0xaf
Dec 30 19:33:14 ogre kernel: [<c1032f4a>] ? do_setitimer+0x300/0x302
Dec 30 19:33:14 ogre kernel: [<c10b2001>] ? vfs_write+0x15f/0x166
Dec 30 19:33:14 ogre kernel: [<c10b20b7>] ? sys_write+0x41/0x70
Dec 30 19:33:14 ogre kernel: [<c1002bc4>] ? sysenter_do_call+0x12/0x22
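As a rough check of the "nearly the same calls" observation, the two traces can be compared mechanically. The Python sketch below is a minimal illustration, not a diagnostic tool. It assumes each trace line has the `[<address>] ? symbol+offset/size` shape shown above; the trace text is copied from the two dumps with the syslog prefix dropped; and the helper `frames` and the variable names are hypothetical. Entries marked `?` are ones the kernel's stack unwinder could not confirm, so overlap between the lists is a hint worth following, not proof of a race.

```python
import re

# One call-trace entry, e.g. "[<c12ab955>] ? sk_stream_alloc_skb+0x2c/0xc5".
# Group 2 is the optional "? " marker (entry not confirmed by the unwinder);
# group 3 is the function name.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(trace: str) -> list[str]:
    """Hypothetical helper: function names in a call trace, in order of appearance."""
    return [m.group(3) for m in FRAME_RE.finditer(trace)]


# Trace excerpts copied from the post (syslog timestamp/host prefix removed).
SMBD_TRACE = """
[<c12ab955>] ? sk_stream_alloc_skb+0x2c/0xc5
[<c12abccc>] ? tcp_sendmsg+0x2de/0xa33
[<c1272c99>] ? sock_aio_write+0xff/0x116
[<c10b1493>] ? do_sync_write+0xd2/0x10e
[<c10dfc09>] ? __posix_lock_file+0x83/0x5c1
[<c1042f3b>] ? autoremove_wake_function+0x0/0x37
[<c10e025d>] ? do_lock_file_wait+0x3b/0xbe
[<c1155b21>] ? security_task_setgroups+0xc/0xd
[<c1048e9e>] ? set_groups+0x16/0x1b0
[<c1155993>] ? security_file_permission+0xc/0xd
[<c10b162b>] ? rw_verify_area+0x4e/0xaf
[<c10b2001>] ? vfs_write+0x15f/0x166
[<c1155a71>] ? security_prepare_creds+0xd/0xf
[<c10b20b7>] ? sys_write+0x41/0x70
[<c1002bc4>] ? sysenter_do_call+0x12/0x22
"""

SSHD_TRACE = """
[<c1042f3b>] ? autoremove_wake_function+0x0/0x37
[<c12abfe8>] ? tcp_sendmsg+0x5fa/0xa33
[<c1179fea>] ? _atomic_dec_and_lock+0x42/0x5c
[<c1272c99>] ? sock_aio_write+0xff/0x116
[<c10b1493>] ? do_sync_write+0xd2/0x10e
[<c1042f3b>] ? autoremove_wake_function+0x0/0x37
[<c1046317>] ? hrtimer_start+0x20/0x25
[<c1155993>] ? security_file_permission+0xc/0xd
[<c10b162b>] ? rw_verify_area+0x4e/0xaf
[<c1032f4a>] ? do_setitimer+0x300/0x302
[<c10b2001>] ? vfs_write+0x15f/0x166
[<c10b20b7>] ? sys_write+0x41/0x70
[<c1002bc4>] ? sysenter_do_call+0x12/0x22
"""

smbd_frames = frames(SMBD_TRACE)
sshd_frames = frames(SSHD_TRACE)
smbd_set, sshd_set = set(smbd_frames), set(sshd_frames)

# dict.fromkeys() de-duplicates while keeping first-seen order.
shared = [f for f in dict.fromkeys(smbd_frames) if f in sshd_set]
only_smbd = [f for f in dict.fromkeys(smbd_frames) if f not in sshd_set]
only_sshd = [f for f in dict.fromkeys(sshd_frames) if f not in smbd_set]

print("shared:   ", ", ".join(shared))
print("smbd only:", ", ".join(only_smbd))
print("sshd only:", ", ".join(only_sshd))
```

On these excerpts the shared frames are tcp_sendmsg, sock_aio_write, do_sync_write, autoremove_wake_function, security_file_permission, rw_verify_area, vfs_write, sys_write and sysenter_do_call. In other words, both tasks appear to have been inside a write(2) to a TCP socket, while the smbd trace also shows POSIX file-locking frames (__posix_lock_file, do_lock_file_wait) that the sshd trace lacks.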
I should note that the "other CPU" is not a physical core: this is an older Xeon with Hyper-Threading enabled.
I'm now compiling hardened-sources, which is still at 2.6.28.