Gentoo Forums
kernel: BUG: soft lockup
dfelicia
Apprentice


Joined: 11 May 2005
Posts: 281
Location: Southwestern Connecticut

Posted: Thu Dec 31, 2009 1:00 am    Post subject: kernel: BUG: soft lockup

OK, I don't have a lot of info here, but I wonder whether what little I have might make sense to someone.

I'm getting soft kernel lockups every so often. They started happening after one of the more recent kernel updates. I just updated to the latest gentoo-sources, and it's still happening.

Screenshot of what little I had in DRAC's console buffer: http://www.donsbox.com/~dfelicia/snap1.jpg
Hu
Administrator


Joined: 06 Mar 2007
Posts: 23064

Posted: Thu Dec 31, 2009 2:02 am

What was the last good kernel? On what versions have you seen the problem? Do you use any out of tree modules? Do you use any proprietary modules, such as the nVidia or ATI closed source graphics drivers? If you have any out of tree modules, what versions of those modules are you using? Have you noticed any pattern between your usage of the system and the occurrence of the soft lockup?
dfelicia
Apprentice


Joined: 11 May 2005
Posts: 281
Location: Southwestern Connecticut

Posted: Thu Dec 31, 2009 2:45 pm

The last good kernel was either 2.6.29 or 2.6.28. I can't recall which, and I have pruned the old sources.

Problem is there on 2.6.30 and 2.6.31.

No out of tree modules in use, and no proprietary ones, either.

The machine is a file server and gets a lot of NFS, FTP, and CIFS traffic. AFAIK it's just serving files when the problem occurs.
Hu
Administrator


Joined: 06 Mar 2007
Posts: 23064

Posted: Thu Dec 31, 2009 4:34 pm

Can you try 2.6.32? What network card and driver does it use? Is there any correlation between amount of data in/out and the timing of the soft lockup?
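
For readers unsure how to answer the driver question, here is a minimal sketch (not part of the original post) that lists which kernel driver is bound to each network interface. It assumes a Linux system with sysfs mounted at /sys, where /sys/class/net/<iface>/device/driver is a symlink to the driver directory. The function name nic_drivers and its sysfs_net parameter are made up for illustration.

```python
import os


def nic_drivers(sysfs_net="/sys/class/net"):
    """Map each network interface to the kernel driver bound to it.

    Assumption: sysfs exposes <sysfs_net>/<iface>/device/driver as a
    symlink whose final path component is the driver name. Interfaces
    without a backing device (for example the loopback) are skipped.
    """
    drivers = {}
    if not os.path.isdir(sysfs_net):
        return drivers  # not Linux, or sysfs is not mounted
    for iface in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, iface, "device", "driver")
        if os.path.islink(link):
            drivers[iface] = os.path.basename(os.readlink(link))
    return drivers


if __name__ == "__main__":
    for iface, driver in nic_drivers().items():
        print(iface, driver)
```

Reading sysfs directly avoids depending on extra tools being installed; the result is a plain dict of interface name to driver name.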
dfelicia
Apprentice


Joined: 11 May 2005
Posts: 281
Location: Southwestern Connecticut

Posted: Thu Dec 31, 2009 4:51 pm

Driver is e1000. I found that the lockup was in fact logged by syslog. When it occurred, the two processes on the CPUs were smbd (Samba) and sshd.

smbd process:
Dec 30 18:59:22 ogre kernel: Call Trace:
Dec 30 18:59:22 ogre kernel: [<c12ab955>] ? sk_stream_alloc_skb+0x2c/0xc5
Dec 30 18:59:22 ogre kernel: [<c12abccc>] ? tcp_sendmsg+0x2de/0xa33
Dec 30 18:59:22 ogre kernel: [<c1272c99>] ? sock_aio_write+0xff/0x116
Dec 30 18:59:22 ogre kernel: [<c10b1493>] ? do_sync_write+0xd2/0x10e
Dec 30 18:59:22 ogre kernel: [<c10dfc09>] ? __posix_lock_file+0x83/0x5c1
Dec 30 18:59:22 ogre kernel: [<c1042f3b>] ? autoremove_wake_function+0x0/0x37
Dec 30 18:59:22 ogre kernel: [<c10e025d>] ? do_lock_file_wait+0x3b/0xbe
Dec 30 19:15:23 ogre kernel: [<c1155b21>] ? security_task_setgroups+0xc/0xd
Dec 30 19:15:23 ogre kernel: [<c1048e9e>] ? set_groups+0x16/0x1b0
Dec 30 19:15:23 ogre kernel: [<c1155993>] ? security_file_permission+0xc/0xd
Dec 30 19:15:23 ogre kernel: [<c10b162b>] ? rw_verify_area+0x4e/0xaf
Dec 30 19:15:23 ogre kernel: [<c10b2001>] ? vfs_write+0x15f/0x166
Dec 30 19:15:23 ogre kernel: [<c1155a71>] ? security_prepare_creds+0xd/0xf
Dec 30 19:15:23 ogre kernel: [<c10b20b7>] ? sys_write+0x41/0x70
Dec 30 19:15:23 ogre kernel: [<c1002bc4>] ? sysenter_do_call+0x12/0x22


It looks like it's a race condition (?) since these are nearly the same calls on the other CPU (which is running sshd):

Dec 30 19:33:14 ogre kernel: [<c1042f3b>] ? autoremove_wake_function+0x0/0x37
Dec 30 19:33:14 ogre kernel: [<c12abfe8>] ? tcp_sendmsg+0x5fa/0xa33
Dec 30 19:33:14 ogre kernel: [<c1179fea>] ? _atomic_dec_and_lock+0x42/0x5c
Dec 30 19:33:14 ogre kernel: [<c1272c99>] ? sock_aio_write+0xff/0x116
Dec 30 19:33:14 ogre kernel: [<c10b1493>] ? do_sync_write+0xd2/0x10e
Dec 30 19:33:14 ogre kernel: [<c1042f3b>] ? autoremove_wake_function+0x0/0x37
Dec 30 19:33:14 ogre kernel: [<c1046317>] ? hrtimer_start+0x20/0x25
Dec 30 19:33:14 ogre kernel: [<c1155993>] ? security_file_permission+0xc/0xd
Dec 30 19:33:14 ogre kernel: [<c10b162b>] ? rw_verify_area+0x4e/0xaf
Dec 30 19:33:14 ogre kernel: [<c1032f4a>] ? do_setitimer+0x300/0x302
Dec 30 19:33:14 ogre kernel: [<c10b2001>] ? vfs_write+0x15f/0x166
Dec 30 19:33:14 ogre kernel: [<c10b20b7>] ? sys_write+0x41/0x70
Dec 30 19:33:14 ogre kernel: [<c1002bc4>] ? sysenter_do_call+0x12/0x22

I should note that "other cpu" is not a physical core. This is an older Xeon CPU with hyperthreading turned on.

I'm compiling hardened-sources now, which is back on 2.6.28.
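
The observation that the two traces contain "nearly the same calls" can be checked mechanically. The following is a minimal sketch (not part of the original post) that extracts function names from syslog trace lines in the format shown above and lists the ones common to both traces. The helper names frame_symbols and shared_symbols are made up for illustration, and the sample strings are a few lines copied from the traces in this post. An overlap in symbols only shows that both tasks were in the same code path; it does not by itself prove a race.

```python
import re

# Matches a frame such as "[<c12abccc>] ? tcp_sendmsg+0x2de/0xa33"
# and captures the symbol name ("tcp_sendmsg").
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frame_symbols(log_text):
    """Return the function names, in order, found in the trace lines."""
    symbols = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            symbols.append(match.group(1))
    return symbols


def shared_symbols(trace_a, trace_b):
    """Return symbols present in both traces, in trace_a order, deduplicated."""
    in_b = set(frame_symbols(trace_b))
    shared = []
    for sym in frame_symbols(trace_a):
        if sym in in_b and sym not in shared:
            shared.append(sym)
    return shared


# A few lines from each trace above, used as sample input.
SMBD_SAMPLE = """\
Dec 30 18:59:22 ogre kernel: Call Trace:
Dec 30 18:59:22 ogre kernel: [<c12ab955>] ? sk_stream_alloc_skb+0x2c/0xc5
Dec 30 18:59:22 ogre kernel: [<c12abccc>] ? tcp_sendmsg+0x2de/0xa33
Dec 30 18:59:22 ogre kernel: [<c1272c99>] ? sock_aio_write+0xff/0x116
"""

SSHD_SAMPLE = """\
Dec 30 19:33:14 ogre kernel: [<c1042f3b>] ? autoremove_wake_function+0x0/0x37
Dec 30 19:33:14 ogre kernel: [<c12abfe8>] ? tcp_sendmsg+0x5fa/0xa33
Dec 30 19:33:14 ogre kernel: [<c1179fea>] ? _atomic_dec_and_lock+0x42/0x5c
Dec 30 19:33:14 ogre kernel: [<c1272c99>] ? sock_aio_write+0xff/0x116
"""

if __name__ == "__main__":
    print(shared_symbols(SMBD_SAMPLE, SSHD_SAMPLE))
```

To compare the complete traces, pass the full text of each log excerpt to shared_symbols instead of the samples.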
All times are GMT