TJNII l33t
Joined: 09 Nov 2003 Posts: 648 Location: for(;;);
|
Posted: Wed Dec 15, 2010 3:13 am Post subject: System hang due to low stack: Prevention advice wanted |
|
|
This evening one of my servers went down. It seems to have hung due to a low kernel stack. I see the following repeated in /var/log/everything:
Code: | Dec 14 16:54:03 [kernel] low stack detected by irq handler
Dec 14 16:54:03 [kernel] Pid: 24475, comm: sh Tainted: G C 2.6.31-gentoo-r6 #1
Dec 14 16:54:03 [kernel] Call Trace:
Dec 14 16:54:03 [kernel] [<c05ff93a>] ? printk+0x18/0x1a
Dec 14 16:54:03 [kernel] [<c0105827>] print_stack_overflow+0x17/0x20
Dec 14 16:54:03 [kernel] <IRQ> [<c0104d06>] ? do_IRQ+0x46/0xc0
Dec 14 16:54:03 [kernel] [<c01034a9>] ? common_interrupt+0x29/0x30
Dec 14 16:54:03 [kernel] [<c0601d97>] ? _spin_unlock_irqrestore+0x7/0x10
Dec 14 16:54:03 [kernel] [<fcb40213>] ? RTSpinlockReleaseNoInts+0x13/0x20 [vboxdrv]
Dec 14 16:54:03 [kernel] [<c013a095>] ? lock_timer_base+0x25/0x50
Dec 14 16:54:03 [kernel] [<c013a33e>] ? mod_timer_pending+0x9e/0xd0
Dec 14 16:54:03 [kernel] [<f9800b73>] ? __nf_ct_refresh_acct+0x83/0xc0 [nf_conntrack]
Dec 14 16:54:03 [kernel] [<fcc23349>] ? vboxNetFltLinuxForwardSegment+0xa9/0xe0 [vboxnetflt]
Dec 14 16:54:03 [kernel] [<c052fbd1>] ? skb_checksum_help+0x61/0x120
Dec 14 16:54:03 [kernel] [<fcc235ea>] ? vboxNetFltLinuxPacketHandler+0x26a/0x4e0 [vboxnetflt]
Dec 14 16:54:03 [kernel] [<c014d676>] ? getnstimeofday+0x56/0x110
Dec 14 16:54:03 [kernel] [<c0525262>] ? __skb_clone+0x22/0xd0
Dec 14 16:54:03 [kernel] [<c052f926>] ? dev_hard_start_xmit+0x106/0x350
Dec 14 16:54:03 [kernel] [<c053f3e5>] ? __qdisc_run+0x1e5/0x230
Dec 14 16:54:03 [kernel] [<f9bc636f>] ? ipv4_confirm+0x5f/0x130 [nf_conntrack_ipv4]
Dec 14 16:54:03 [kernel] [<c052ffaf>] ? dev_queue_xmit+0x31f/0x460
Dec 14 16:54:03 [kernel] [<c0555188>] ? ip_finish_output+0x1f8/0x2a0
Dec 14 16:54:03 [kernel] [<c0555288>] ? ip_output+0x58/0xb0
Dec 14 16:54:03 [kernel] [<c0554f90>] ? ip_finish_output+0x0/0x2a0
Dec 14 16:54:03 [kernel] [<c0554258>] ? ip_local_out+0x18/0x20
Dec 14 16:54:03 [kernel] [<c0554a2e>] ? ip_queue_xmit+0x1ae/0x370
Dec 14 16:54:03 [kernel] [<f9800b73>] ? __nf_ct_refresh_acct+0x83/0xc0 [nf_conntrack]
Dec 14 16:54:03 [kernel] [<fcb40213>] ? RTSpinlockReleaseNoInts+0x13/0x20 [vboxdrv]
Dec 14 16:54:03 [kernel] [<c0525262>] ? __skb_clone+0x22/0xd0
Dec 14 16:54:03 [kernel] [<c0566fdc>] ? tcp_transmit_skb+0x45c/0x6a0
Dec 14 16:54:03 [kernel] [<c056905a>] ? tcp_write_xmit+0x1da/0x9c0
Dec 14 16:54:03 [kernel] [<f86a6d27>] ? tg3_start_xmit_dma_bug+0x507/0x830 [tg3]
Dec 14 16:54:03 [kernel] [<c0528336>] ? __alloc_skb+0x46/0x120
Dec 14 16:54:03 [kernel] [<c05698c1>] ? __tcp_push_pending_frames+0x31/0x90
Dec 14 16:54:03 [kernel] [<c055cf38>] ? tcp_sendmsg+0x698/0xa00
Dec 14 16:54:03 [kernel] [<c052fd72>] ? dev_queue_xmit+0xe2/0x460
Dec 14 16:54:03 [kernel] [<c052098f>] ? sock_sendmsg+0xff/0x120
Dec 14 16:54:03 [kernel] [<c0145460>] ? autoremove_wake_function+0x0/0x50
Dec 14 16:54:03 [kernel] [<c0525262>] ? __skb_clone+0x22/0xd0
Dec 14 16:54:03 [kernel] [<c0566fdc>] ? tcp_transmit_skb+0x45c/0x6a0
Dec 14 16:54:03 [kernel] [<c0520c10>] ? kernel_sendmsg+0x30/0x50
Dec 14 16:54:03 [kernel] [<c05ca1d0>] ? xs_send_kvec+0xa0/0xb0
Dec 14 16:54:03 [kernel] [<c05ca24d>] ? xs_sendpages+0x6d/0x210
Dec 14 16:54:03 [kernel] [<c018604e>] ? mempool_alloc_slab+0xe/0x10
Dec 14 16:54:03 [kernel] [<c05ca513>] ? xs_tcp_send_request+0x53/0x170
Dec 14 16:54:03 [kernel] [<c05c8cca>] ? xprt_transmit+0x6a/0x280
Dec 14 16:54:03 [kernel] [<c027f620>] ? nfs3_xdr_fhandle+0x0/0x40
Dec 14 16:54:03 [kernel] [<c05c672f>] ? call_transmit+0x17f/0x250
Dec 14 16:54:03 [kernel] [<c05cd7e7>] ? __rpc_execute+0x87/0x270
Dec 14 16:54:03 [kernel] [<c0127fde>] ? find_busiest_group+0x1be/0xa50
Dec 14 16:54:03 [kernel] [<c052098f>] ? sock_sendmsg+0xff/0x120
Dec 14 16:54:03 [kernel] [<c05cda41>] ? rpc_execute+0x71/0x80
Dec 14 16:54:03 [kernel] [<c05c7281>] ? rpc_run_task+0x31/0x70
Dec 14 16:54:03 [kernel] [<c05c73db>] ? rpc_call_sync+0x3b/0x70
Dec 14 16:54:03 [kernel] [<c027d52b>] ? nfs3_rpc_wrapper+0x1b/0x60
Dec 14 16:54:03 [kernel] [<c027dbcd>] ? nfs3_proc_getattr+0x3d/0x80
Dec 14 16:54:03 [kernel] [<c026fef0>] ? __nfs_revalidate_inode+0xa0/0x210
Dec 14 16:54:03 [kernel] [<c05ffffa>] ? schedule+0x32a/0x8a0
Dec 14 16:54:03 [kernel] [<c026edf0>] ? nfs_attribute_timeout+0x10/0x50
Dec 14 16:54:03 [kernel] [<c026a3df>] ? nfs_lookup_revalidate+0x53f/0x5d0
Dec 14 16:54:03 [kernel] [<c0601edd>] ? _spin_unlock_bh+0xd/0x10
Dec 14 16:54:03 [kernel] [<c027f025>] ? nfs3_xdr_attrstat+0x15/0x30
Dec 14 16:54:03 [kernel] [<c01c0584>] ? dput+0x84/0x110
Dec 14 16:54:03 [kernel] [<c026a0b9>] ? nfs_lookup_revalidate+0x219/0x5d0
Dec 14 16:54:03 [kernel] [<c0600a95>] ? out_of_line_wait_on_bit+0xa5/0xc0
Dec 14 16:54:03 [kernel] [<c05cd0e0>] ? rpc_wait_bit_killable+0x0/0x40
Dec 14 16:54:03 [kernel] [<c0601edd>] ? _spin_unlock_bh+0xd/0x10
Dec 14 16:54:03 [kernel] [<c018608e>] ? mempool_free_slab+0xe/0x10
Dec 14 16:54:03 [kernel] [<c01862a7>] ? mempool_free+0x77/0x80
Dec 14 16:54:03 [kernel] [<c05ce975>] ? rpcauth_lookup_credcache+0x75/0x1c0
Dec 14 16:54:03 [kernel] [<c026c977>] ? nfs_do_access+0xa7/0x390
Dec 14 16:54:03 [kernel] [<c01c189d>] ? __d_lookup+0x8d/0x100
Dec 14 16:54:03 [kernel] [<c05ce56d>] ? put_rpccred+0x4d/0x110
Dec 14 16:54:03 [kernel] [<c01b7c31>] ? do_lookup+0x41/0x1c0
Dec 14 16:54:03 [kernel] [<c02e46fe>] ? security_inode_permission+0x1e/0x20
Dec 14 16:54:03 [kernel] [<c01b9fd4>] ? __link_path_walk+0x664/0xf40
Dec 14 16:54:03 [kernel] [<c01971cb>] ? kmap_high+0x1b/0x1a0
Dec 14 16:54:03 [kernel] [<c0279f10>] ? nfs_symlink_filler+0x0/0x60
Dec 14 16:54:03 [kernel] [<c01212ef>] ? kmap+0x4f/0x60
Dec 14 16:54:03 [kernel] [<c0279ef5>] ? nfs_follow_link+0x55/0x70
Dec 14 16:54:03 [kernel] [<c01ba6ae>] ? __link_path_walk+0xd3e/0xf40
Dec 14 16:54:03 [kernel] [<c01baaa4>] ? path_walk+0x54/0xb0
Dec 14 16:54:03 [kernel] [<c01bab51>] ? do_path_lookup+0x51/0x90
Dec 14 16:54:03 [kernel] [<c01bb8e3>] ? do_filp_open+0xd3/0x940
Dec 14 16:54:03 [kernel] [<c0145460>] ? autoremove_wake_function+0x0/0x50
Dec 14 16:54:03 [kernel] [<c01b065f>] ? vfs_read+0x11f/0x160
Dec 14 16:54:03 [kernel] [<c01b55ef>] ? open_exec+0x2f/0x100
Dec 14 16:54:03 [kernel] [<c018c151>] ? lru_cache_add_lru+0x21/0x40
Dec 14 16:54:03 [kernel] [<c01a115c>] ? page_add_new_anon_rmap+0x6c/0x70
Dec 14 16:54:03 [kernel] [<c0199fe1>] ? handle_mm_fault+0x451/0x7d0
Dec 14 16:54:03 [kernel] [<c0198d09>] ? follow_page+0x259/0x2d0
Dec 14 16:54:03 [kernel] [<c019a45b>] ? __get_user_pages+0xfb/0x3c0
Dec 14 16:54:03 [kernel] [<c013542f>] ? irq_exit+0x2f/0x70
Dec 14 16:54:03 [kernel] [<c0197067>] ? page_address+0xb7/0xd0
- Last output repeated twice -
Dec 14 16:54:03 [kernel] [<c019709a>] ? kunmap_high+0x1a/0xa0
Dec 14 16:54:03 [kernel] [<c01e69e0>] ? load_elf_binary+0x0/0x1980
Dec 14 16:54:03 [kernel] [<c01b474f>] ? search_binary_handler+0xbf/0x260
Dec 14 16:54:03 [kernel] [<c01b5d98>] ? do_execve+0x2d8/0x360
Dec 14 16:54:03 [kernel] [<c0101706>] ? sys_execve+0x46/0x70
Dec 14 16:54:03 [kernel] [<c0102dc4>] ? sysenter_do_call+0x12/0x22
|
At the time I was running emerge (aggressively) in a chroot on an NFS mount. I also had some VirtualBox machines running, one of which was actively tunneling IO. I'm not really sure what to make of that dump. Can anyone offer advice on how to prevent this, or at least catch it before the system hangs? |
|
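For the "catch it" half of the question, one possible stopgap is to scan the log for the warning and note which process triggered it. The sketch below is only that: it assumes the log lines look exactly like the excerpt above (timestamp, `[kernel]`, message), and the function name `find_low_stack_events` is made up for illustration. It reports warnings after they are logged, so it cannot guarantee a heads-up before a hang.

```python
import re

# Both patterns are modeled on the log excerpt in this post; a different
# syslog format would need different expressions (assumption).
LOW_STACK = re.compile(
    r"(\w{3}\s+\d+\s+\d\d:\d\d:\d\d) \[kernel\] low stack detected by irq handler"
)
PID_LINE = re.compile(r"\[kernel\] Pid: (\d+), comm: (\S+)")


def find_low_stack_events(lines):
    """Return a list of (timestamp, pid, comm) tuples, one per warning.

    pid and comm are None when no "Pid:" line follows the warning.
    """
    events = []
    pending = None  # timestamp of a warning still waiting for its Pid line
    for line in lines:
        warning = LOW_STACK.search(line)
        if warning:
            if pending is not None:
                # Previous warning never got a Pid line; record it anyway.
                events.append((pending, None, None))
            pending = warning.group(1)
            continue
        if pending is not None:
            pid = PID_LINE.search(line)
            if pid:
                events.append((pending, int(pid.group(1)), pid.group(2)))
                pending = None
    if pending is not None:
        events.append((pending, None, None))
    return events
```

Feeding it the lines of /var/log/everything (for example `find_low_stack_events(open("/var/log/everything"))`) would list each warning with the PID and command name from the line that follows it, which at least shows how often it happens and under which workload.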