Gentoo Forums
[SOLVED]BUG: soft lockup - CPU#0 stuck for 11s! [java:10670]

gentunian
Tux's lil' helper
Joined: 10 Jul 2006
Posts: 118
Location: Río Cuarto, Argentina
Posted: Sat Mar 01, 2008 8:33 pm

Hi there,

I was having a weird problem with the keyboard: all of the keys stopped working. The only thing that still worked was a safe restart with the Alt+SysRq keys. At first I suspected a hacker attack (paranoia), but when I started to track the problem down, it always happened while Azureus was running. So I looked at the logs and found this:

Code:

Mar  1 03:09:50 chaplin BUG: soft lockup - CPU#0 stuck for 11s! [java:10670]
Mar  1 03:09:50 chaplin CPU 0:
Mar  1 03:09:50 chaplin Modules linked in: bridge llc w83627ehf
hwmon_vid eeprom fuse snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
af_packet cpufreq_conservative cpufreq_ondemand cpufreq_powersave
cpufreq_userspace fan button thermal powernow_k8 freq_table processor
nvidia(P) ohci_hcd evdev 8250_pnp irtty_sir sir_dev irda crc_ccitt
parport_pc parport ehci_hcd usbcore snd_hda_intel snd_pcm snd_timer
snd snd_page_alloc 8250 serial_core psmouse pcspkr k8temp forcedeth
i2c_nforce2 i2c_core unix
Mar  1 03:09:50 chaplin Pid: 10670, comm: java Tainted: P
2.6.24-gentoo #1
Mar  1 03:09:50 chaplin RIP: 0010:[<ffffffff803f8b24>]
[<ffffffff803f8b24>] _spin_lock_irqsave+0x12/0x24
Mar  1 03:09:50 chaplin RSP: 0018:ffff81000bdef9c0  EFLAGS: 00000286
Mar  1 03:09:50 chaplin RAX: 0000000000000287 RBX: ffffffff805dd8a8
RCX: 000000000000000f
Mar  1 03:09:50 chaplin RDX: ffff81000bdefa60 RSI: ffff81002aca0d70
RDI: ffff81002aca0da8
Mar  1 03:09:50 chaplin RBP: ffff81003bcf5200 R08: 0000000000000064
R09: ffff810001e001c0
Mar  1 03:09:50 chaplin R10: 0000000000000002 R11: 0000000000000001
R12: 0000000000000000
Mar  1 03:09:50 chaplin R13: ffff81003ede5500 R14: 0000000000000000
R15: ffff81000bdb9770
Mar  1 03:09:50 chaplin FS:  0000000043e67950(0063)
GS:ffffffff8052e000(0000) knlGS:00000000f406bb90
Mar  1 03:09:50 chaplin CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar  1 03:09:50 chaplin CR2: 00002aaaaacd3340 CR3: 0000000028c51000
CR4: 00000000000006e0
Mar  1 03:09:50 chaplin DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
Mar  1 03:09:50 chaplin DR3: 0000000000000000 DR6: 00000000ffff0ff0
DR7: 0000000000000400
Mar  1 03:09:50 chaplin
Mar  1 03:09:50 chaplin Call Trace:
Mar  1 03:09:50 chaplin [<ffffffff802f6358>] prop_norm_percpu+0x3f/0xea
Mar  1 03:09:50 chaplin [<ffffffff802f65f7>] prop_fraction_percpu+0x3d/0x69
Mar  1 03:09:50 chaplin [<ffffffff802640ec>] get_dirty_limits+0xd0/0x184
Mar  1 03:09:50 chaplin [<ffffffff889f5788>] :fuse:fuse_dev_cleanup+0x133/0x13c
Mar  1 03:09:50 chaplin [<ffffffff8026424f>]
balance_dirty_pages_ratelimited_nr+0xaf/0x2b3
Mar  1 03:09:50 chaplin [<ffffffff8025faaf>]
generic_file_buffered_write+0x524/0x645
Mar  1 03:09:50 chaplin [<ffffffff80247284>] autoremove_wake_function+0x0/0x2e
Mar  1 03:09:50 chaplin [<ffffffff8020a843>] __switch_to+0x26e/0x27d
Mar  1 03:09:50 chaplin [<ffffffff8025ff0e>]
__generic_file_aio_write_nolock+0x33e/0x3a8
Mar  1 03:09:50 chaplin [<ffffffff803f753c>] thread_return+0x3d/0x81
Mar  1 03:09:50 chaplin [<ffffffff8024d973>] get_futex_key+0x82/0x14e
Mar  1 03:09:50 chaplin [<ffffffff8025ffd9>] generic_file_aio_write+0x61/0xc1
Mar  1 03:09:50 chaplin [<ffffffff8025ff78>] generic_file_aio_write+0x0/0xc1
Mar  1 03:09:50 chaplin [<ffffffff8027eb58>] do_sync_readv_writev+0xc0/0x107
Mar  1 03:09:50 chaplin [<ffffffff802f77a3>] __up_read+0x13/0x8a
Mar  1 03:09:50 chaplin [<ffffffff80247284>] autoremove_wake_function+0x0/0x2e
Mar  1 03:09:50 chaplin [<ffffffff8024edb3>] do_futex+0x8a/0xa00
Mar  1 03:09:50 chaplin [<ffffffff8027e9ed>] rw_copy_check_uvector+0x6c/0xdc
Mar  1 03:09:50 chaplin [<ffffffff8027f1a1>] do_readv_writev+0xb2/0x18b
Mar  1 03:09:50 chaplin [<ffffffff802499e1>] ktime_get_ts+0x17/0x48
Mar  1 03:09:50 chaplin [<ffffffff803f7b2c>] mutex_lock+0xd/0x1e
Mar  1 03:09:50 chaplin [<ffffffff8027f696>] sys_writev+0x45/0x6e
Mar  1 03:09:50 chaplin [<ffffffff8020be2e>] system_call+0x7e/0x83


Before this, Azureus wasn't working because during an update I removed blackdown-sdk (a blocking issue) and the update installed sun-jdk-1.6.*. When I ran Azureus, a message like "vms not found" appeared. So I ran:

Code:
java-config -S sun-sdk


And Azureus started working. Then this problem appeared (I never had it before). So I tried installing blackdown-sdk and using it (1.4). The problem got much worse: the entire computer hangs. Now I'm removing sun-sdk and recompiling Azureus to test whether it happens again.

Does anyone have any idea why this could happen? The important lines for me are:

Code:
Mar  1 03:09:50 chaplin Pid: 10670, comm: java Tainted: P


and:

Code:
Mar  1 03:09:50 chaplin [<ffffffff889f5788>] :fuse:fuse_dev_cleanup+0x133/0x13c


Because the partition where I download things is NTFS, mounted with ntfs3g over FUSE (the module built outside the kernel, from the fuse package in Portage), and ntfs3g is compiled with the suid flag.
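A minimal sketch of how the modules involved in a trace like this could be picked out, assuming the log uses the same frame format as above, where frames that belong to a module are printed as ":module:symbol+0x.../0x...". The function name trace_modules is made up for this example.

```shell
# List the module names that appear in module-qualified stack frames,
# i.e. frames printed as "[<address>] :module:symbol+0x.../0x...".
# Reads the log text on stdin and prints each module name once.
trace_modules() {
    sed -n 's/.*] :\([A-Za-z0-9_]*\):.*/\1/p' | sort -u
}

# Possible usage (paths depend on your logger):
#   trace_modules < /var/log/messages
#   dmesg | trace_modules
```

For the trace in this post it would print only "fuse", since every other frame belongs to the core kernel.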

Regards,
_________________
If a people is hungry, don't give them a fish. Teach them to fish.
http://labombiya.com.ar


Last edited by gentunian on Fri Mar 28, 2008 7:55 pm; edited 1 time in total

gentunian
Posted: Sat Mar 01, 2008 8:44 pm

sun-jdk-1.6.0.03 is a dependency:

chaplin seba # emerge -av azureus

These are the packages that would be merged, in order:

Code:
Calculating dependencies... done!
[ebuild  N    ] dev-java/sun-jdk-1.6.0.03  USE="X alsa -doc -examples -jce (-nsplugin) -odbc" 0 kB
[ebuild   R   ] net-p2p/azureus-2.5.0.4-r1  USE="-source" 0 kB


So I think I have to use sun-sdk [?]

irgu
Tux's lil' helper
Joined: 25 Apr 2003
Posts: 131
Posted: Sat Mar 01, 2008 11:51 pm

From http://article.gmane.org/gmane.comp.file-systems.ntfs-3g.devel/418
Quote:

Thanks to our Gentoo users and Miklos Szeredi, it was found out recently
that the FUSE kernel module used from the FUSE software packages (Gentoo
default) with the 2.6.24 Linux kernels can lockup the system.

Solution: use the FUSE kernel module included in the 2.6.24 Linux kernel
(drawback: NTFS can't be NFS exported).

gentunian
Posted: Mon Mar 03, 2008 9:05 pm

Thanks! I'll try the kernel module.

brazso
n00b
Joined: 18 Dec 2004
Posts: 17
Location: Budapest, Hungary
Posted: Thu Mar 27, 2008 8:52 am

Did using "fuse" compiled in the kernel solve your lockup problem? I'm using the latest 2.6.24-r3 (stable branch) kernel, but I still get complete system lockups while writing to an NTFS partition. First the application touching it (avidemux) freezes, then soon I lose the keyboard and/or the mouse. Emerging ntfs3g still pulls in fuse 2.7.2 (the latest in the testing branch), but it says that its module is not built because fuse was found in the kernel. lsmod lists fuse, but I cannot tell its version. Is there a way to display the versions of loaded modules?

gentunian
Posted: Thu Mar 27, 2008 10:00 am

Yes, using fuse compiled in the kernel solved my lockup problem. I really forgot about this thread; I apologize for that. The problem started when Azureus was accessing an NTFS partition through the fuse module built outside the kernel, that is, by the fuse package. The fuse package complains about the module already being compiled in the kernel, as you said.

My kernel version is 2.6.24-gentoo-r3 (I don't know whether it is stable now; it was masked when I emerged it, but you say it's stable, so maybe it is) and the fuse version is 2.7.2. It's unrelated, but just for the record, my ntfs3g is compiled with the suid flag.

You could use modinfo to gather information about a module, but I don't think it will give you the version information you want. I've been trying the --dump-modversions option of modprobe, but it doesn't seem to work.
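For reference, a minimal sketch of how this information could be queried, assuming a modinfo that supports -F (print a single field) and a kernel that exposes /sys/module. The function name module_origin and the SYSFS_ROOT variable are made up for this example, and the version file exists only when the loaded module declares a version, which the fuse module may well not do.

```shell
# Print the file modprobe would load for a module and, if available,
# the version the currently loaded module exports through sysfs.
module_origin() {
    mod=$1
    # Hypothetical override of the sysfs mount point, defaults to /sys.
    sysfs=${SYSFS_ROOT:-/sys}

    # "filename" is the path of the .ko file that modinfo resolves.
    echo "file: $(modinfo -F filename "$mod")"

    # Present only for loaded modules that declare a version.
    if [ -r "$sysfs/module/$mod/version" ]; then
        echo "version: $(cat "$sysfs/module/$mod/version")"
    else
        echo "version: (not exported)"
    fi
}

# Possible usage:
#   module_origin fuse
```

Note that the path only says where the file lives, not which source tree it was built from.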

It's weird that we have the same kernel and the same package version. Did you check the logs? Maybe it's not the same problem.

brazso
Posted: Thu Mar 27, 2008 4:30 pm

Thanks for your detailed answer, gentunian! To be frank, I found nothing suspicious in the logs after the lockups. I checked everything under the /var/log directory, especially the current dmesg output file. Do I have to enable some extra logging in the kernel first? After several lockups while using avidemux (its output went to the NTFS partition), I simply tried copying a bigger file to my NTFS partition by hand, and that produced the same lockup. That is why I think ntfs-3g is responsible for the problem.

irgu
Posted: Thu Mar 27, 2008 9:09 pm

The problem is the FUSE kernel module. Either you need to use the FUSE module in the 2.6.24 kernel or the one in the FUSE 2.7.3 package. The FUSE kernel module in the FUSE 2.7.2 package is broken.

gentunian
Posted: Thu Mar 27, 2008 9:16 pm

Quote:
Do I have to activate some extra log in the kernel before?


If you don't have a logging package installed, you should install one. I use syslog-ng, simply because it's the one in the installation guide. The very first time I installed Gentoo I followed the guide line by line, and that installation has been running ever since. So if you don't have a logger running, I recommend emerging one. To use syslog-ng, first emerge it:

Code:
emerge -av syslog-ng


and then add it to the default runlevel:

Code:
rc-update add syslog-ng default


and if you want to start logging right away:

Code:
/etc/init.d/syslog-ng start



To watch your log continuously you could do something like this:

Code:
tail -f /var/log/messages


The dmesg output is not as detailed as the messages file. So, if you can reproduce the lockup, keep the above command running to get a live view of what is happening on your box. Since you can't use the keyboard or (sometimes) even the mouse once it locks up, I recommend arranging your favorite terminal beforehand so that it is already showing the log.

Whenever I compiled the kernel I enabled most of the debug options that looked interesting. So if you see a debug option you think could be useful, enable it. I honestly don't know whether I was able to log that trace thanks to a debug option enabled in the kernel.

It is very likely that your lockup is caused by fuse, since ntfs3g uses fuse. As you can see in my log, the call trace goes through read/write operations and the fuse module appears in the trace.

So, as I said before, if you can reproduce the lockup, do it while watching the logs, or just reproduce it, reboot, and then check the logs.

Note: To safely reboot your computer without pressing the reset button, you can use the SysRq key. The SysRq key is the same as the Print Screen key. If you hold both the Alt and SysRq keys, then each of the keys below does the following:

  • R - take the keyboard out of raw mode
  • E - terminate all processes (SIGTERM)
  • I - kill all processes (SIGKILL)
  • S - sync the disks
  • U - remount the filesystems read-only
  • B - reboot


Pressing them in the above order restarts the computer safely. Remember to hold all three keys for each step: Alt + SysRq + R to take the keyboard out of raw mode, Alt + SysRq + E to terminate all processes, and so on. If the keys don't work, you need to compile the feature into the kernel. You can check this by doing:

Code:
grep MAGIC <your-config-kernel-file>


and if you see this:

Code:
CONFIG_MAGIC_SYSRQ=y


then the feature is enabled. If not, you can edit the config file, find that line (something like "# CONFIG_MAGIC_SYSRQ is not set"), change it to the above and recompile the kernel, or do it the traditional way with make menuconfig, in the "Kernel hacking" section.
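The check above could be wrapped up roughly like this. It is only a sketch: the function name sysrq_configured is made up, and the path to the config file (for example /usr/src/linux/.config) is an assumption about where your kernel sources live.

```shell
# Succeeds (exit status 0) if the given kernel config file
# has the magic SysRq feature built in.
# $1: path to a kernel .config file
sysrq_configured() {
    grep -q '^CONFIG_MAGIC_SYSRQ=y$' "$1"
}

# Possible usage:
#   if sysrq_configured /usr/src/linux/.config; then
#       echo "SysRq support is compiled in"
#   else
#       echo "SysRq support is missing, rebuild the kernel"
#   fi
```

Even with the option compiled in, the keys can be switched off at runtime; a value of 0 in /proc/sys/kernel/sysrq means they are disabled.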


EDIT:
irgu wrote:
The problem is the FUSE kernel module. Either you need to use the FUSE module in the 2.6.24 kernel or the one in the FUSE 2.7.3 package. The FUSE kernel module in the FUSE 2.7.2 package is broken.


Sorry if I'm wrong, but as I understood it, brazso said that he compiled the FUSE module from kernel 2.6.24 and still has the problem.

brazso
Posted: Fri Mar 28, 2008 8:36 am

gentunian> I installed Gentoo on my desktop machine at least 2 years ago, so I did not remember which logging package was active. I checked, and it's metalog. I think everything is switched on in its configuration file, and I have a lot of files under /var/log, e.g. the "messages" file you mentioned. However, I found nothing in the log files that might be related to the lockups. Following your advice I will try enabling all the debug facilities in the kernel; I will also try writing to NTFS with my previous kernel (2.6.23). Thanks for the detailed description of the SysRq key, I always learn something new.
I have just noticed irgu's comment. In theory I'm using the fuse module compiled from kernel 2.6.24-r3. The command modinfo fuse confirms that fuse.ko is loaded from the current kernel's path. However, the dmesg output contains a message reporting fuse 2.7.2 as the loaded one. I will try a later kernel from the testing branch; I'm curious which fuse version is used there.

gentunian
Posted: Fri Mar 28, 2008 11:03 am

brazso, checking the log for all fuse lines (e.g. "grep fuse /var/log/messages") I found that since I compiled the FUSE module from the kernel, the following output:

Code:
Mar  3 18:21:30 chaplin fuse init (API version 7.8)
Mar  3 18:21:30 chaplin fuse distribution version: 2.7.2


is gone. This is consistent with the dates in this thread: after "Mar 3", the "fuse distribution version: 2.7.2" line no longer shows up. So I deduce that this message came from the FUSE module built by the fuse package. You can check the same thing to see whether you are running the module from the package or the one built with the kernel. Maybe you have both, and modprobe is still inserting the wrong one, I mean the one you don't want.

This is the line that has appeared since the recompilation:

Code:
Mar  3 19:08:32 chaplin fuse init (API version 7.9)

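That observation can be turned into a quick check. The sketch below relies only on the heuristic described in this post (the package-built module logs a "fuse distribution version:" line after "fuse init", the in-kernel one does not); it is not an official way to identify the module, and the function name fuse_origin is made up.

```shell
# Guess which fuse module was loaded most recently, based on the
# kernel log text given on stdin.
fuse_origin() {
    awk '
        # A new "fuse init" starts a new load; forget older lines.
        /fuse init \(API version/    { init = $0; dist = "" }
        # Assumed to be printed only by the package-built module.
        /fuse distribution version:/ { dist = $0 }
        END {
            if (init == "")      print "fuse not loaded"
            else if (dist != "") print "package module"
            else                 print "in-kernel module"
        }'
}

# Possible usage:
#   dmesg | fuse_origin
#   fuse_origin < /var/log/messages
```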

Regarding the log level... I looked through the kernel config for the debug options I have enabled. In the "Kernel hacking" section I've activated "Kernel debug", and just below it there's an option called "Detect soft lockups". You should enable that, recompile the kernel and try again. Have a look at the other options too, but I think those two should be enough.

Cheers,

brazso
Posted: Fri Mar 28, 2008 1:31 pm

gentunian> after a system boot, dmesg now shows exactly the same fuse info (7.8, 2.7.2) that you had before switching to the kernel's module. I'm afraid you are right and the system still uses the fuse package's module, even though fuse is set as a module in the kernel config and modinfo says the opposite. Did you remove the emerged fuse package? Several packages (one of them is ntfs3g) depend on it, so it cannot be removed in my case. I can still try building it into the kernel (=y) instead of as a module (=m). You have the same kernel version as mine (2.6.24-r3). Could you tell me the size of the fuse.ko file reported by modinfo fuse? It shouldn't be different, but who knows :) I will check the logs again after turning kernel debugging on, thanks!

gentunian
Posted: Fri Mar 28, 2008 2:19 pm

Well, if you have two modules with the same name you can locate them using slocate, or check the /lib/modules directory. Also, it's safe to unmerge the fuse package, compile the kernel module, and then emerge the fuse package again.

BTW, modinfo tells you the path to the module. That doesn't mean the module came from the kernel source. So, if your slocate database is up to date, you can look for the fuse module with:

Code:
slocate fuse.ko


If in doubt, remove what you find (making a backup if you want). Then compile the fuse kernel module. You could build it into the kernel, but I prefer building things as modules, so I recommend that too.

If your slocate database isn't up to date, you can update it by running as root:

Code:
slocate -u


If you don't want to use slocate, you can search your /lib/modules directories with find, remove (or move, back up, or whatever you want) the fuse.ko files, and then compile the module again (from the kernel).
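A minimal sketch of the find-based search, assuming the modules live under /lib/modules/ followed by the running kernel's release as reported by uname -r. The function name find_module_copies is made up for this example.

```shell
# List every copy of a module file below a modules directory.
# $1: module name without the ".ko" suffix
# $2: directory to search (defaults to the running kernel's modules)
find_module_copies() {
    find "${2:-/lib/modules/$(uname -r)}" -type f -name "$1.ko"
}

# Possible usage:
#   find_module_copies fuse
#   find_module_copies fuse /lib/modules/2.6.24-gentoo-r3
```

More than one line of output means there are competing copies, and only the one under kernel/ is expected to come from the kernel build.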

I don't remember the exact procedure I followed. But I'm fairly sure that I unmerged the fuse package, compiled the kernel, rebooted into the new kernel, and then emerged the fuse package. (Sometimes you can't unload a module because it is in use, so the old module stays loaded when you want to remove it. Running the "mount" command shows what is using fuse; unmount that, and then you should be able to remove the old module.)

Good luck! And don't forget to tell us how it went.

brazso
Posted: Fri Mar 28, 2008 7:38 pm

The slocate command you proposed solved the problem: it showed 2 fuse.ko files under /lib/modules/2.6.24-gentoo-r3.
Code:
slocate fuse.ko
/lib/modules/2.6.24-gentoo-r3/fs/fuse/fuse.ko
/lib/modules/2.6.24-gentoo-r3/kernel/fs/fuse/fuse.ko
The first one belonged to the fuse package. It seems that removing the package did not erase its fuse.ko. I was so "lucky" that the system always chose the first one instead of the kernel's. I removed the first one manually, and I have checked that a fresh emerge sys-fs/fuse does not create it again. Now I get the expected fuse version in the log, and the soft lockup has vanished.
Code:
dmesg | grep fuse
fuse init (API version 7.9)

Thanks for the great help, gentunian! I think you can set the status of this topic to solved.

gentunian
Posted: Fri Mar 28, 2008 7:45 pm

brazso wrote:
... It seems that removing the package did not erase its fuse.ko.


That's probably because the module was in use, like I said before.

I'm glad you solved your problem! The real help came from irgu, who let us know about the bug; you should thank him.

Regards,

All times are GMT