[WORKAROUND] kernel BUG at drivers/pci/intel-iommu.c:1373!

JohnBlbec · Guru Joined: 08 Feb 2003 Posts: 306

Hi everybody,

I can reproduce this bug every time I run bonnie++ or copy or move files larger than about 4 GB. I cannot get a kernel core dump or any text output onto my disk, because pressing the hard reset button is the only thing I can do after the crash. I have taken a photo of my LCD showing the bug. I do not know what other information the kernel gurus need, but I can update this topic and will add anything you ask for. The bug is really annoying :o(

the bug photo

my kernel config

$ uname -a

Neo2 · Apprentice Joined: 25 Sep 2006 Posts: 224 Location: Italy

Before requesting anything from the kernel devs, remove any proprietary/binary module that taints the kernel ("Tainted: P" at the end of the module listing), otherwise you'll get no support at all. In this case it seems you're using the "nvidia" module. Remove this module and re-run bonnie++.
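You can also check the taint state without waiting for an oops by reading /proc/sys/kernel/tainted, which holds the taint flags as a bit mask. Below is a minimal POSIX sh sketch; `taint_flags` is a hypothetical helper name, not an existing tool, and it decodes only three bits: bit 0 (P, proprietary module loaded), bit 7 (D, the kernel died recently after an OOPS or BUG) and bit 9 (W, the kernel issued a warning). The full table is in the kernel's tainted-kernels documentation.

```shell
#!/bin/sh
# taint_flags MASK: print the letters for a few well-known taint bits.
# Partial sketch: only bits 0 (P), 7 (D) and 9 (W) are decoded.
taint_flags() {
    mask=$1
    flags=""
    # bit 0: a proprietary (non-GPL) module has been loaded
    [ $(( mask & 1 )) -ne 0 ] && flags="${flags}P"
    # bit 7: the kernel died recently (OOPS or BUG)
    [ $(( (mask >> 7) & 1 )) -ne 0 ] && flags="${flags}D"
    # bit 9: the kernel issued a warning
    [ $(( (mask >> 9) & 1 )) -ne 0 ] && flags="${flags}W"
    printf '%s\n' "$flags"
}

# Decode the running kernel's taint mask, if it is available.
if [ -r /proc/sys/kernel/tainted ]; then
    echo "taint flags: $(taint_flags "$(cat /proc/sys/kernel/tainted)")"
fi
```

Taint flags are never cleared while the system is up, so unloading nvidia is not enough: reboot without loading it, and an output without P then means no proprietary module has been loaded since boot.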
If the system hangs again, to produce a good bug report you should at least enable CONFIG_KALLSYMS ("CONFIG_KALLSYMS=y") in your kernel config and recompile. This way the long list of addresses below the "Call Trace:" line will be annotated with function names, so you can tell which function caused the bug.
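To confirm that the option is really set in the configuration you build from, a check like the one below can help. It is a sketch that assumes the usual Gentoo location /usr/src/linux/.config; `config_enabled` is a hypothetical helper name. It matches only the exact line CONFIG_KALLSYMS=y, so a line such as "# CONFIG_KALLSYMS is not set" counts as disabled.

```shell
#!/bin/sh
# config_enabled FILE OPTION: succeed only if FILE contains the exact line
# OPTION=y (hypothetical helper; "# OPTION is not set" counts as disabled).
config_enabled() {
    grep -qxF "$2=y" "$1"
}

# Assumed location of the kernel configuration on a Gentoo system.
cfg=/usr/src/linux/.config
if [ -r "$cfg" ]; then
    if config_enabled "$cfg" CONFIG_KALLSYMS; then
        echo "CONFIG_KALLSYMS=y is set in $cfg"
    else
        echo "CONFIG_KALLSYMS is not enabled in $cfg"
    fi
fi
```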
If removing the module does help, you should consider updating it or, since the problem would then most likely lie in the graphics rendering module, switching to another DRM system (such as the open-source DRI).
Maybe updating the kernel would help too, but I can't say much more without additional information.
_________________
Neo2
Unofficial minimal liveCD for x86/amd64 w/reiser4+truecrypt

JohnBlbec · Guru Joined: 08 Feb 2003 Posts: 306

Thanks for the advice, Neo2. I have done everything you wrote, and the result is the same: the kernel crashes again...

new kernel crash photo

lshw output

Note: I can emerge the whole world (emerge -e world) without any trouble, but copying or moving big files with cp or mv hangs my Linux box.

Neo2 · Apprentice Joined: 25 Sep 2006 Posts: 224 Location: Italy

Well, I've investigated the kernel sources a little. Apparently the domain_page_mapping function, which is called from the intel_map_sg function (in drivers/pci/intel-iommu.c), fails, and this leads to the deadlock. I've searched through the kernel ChangeLog and found nothing regarding those two functions from 2.6.26.4 to 2.6.27.9, so it is probably an unknown bug. I think this is the code leading to the deadlock (especially the BUG_ON macro; extracted from intel-iommu.c):

JohnBlbec · Guru Joined: 08 Feb 2003 Posts: 306

Hi Neo2. I am going to report a kernel bug as you advised, thanks. Unfortunately, it is not a minor issue for me, because I am not able to clone my 1 TB DB. Fucking business. Well, thanks once again; I will let the Gentoo forum users know when I find a patch...

The bug has been submitted: Kernel Bug Tracker Bug 12222

Workaround: use "intel_iommu=off" as a Linux kernel boot parameter, although I do not know what performance impact we should expect...
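To confirm that the parameter actually reached the running kernel, for example after appending it to the kernel line in your boot loader configuration, you can look for it in /proc/cmdline. This is a minimal sketch; `has_boot_param` is a hypothetical helper name, and it assumes the command line is space-separated (tabs are not handled).

```shell
#!/bin/sh
# has_boot_param CMDLINE PARAM: succeed if PARAM appears as a whole,
# space-separated token in CMDLINE (hypothetical helper, not a standard tool).
has_boot_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Check the running kernel's command line, if it is available.
if [ -r /proc/cmdline ]; then
    if has_boot_param "$(cat /proc/cmdline)" intel_iommu=off; then
        echo "intel_iommu=off was passed to this kernel"
    else
        echo "intel_iommu=off was NOT passed to this kernel"
    fi
fi
```

This only shows that the parameter was passed at boot; the boot messages (dmesg) are the place to cross-check what the IOMMU code actually did.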