JohnBlbec Guru
Joined: 08 Feb 2003 Posts: 306
Posted: Sun Dec 14, 2008 12:46 am Post subject: [WORKAROUND] kernel BUG at drivers/pci/intel-iommu.c:1373!
hi everybody,
I can reproduce this bug every time I run bonnie++ or copy or move files larger than about 4 GB. I cannot get a kernel core dump or any text output written to disk, because pressing the hard reset button is the only thing I can do once the bug hits. So I have taken a photo of my LCD showing the bug. I do not know what other information the kernel gurus want, so I will update this topic and add whatever you ask for. The bug is really very annoying :o(
the bug photo
my kernel config
$ uname -a
Code: |
Linux rpc-linux 2.6.26-gentoo-r4 #1 SMP Sat Dec 13 23:50:20 CET 2008 x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel GNU/Linux
|
$ free -m
Code: |
             total       used       free     shared    buffers     cached
Mem:          4011       1503       2508          0         18       1004
-/+ buffers/cache:        480       3531
Swap:         4095          0       4095
|
# lspci
Code: |
00:00.0 Host bridge: Intel Corporation DRAM Controller (rev 01)
00:01.0 PCI bridge: Intel Corporation Host-Primary PCI Express Bridge (rev 01)
00:19.0 Ethernet controller: Intel Corporation 82566DC-2 Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation Device 05e2 (rev a1)
02:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID (rev 01)
03:00.0 IDE interface: Marvell Technology Group Ltd. Device 6121 (rev b2)
04:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
|
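For anyone who wants to try the copy-based reproduction described at the top of this post without bonnie++, here is a minimal sketch. It assumes that writing a zero-filled file and copying it exercises the same I/O path, which has not been verified here; `make_and_copy` and the scratch path in the usage comment are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: create a file of a given size (in MiB) and copy it once.
# make_and_copy is a hypothetical helper, not an existing tool.
make_and_copy() {
    dir=$1       # directory on the disk under test
    size_mb=$2   # file size in MiB; above ~4096 matches the report

    # Write size_mb blocks of 1 MiB of zeros, then copy the result.
    dd if=/dev/zero of="$dir/big.src" bs=1048576 count="$size_mb" 2>/dev/null &&
        cp "$dir/big.src" "$dir/big.dst"
}

# Usage (hypothetical scratch directory, about 5 GB):
#   make_and_copy /mnt/scratch 5000
```

On an affected machine this may hang the box just like the original report, so sync and close anything important first.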
Last edited by JohnBlbec on Sun Dec 14, 2008 10:35 pm; edited 1 time in total
Neo2 Apprentice
Joined: 25 Sep 2006 Posts: 224 Location: Italy
Posted: Sun Dec 14, 2008 10:54 am Post subject:
Before asking the kernel devs for anything, remove any proprietary/binary module that taints the kernel ("Tainted: P" at the end of the module listing), or else you'll get no support at all. In this case it seems you're using the "nvidia" module. Unload this module and re-run bonnie++.
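A quick way to confirm the taint state without waiting for an oops is to read the kernel's taint mask, where bit 0 (value 1) corresponds to the "P" flag. The sketch below only decodes that one bit; `is_proprietary_tainted` is a hypothetical helper name, not a standard tool.

```shell
#!/bin/sh
# Sketch: succeed if a taint mask has the "proprietary module" bit set.
# is_proprietary_tainted is a hypothetical helper, not an existing command.
is_proprietary_tainted() {
    # Bit 0 of the mask is the proprietary-module flag ('P' in an oops).
    [ $(( $1 & 1 )) -eq 1 ]
}

# Usage on a running system:
#   if is_proprietary_tainted "$(cat /proc/sys/kernel/tainted)"; then
#       echo "kernel is tainted by a proprietary module"
#   fi
```

A value of 0 in /proc/sys/kernel/tainted means the kernel is not tainted at all.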
If the system hangs again, then to generate a good bug report you should at least enable CONFIG_KALLSYMS ("CONFIG_KALLSYMS=y") in your kernel config and recompile. That way the long list of addresses below the "Call Trace:" line gets symbol names next to it, and you can provide the name of the function that caused the bug.
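To check whether the option is already set before recompiling, something like the following sketch should do. The config path in the usage comment is the usual Gentoo location and is an assumption about your setup; `kallsyms_enabled` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if a kernel config file has CONFIG_KALLSYMS=y.
# kallsyms_enabled is a hypothetical helper, not an existing command.
kallsyms_enabled() {
    config_file=$1
    # Match the exact enabled line, not "# CONFIG_KALLSYMS is not set".
    grep -q '^CONFIG_KALLSYMS=y$' "$config_file"
}

# Usage (assumed path to the kernel source config):
#   kallsyms_enabled /usr/src/linux/.config && echo "KALLSYMS is on"
```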
If removing the module does help, then you should consider updating it or (since the problem would probably lie in the graphics rendering module) switching to another DRM system (like the open-source DRI).
Updating the kernel might help too, but I can't say much more without additional information.
_________________
Neo2
Unofficial minimal liveCD for x86/amd64 w/reiser4+truecrypt
JohnBlbec Guru
Joined: 08 Feb 2003 Posts: 306
Posted: Sun Dec 14, 2008 12:01 pm Post subject:
Thanks for the advice, neo2. I have done everything you wrote and the result is the same, the kernel crashes again...
new kernel crash photo
lshw output
Note: I can emerge the whole world (emerge -e world) without any trouble, but cp or mv of big files hangs my system.
Neo2 Apprentice
Joined: 25 Sep 2006 Posts: 224 Location: Italy
Posted: Sun Dec 14, 2008 4:38 pm Post subject:
Well, I've looked into the kernel sources a little. Apparently the domain_page_mapping function, which gets invoked from the intel_map_sg function (both are in drivers/pci/intel-iommu.c), fails, and this leads to the hang. I've searched through the kernel ChangeLogs and there seems to be nothing touching those two functions between 2.6.26.4 and 2.6.27.9, so this is probably an unknown bug. I think this is the code leading to the hang (especially the BUG_ON macro; extracted from intel-iommu.c):
Code: |
static int
domain_page_mapping(struct dmar_domain *domain, dma_addr_t iova,
                    u64 hpa, size_t size, int prot)
{
        u64 start_pfn, end_pfn;
        struct dma_pte *pte;
        int index;
        int addr_width = agaw_to_width(domain->agaw);

        /* Mask the host physical address to the domain's address width. */
        hpa &= (((u64)1) << addr_width) - 1;

        if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
                return -EINVAL;
        iova &= PAGE_MASK;
        start_pfn = ((u64)hpa) >> VTD_PAGE_SHIFT;
        end_pfn = (VTD_PAGE_ALIGN(((u64)hpa) + size)) >> VTD_PAGE_SHIFT;
        index = 0;
        while (start_pfn < end_pfn) {
                pte = addr_to_dma_pte(domain, iova + VTD_PAGE_SIZE * index);
                if (!pte)
                        return -ENOMEM;
                /* We don't need lock here, nobody else
                 * touches the iova range
                 */
                /* Fires if this PTE already holds an address, i.e. the
                 * iova page is already mapped.
                 */
                BUG_ON(dma_pte_addr(*pte));
                dma_set_pte_addr(*pte, start_pfn << VTD_PAGE_SHIFT);
                dma_set_pte_prot(*pte, prot);
                __iommu_flush_cache(domain->iommu, pte, sizeof(*pte));
                start_pfn++;
                index++;
        }
        return 0;
}
|
Honestly, I don't know where to patch the source to get your problem fixed, and at this point you may want to file a bug with the kernel devs. This page has useful info on where to start: http://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html
Anyway, if your system runs fine for everyday use (browsing, mail, etc.), I guess this can be considered a minor issue (until you need to copy a file larger than 4 GB, of course).
Hope this helps.
Cheers,
Neo2
JohnBlbec Guru
Joined: 08 Feb 2003 Posts: 306
Posted: Sun Dec 14, 2008 5:10 pm Post subject:
Hi neo2. I am going to report a kernel bug as you advised, thanks. Unfortunately, it is not a minor issue for me, because I am not able to clone my 1 TB db. Fucking business. Well, thanks once again, and I will let the Gentoo forum users know when I find a patch...
The bug has been submitted: Kernel Bug Tracker Bug 12222
Workaround: pass "intel_iommu=off" as a Linux kernel boot parameter. I do not know what performance impact to expect, though...
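After editing the bootloader entry and rebooting, it is worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming parameters on the command line are separated by single spaces; `has_boot_param` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: succeed if a kernel command line contains a given parameter
# as a whole word. has_boot_param is a hypothetical helper.
has_boot_param() {
    cmdline=$1
    wanted=$2
    # Pad with spaces so only whole, space-separated words match.
    case " $cmdline " in
        *" $wanted "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on the running system:
#   has_boot_param "$(cat /proc/cmdline)" intel_iommu=off &&
#       echo "workaround is active"
```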