Author |
Message |
engineermdr Guru
Joined: 08 Nov 2003 Posts: 305 Location: Altoona, WI, USA
|
Posted: Fri Feb 09, 2024 5:21 am Post subject: AMD-Vi: Event logged IO_PAGE_FAULT |
|
|
My fairly new NAS suffered an error today while I was away at work. I came home to find that any access to ZFS would hang, including "zpool status". I looked through syslog and found:
Code: | Feb 7 23:11:23 neroon kernel: ahci 0000:0b:00.0: Using 64-bit DMA addresses
Feb 7 23:11:23 neroon kernel: ahci 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0016 address=0x7ffffe00000 flags=0x0000]
Feb 7 23:11:23 neroon kernel: ahci 0000:0b:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0016 address=0x7ffffe00500 flags=0x0000]
Feb 7 23:11:54 neroon kernel: ata9.00: exception Emask 0x0 SAct 0x78fff8 SErr 0x0 action 0x6 frozen |
Then this gets reported over and over with just slightly different numbers
Code: | Feb 7 23:11:54 neroon kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Feb 7 23:11:54 neroon kernel: ata9.00: cmd 61/58:18:78:5b:19/06:00:e5:00:00/40 tag 3 ncq dma 831488 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 7 23:11:54 neroon kernel: ata9.00: status: { DRDY } |
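For anyone trying to read the "slightly different numbers" in those repeated lines, here is a rough sketch of how the cmd field can be decoded. It assumes libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, that NCQ commands carry the sector count in the feature bytes and the tag in the upper bits of nsect, and that the drive uses 512-byte logical sectors. The function name `decode_ncq_cmd` is made up for this example.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def decode_ncq_cmd(cmd: str) -> dict:
    """Decode the 'cmd' field of a libata error line for an NCQ command.

    Assumed layout (hex bytes):
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    groups = [[int(b, 16) for b in g.split(":")] for g in cmd.split("/")]
    command = groups[0][0]
    feature, nsect, lbal, lbam, lbah = groups[1]
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = groups[2]

    # For NCQ commands the 16-bit sector count lives in the feature bytes.
    sectors = (hob_feature << 8) | feature
    # The 48-bit LBA is split across the low and high-order (hob) bytes.
    lba = (
        lbal | (lbam << 8) | (lbah << 16)
        | (hob_lbal << 24) | (hob_lbam << 32) | (hob_lbah << 40)
    )
    return {
        "command": command,
        "tag": nsect >> 3,  # NCQ tag sits in bits 7:3 of nsect
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
        "lba": lba,
    }


info = decode_ncq_cmd("61/58:18:78:5b:19/06:00:e5:00:00/40")
print(info["tag"], info["sectors"], info["bytes"])  # 3 1624 831488
```

Under those assumptions the decoded tag (3) and byte count (831488) agree with the "tag 3 ncq dma 831488 out" part of the same log line, so each repeated entry is just a different queued write that timed out.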
After a reboot, the system came up error-free, ZFS resilvered the two failed drives, and all appears well so far. I'm definitely going to exercise the system for a while before I trust any data to it.
Now I'm wondering whether this is an IOMMU issue or a drive/controller issue. It seems strange that the kernel would start "Using 64-bit DMA addresses" days after boot. What would cause that? Maybe I have something configured wrongly. Which way should I investigate first? SMART is not showing any issues. If it happens again, though, I'll start swapping drives or cables to try to isolate the problem. |
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 23062
|
Posted: Fri Feb 09, 2024 4:06 pm Post subject: |
|
|
What kernel version is this? My initial attempts to find "Using 64-bit DMA" turned up no relevant hits, so either the string is oddly line-wrapped in the kernel source (defeating git grep), it comes from an out-of-tree driver, or the message is composed in parts (and I failed to guess the right composition).
My first guess would be that the "Using 64-bit DMA" line is just debug output that appears whenever the kernel (re)initializes the device.
As I interpret the linked mailing list posts, that reporter (and possibly you, too) has a buggy firmware problem. The device claims it can do 64-bit DMA, so the kernel proceeds to use 64-bit DMA with it. In practice, it can only do 43-bit DMA, and it zeroes the remaining bits. If any of the upper (64 - 43) = 21 bits are set in the address the kernel picks, the device clears them and then writes to the wrong address. The device needs to write to exactly the address it was given. An appropriate entry in the kernel's quirk table can instruct the kernel to ignore the claimed 64-bit support and ensure that the kernel gives the device only addresses that can be represented in 43 bits. Then the bits that the device wrongly zeroes are already zero (because the kernel took care to pick an address with that property), so the truncated address is still the "right" address, and no fault occurs.
Assuming the above paragraph is right, then this suddenly started failing because ordinary operations caused the kernel to finally pick an address that the device cannot handle. Your prior successes were because the kernel happened, by luck, to be picking addresses the device handled properly. |
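The truncation described above can be sketched in a few lines. This is only an illustration of the arithmetic, not what the driver or firmware actually runs. The 43-bit limit comes from the reading of the mailing list posts above, and `hypothetical_iova` is an address picked purely for the example, not one taken from the logs.

```python
DMA_BITS_ACTUAL = 43  # assumption: the device only honors the low 43 address bits


def truncate(addr: int, bits: int = DMA_BITS_ACTUAL) -> int:
    """Return addr with every bit at or above position `bits` cleared."""
    return addr & ((1 << bits) - 1)


def survives(addr: int, bits: int = DMA_BITS_ACTUAL) -> bool:
    """True if the device would end up using exactly the address it was given."""
    return truncate(addr, bits) == addr


# Hypothetical address near the top of the 64-bit space (illustration only).
hypothetical_iova = 0xFFFFFFFFFFE00000

print(hex(truncate(hypothetical_iova)))   # 0x7ffffe00000
print(survives(hypothetical_iova))        # False: upper bits were set, so the access goes astray
print(survives(0x7FFFFE00000))            # True: already fits in 43 bits
```

For what it is worth, clearing the upper 21 bits of that hypothetical address yields 0x7ffffe00000, the same value as the first IO_PAGE_FAULT address in the log. That fits the truncation theory but does not prove it, since the address the kernel actually handed to the device is not shown in the log.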
|
engineermdr Guru
Joined: 08 Nov 2003 Posts: 305 Location: Altoona, WI, USA
|
Posted: Fri Feb 09, 2024 7:51 pm Post subject: |
|
|
This all started after upgrading to gentoo-sources-6.6.16. I had previously been using 6.1.67. I'm also having nfsd issues with the 6.6.16 update. So, I'm going back to 6.1 for the time being. |
|