eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Sat May 18, 2024 1:41 am Post subject: Unaligned Write errors increased from 6.6.21 to 6.6.30
For quite a while now, I've been seeing errors like this on two laptops:
Code: | [ 219.297785] ata1.00: failed command: WRITE FPDMA QUEUED
[ 219.297786] ata1.00: cmd 61/20:e0:90:d4:84/00:00:11:00:00/40 tag 28 ncq dma 16384 out
res 40/00:00:00:01:80/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 219.297789] ata1.00: status: { DRDY }
[ 219.297790] ata1.00: failed command: WRITE FPDMA QUEUED
[ 219.297791] ata1.00: cmd 61/08:f8:80:88:c4/00:00:11:00:00/40 tag 31 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 219.297794] ata1.00: status: { DRDY }
[ 219.297796] ata1: hard resetting link
[ 219.611309] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 219.611821] ata1.00: ACPI cmd f5/00:00:00:00:00:e0(SECURITY FREEZE LOCK) filtered out
[ 219.611827] ata1.00: ACPI cmd b1/c1:00:00:00:00:e0(DEVICE CONFIGURATION OVERLAY) filtered out
[ 219.632678] ata1.00: ACPI cmd f5/00:00:00:00:00:e0(SECURITY FREEZE LOCK) filtered out
[ 219.632684] ata1.00: ACPI cmd b1/c1:00:00:00:00:e0(DEVICE CONFIGURATION OVERLAY) filtered out
[ 219.653113] ata1.00: configured for UDMA/133
[ 219.663265] ata1.00: device reported invalid CHS sector 0
[ 219.663267] ata1.00: device reported invalid CHS sector 0
[ 219.663268] ata1.00: device reported invalid CHS sector 0
[ 219.663270] ata1.00: device reported invalid CHS sector 0
-invalid CHS repeats more times-
[ 219.663321] sd 0:0:0:0: [sda] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=45s
[ 219.663324] sd 0:0:0:0: [sda] tag#21 Sense Key : Illegal Request [current]
[ 219.663325] sd 0:0:0:0: [sda] tag#21 Add. Sense: Unaligned write command
[ 219.663327] sd 0:0:0:0: [sda] tag#21 CDB: Read(10) 28 00 0f d3 69 a0 00 00 88 00
[ 219.663329] I/O error, dev sda, sector 265513376 op 0x0:(READ) flags 0x80700 phys_seg 11 prio class 2
|
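The `CDB: Read(10)` line can be cross-checked against the `I/O error` line that follows it. A minimal Python sketch, assuming the standard SCSI READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and 512-byte logical sectors; `decode_read10` is a hypothetical helper, not a kernel or sg3_utils function:

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB as printed by the sd driver.

    Assumed layout: byte 0 opcode 0x28, bytes 2-5 LBA (big-endian),
    bytes 7-8 transfer length in logical blocks (big-endian).
    """
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    # Byte count assumes 512-byte logical sectors (hypothetical for this disk).
    return {"lba": lba, "blocks": blocks, "bytes": blocks * 512}


if __name__ == "__main__":
    print(decode_read10("28 00 0f d3 69 a0 00 00 88 00"))
    # {'lba': 265513376, 'blocks': 136, 'bytes': 69632}
```

The decoded LBA, 265513376, is the same sector the `I/O error, dev sda, sector 265513376` line reports, so both lines describe the same failed 136-block read.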
This has plagued me for several kernel versions, but 6.6.30 seems to make it happen more often, and going back to 6.6.21 seems to bring the frequency back down to what I was seeing before. These writes are going to an SSD; the older SSD has gone through an average of 416 erase cycles but still has 85% of its life left. A long SMART self-test passes just fine on both SSDs.
I really suspect some kernel issue, or perhaps some laptop SATA power-saving tuning that's out of whack here. Is anyone else seeing something like this? I'm thinking about swapping more SSDs around, but the two SSDs I saw this on were an Intel and a Micron, both on laptop SATA controllers. My desktops with their SSDs do not see this problem...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54578 Location: 56N 3W
Posted: Sat May 18, 2024 11:31 am
eccerr0r,
Code: | [ 219.297785] ata1.00: failed command: WRITE FPDMA QUEUED
[ 219.297786] ata1.00: cmd 61/20:e0:90:d4:84/00:00:11:00:00/40 tag 28 ncq dma 16384 out
res 40/00:00:00:01:80/00:00:00:00:00/00 Emask 0x4 (timeout) |
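The quoted `cmd` field can be decoded to confirm these are NCQ (FPDMA QUEUED) writes. A sketch, assuming libata prints the taskfile as `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`, and the ATA convention that FPDMA QUEUED commands carry the sector count in the feature registers and the NCQ tag in bits 7:3 of the count register. `decode_fpdma` is a hypothetical name:

```python
import re

# Assumed libata layout: cmd OP/feat:nsect:lbal:lbam:lbah/hob x5/device
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)

FPDMA_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_fpdma(line: str, sector_size: int = 512) -> dict:
    """Decode tag, length and LBA of an FPDMA QUEUED command from a libata log line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd field found")
    command = int(m.group(1), 16)
    if command not in FPDMA_OPCODES:
        raise ValueError(f"command 0x{command:02x} is not an FPDMA QUEUED command")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m.group(3).split(":")
    )
    sectors = (hob_feature << 8) | feature  # FPDMA: count is in the feature registers
    tag = nsect >> 3                         # FPDMA: NCQ tag is in count bits 7:3
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "op": FPDMA_OPCODES[command],
        "tag": tag,
        "sectors": sectors,
        "bytes": sectors * sector_size,
        "lba": lba,
    }


if __name__ == "__main__":
    for text in (
        "cmd 61/20:e0:90:d4:84/00:00:11:00:00/40 tag 28 ncq dma 16384 out",
        "cmd 61/08:f8:80:88:c4/00:00:11:00:00/40 tag 31 ncq dma 4096 out",
    ):
        r = decode_fpdma(text)
        print(f"tag {r['tag']}: {r['op']} {r['bytes']} bytes at LBA {r['lba']:#x}")
```

On the first failed command this yields tag 28 and 32 sectors (16384 bytes), matching the `tag 28 ncq dma 16384` the kernel printed on the same line, which is a useful sanity check on the assumed register layout.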
It's a Native Command Queuing (NCQ) problem. NCQ was a trick to get better utilisation out of rotating rust by reordering a given sequence of commands to minimise head movement.
It does almost nothing, or less, for SSDs, as they don't have heads to move. You can turn it off.
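One way to turn NCQ off for a single disk at runtime is to drop its SCSI queue depth to 1 through sysfs, since libata treats a depth of 1 as "NCQ off"; the persistent alternative is the `libata.force=noncq` kernel parameter. A minimal Python sketch, assuming the disk is a libata device exposed at `/sys/block/<dev>/device/queue_depth`. It needs root, does not survive a reboot, and the helper names are hypothetical; `sysfs_root` exists only so it can be pointed at a scratch directory for testing:

```python
from pathlib import Path


def queue_depth_path(dev: str, sysfs_root: str = "/sys") -> Path:
    """sysfs queue_depth attribute for a disk such as 'sda' (assumed location)."""
    if "/" in dev:
        raise ValueError("expected a bare device name like 'sda'")
    return Path(sysfs_root) / "block" / dev / "device" / "queue_depth"


def read_queue_depth(dev: str, sysfs_root: str = "/sys") -> int:
    return int(queue_depth_path(dev, sysfs_root).read_text().strip())


def disable_ncq(dev: str, sysfs_root: str = "/sys") -> int:
    """Write 1 to queue_depth (libata maps depth 1 to NCQ off).

    Returns the depth read back afterwards. Not persistent across reboots.
    """
    queue_depth_path(dev, sysfs_root).write_text("1\n")
    return read_queue_depth(dev, sysfs_root)
```

Running `disable_ncq("sda")` as root and then watching whether the timeouts recur is a cheap experiment; if it helps, `libata.force=noncq` on the kernel command line makes it stick.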
SMART can, and does, report a pass on dead drives. Share the output of Code: | smartctl -x /dev/... | so we can look at the fine print.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Sat May 18, 2024 2:08 pm
Hmm.
Perhaps it's solely the Micron SSD that exhibits this issue and it's a firmware bug, though since it's an OEM disk I'm not sure where to get a firmware update: HP disclaims it, not to mention it may require Windows... It's just weird that a kernel version seems to exacerbate it.
This SMART field keeps increasing each time it chokes as above:
Code: | 188 Command_Timeout 0x0032 100 100 000 Old_age Always - 2498 |
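To track whether Command_Timeout keeps climbing between boots, the attribute row can be parsed and the raw counter compared across snapshots. A sketch, assuming the column layout the row above uses (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value) and that the raw value of attribute 188 on this drive is a plain counter; some vendors pack several fields into it, so treat the number with care. The function names are hypothetical:

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    id: int
    name: str
    flag: int
    value: int
    worst: int
    thresh: int
    kind: str         # TYPE column: Pre-fail or Old_age
    updated: str      # UPDATED column: Always or Offline
    when_failed: str  # "-" if the attribute never crossed its threshold
    raw: str          # RAW_VALUE kept as text; it can contain spaces


def parse_attr_row(row: str) -> SmartAttr:
    """Parse one attribute row in the ID/NAME/FLAG/.../RAW_VALUE layout."""
    parts = row.split(None, 9)  # at most 10 fields so RAW_VALUE stays whole
    if len(parts) != 10:
        raise ValueError(f"unexpected attribute row: {row!r}")
    id_, name, flag, value, worst, thresh, kind, updated, when_failed, raw = parts
    return SmartAttr(int(id_), name, int(flag, 16), int(value), int(worst),
                     int(thresh), kind, updated, when_failed, raw.strip())


def raw_counter(attr: SmartAttr) -> int:
    """Leading integer of RAW_VALUE (assumes a plain counter such as 2498)."""
    return int(attr.raw.split()[0])


def counter_growth(before_row: str, after_row: str) -> int:
    """How much the raw counter grew between two snapshots of one attribute."""
    before, after = parse_attr_row(before_row), parse_attr_row(after_row)
    if before.id != after.id:
        raise ValueError("rows are for different attributes")
    return raw_counter(after) - raw_counter(before)
```

Saving the row right after boot and again after the hangs stop would show how many timeouts each boot adds.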
I've had two Micron/Crucial SSDs, and both have had reallocated blocks from the get-go (maybe I keep getting customer returns?), but the count hasn't grown, so I assume that's okay. The one difference: one of them got a firmware update, since it was a retail-purchased disk.
---
Here's another weird observation:
This seems to happen after a fresh boot. Within the first few minutes of using the machine after boot, it does these weird hangs.
Once the hangs have occurred during those first few minutes after boot (up to 20 minutes or so), the SSD works just fine.
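To check the "first 20 minutes or so after boot" pattern over several boots, the bracketed dmesg timestamps (seconds since boot; 219.297785 s in the log above is roughly 3.7 minutes in) can be bucketed by minute. A minimal sketch, assuming the default dmesg timestamp format (not `dmesg -T`) and that counting `failed command:` lines is a fair proxy for the hangs; the function name is hypothetical, and printk timestamps may not advance while the machine is suspended:

```python
import re
from collections import Counter

# Default dmesg prefix, e.g. "[  219.297785] ...": seconds since boot.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def failures_per_minute(dmesg_text: str, marker: str = "failed command:") -> Counter:
    """Count lines containing `marker`, keyed by whole minutes since boot."""
    counts: Counter = Counter()
    for line in dmesg_text.splitlines():
        if marker not in line:
            continue
        m = TS_RE.match(line)
        if m:
            counts[int(float(m.group(1)) // 60)] += 1
    return counts
```

Feeding it the `dmesg` text from a few boots would show whether every timeout lands in the early buckets.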
Could it be a warm-up issue? Except that I can suspend and resume the machine, and as far as I can tell it doesn't happen again.
I haven't tried hibernating yet (haven't needed to), but that could also be interesting.