Pearlseattle Apprentice
Joined: 04 Oct 2007 Posts: 165 Location: Switzerland
|
Posted: Thu Jul 05, 2018 9:18 pm Post subject: Multiqueue schedulers kill Marvell 88SE9230 SATA controller? |
|
|
Hi
Background info
I am running since ~2 months a quite heavy parallelized database job that inserts every hour hundreds of millions of rows into a MariaDB database (using the TokuDB storage engine).
I'm continuously testing small config changes on all levels (DB, storage, compiler, etc...) as any small improvement will have a potentially big impact on a long-term perspective.
"uname -a": Linux quad 4.14.40-gentoo #6 SMP PREEMPT Sat Jun 16 14:11:41 CEST 2018 x86_64 Intel(R) Xeon(R) CPU E31270 @ 3.40GHz GenuineIntel GNU/Linux
The storage used by MariaDB is set up as follows:
1) a "bcache" virtual device on which Mariadb stores all data (DB tables & indexes).
1a) "sdf": a HDD (~8TB) that stores all data (DB tables & indexes) => this is connected to "bcache" as "backing device" (where all data is supposed to end up being stored).
1b) "sda": a SSD (~460GB) that acts as read & write cache ("writeback"-mode) (for "sdf") => this is connected to "bcache" as "caching device".
2) "sde": a HDD (~4TB) that stores the binary logs and TokuDB logs of MariaDB.
This setup worked without problems; when I tested the classical I/O schedulers, the best combo resulted in being "NOOP" for "sda", "CFQ" for "sde" and "DEADLINE" for "sdf".
Problem
A few days ago I decided to test the new "multiqueue" schedulers (deadline, BFQ and Kyber).
I did not have to change anything in the Kernel (I had preemptively activated the relevant options some months ago), so I just rebooted by adding "scsi_mod.use_blk_mq=1" as boot parameter => I then set "Kyber" for "sda", "DEADLINE" for "sdf" and "BFQ" for "sde".
About 30 minutes after running the usual workload I noticed that the "sdf"-disk was quiet => I therefore checked "dmesg" & "/var/log/messages" and saw this:
Quote: |
...
Jul 2 21:22:46 quad kernel: ata7.00: exception Emask 0x0 SAct 0x4000000 SErr 0x0 action 0x6 frozen
Jul 2 21:22:46 quad kernel: ata7.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:22:46 quad kernel: ata7.00: cmd 61/18:d0:e8:0f:51/00:00:00:00:00/40 tag 26 ncq dma 12288 out\x0a
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:22:46 quad kernel: ata7.00: status: { DRDY }
Jul 2 21:22:46 quad kernel: ata7: hard resetting link
Jul 2 21:22:47 quad kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 2 21:22:47 quad kernel: ata7.00: configured for UDMA/133
Jul 2 21:22:47 quad kernel: ata7.00: device reported invalid CHS sector 0
Jul 2 21:22:47 quad kernel: ata7: EH complete
Jul 2 21:26:12 quad kernel: ata7.00: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x6 frozen
Jul 2 21:26:12 quad kernel: ata7.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:12 quad kernel: ata7.00: cmd 61/a0:40:60:66:86/01:00:00:00:00/40 tag 8 ncq dma 212992 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:12 quad kernel: ata7.00: status: { DRDY }
Jul 2 21:26:12 quad kernel: ata7: hard resetting link
Jul 2 21:26:17 quad kernel: ata7: link is slow to respond, please be patient (ready=0)
Jul 2 21:26:22 quad kernel: ata7: COMRESET failed (errno=-16)
Jul 2 21:26:22 quad kernel: ata7: hard resetting link
Jul 2 21:26:27 quad kernel: ata7: link is slow to respond, please be patient (ready=0)
Jul 2 21:26:32 quad kernel: ata7: COMRESET failed (errno=-16)
Jul 2 21:26:32 quad kernel: ata7: hard resetting link
Jul 2 21:26:37 quad kernel: ata7: link is slow to respond, please be patient (ready=0)
Jul 2 21:26:43 quad kernel: ata9.00: exception Emask 0x0 SAct 0x1ffe00 SErr 0x0 action 0x6 frozen
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/a8:48:28:9e:ee/01:00:1d:00:00/40 tag 9 ncq dma 217088 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/38:50:d0:9f:ee/00:00:1d:00:00/40 tag 10 ncq dma 28672 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/f8:58:c0:d2:ee/01:00:1d:00:00/40 tag 11 ncq dma 258048 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/00:60:b8:d4:ee/04:00:1d:00:00/40 tag 12 ncq dma 524288 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/40:68:b8:d8:ee/03:00:1d:00:00/40 tag 13 ncq dma 425984 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/08:70:88:e1:ee/00:00:1d:00:00/40 tag 14 ncq dma 4096 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/18:78:60:21:ef/00:00:1d:00:00/40 tag 15 ncq dma 12288 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/10:80:50:57:ef/01:00:1d:00:00/40 tag 16 ncq dma 139264 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/f8:88:58:78:ef/02:00:1d:00:00/40 tag 17 ncq dma 389120 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 61/08:90:50:7b:ef/02:00:1d:00:00/40 tag 18 ncq dma 266240 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: READ FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 60/08:98:b8:e4:a8/00:00:9e:00:00/40 tag 19 ncq dma 4096 in\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9.00: failed command: READ FPDMA QUEUED
Jul 2 21:26:43 quad kernel: ata9.00: cmd 60/40:a0:38:e5:a8/00:00:9e:00:00/40 tag 20 ncq dma 32768 in\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 21:26:43 quad kernel: ata9.00: status: { DRDY }
Jul 2 21:26:43 quad kernel: ata9: hard resetting link
Jul 2 21:26:44 quad kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 2 21:26:49 quad kernel: ata9.00: qc timeout (cmd 0xec)
Jul 2 21:26:50 quad kernel: ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jul 2 21:26:50 quad kernel: ata9.00: revalidation failed (errno=-5)
Jul 2 21:26:50 quad kernel: ata9: hard resetting link
Jul 2 21:26:51 quad kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 2 21:27:01 quad kernel: ata9.00: qc timeout (cmd 0xec)
Jul 2 21:27:02 quad kernel: ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jul 2 21:27:02 quad kernel: ata9.00: revalidation failed (errno=-5)
Jul 2 21:27:02 quad kernel: ata9: limiting SATA link speed to 3.0 Gbps
Jul 2 21:27:02 quad kernel: ata9: hard resetting link
Jul 2 21:27:03 quad kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Jul 2 21:27:07 quad kernel: ata7: COMRESET failed (errno=-16)
Jul 2 21:27:07 quad kernel: ata7: limiting SATA link speed to 3.0 Gbps
Jul 2 21:27:07 quad kernel: ata7: hard resetting link
Jul 2 21:27:12 quad kernel: ata7: COMRESET failed (errno=-16)
Jul 2 21:27:12 quad kernel: ata7: reset failed, giving up
Jul 2 21:27:12 quad kernel: ata7.00: disabled
Jul 2 21:27:12 quad kernel: ata7: EH complete
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#11 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#11 CDB: Write(16) 8a 00 00 00 00 00 00 86 66 60 00 00 01 a0 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 8808032
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 99 (offset 738197504 size 5246976 starting block 1101056)
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100544
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100545
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100546
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100547
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100548
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100549
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100550
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100551
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100552
Jul 2 21:27:12 quad kernel: Buffer I/O error on device sde1, logical block 1100553
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#9 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#9 CDB: Write(16) 8a 00 00 00 00 00 00 7f 5a 90 00 00 00 08 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 8346256
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 90439689 (offset 0 size 0 starting block 1043283)
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#10 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#10 CDB: Write(16) 8a 00 00 00 00 00 00 86 68 00 00 00 07 88 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 8808448
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#11 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#11 CDB: Write(16) 8a 00 00 00 00 00 00 86 6f 88 00 00 08 78 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 8810376
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 99 (offset 738197504 size 7344128 starting block 1101312)
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 99 (offset 738197504 size 7344128 starting block 1101568)
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 CDB: Write(16) 8a 00 00 00 00 00 00 86 78 00 00 00 08 d0 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 8812544
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 99 (offset 738197504 size 7446528 starting block 1101824)
Jul 2 21:27:12 quad kernel: EXT4-fs warning (device sde1): ext4_end_bio:323: I/O error 10 writing to inode 99 (offset 738197504 size 7446528 starting block 1101850)
Jul 2 21:27:12 quad kernel: JBD2: Detected IO errors while flushing file data on sde1-8
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#13 CDB: Read(16) 88 00 00 00 00 00 00 1d 88 38 00 00 00 08 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 1935416
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#14 CDB: Write(16) 8a 00 00 00 00 00 e8 c4 30 d8 00 00 00 40 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 3905171672
Jul 2 21:27:12 quad kernel: Aborting journal on device sde1-8.
Jul 2 21:27:12 quad kernel: EXT4-fs error (device sde1) in ext4_do_update_inode:5171: Journal has aborted
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): Delayed block allocation failed for inode 90439689 at logical offset 19027 with max blocks 2048 with error 30
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): This should not happen!! Data will be lost\x0a
Jul 2 21:27:12 quad kernel: EXT4-fs error (device sde1) in ext4_writepages:2907: Journal has aborted
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 CDB: Write(16) 8a 00 00 00 00 00 e8 c4 08 00 00 00 00 08 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 3905161216
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 488144896, lost sync page write
Jul 2 21:27:12 quad kernel: JBD2: Error -5 detected when updating journal superblock for sde1-8.
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#13 CDB: Write(16) 8a 00 00 00 00 00 00 00 08 00 00 00 00 08 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 2048
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 0, lost sync page write
Jul 2 21:27:12 quad kernel: EXT4-fs error (device sde1): ext4_journal_check_start:61: Detected aborted journal
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): previous I/O error to superblock detected
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): Remounting filesystem read-only
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:12 quad kernel: sd 6:0:0:0: [sde] tag#12 CDB: Write(16) 8a 00 00 00 00 00 00 00 08 00 00 00 00 08 00 00
Jul 2 21:27:12 quad kernel: print_req_error: I/O error, dev sde, sector 2048
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 0, lost sync page write
Jul 2 21:27:12 quad kernel: EXT4-fs error (device sde1) in ext4_dirty_inode:5888: IO failure
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): I/O error while writing superblock
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): previous I/O error to superblock detected
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): ext4_writepages: jbd2_start: 1024 pages, ino 99; err -30
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 0, lost sync page write
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): previous I/O error to superblock detected
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 0, lost sync page write
Jul 2 21:27:12 quad kernel: EXT4-fs error (device sde1) in ext4_da_write_end:3188: IO failure
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 173, lost async page write
Jul 2 21:27:12 quad kernel: EXT4-fs (sde1): previous I/O error to superblock detected
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 524301, lost async page write
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 361758736, lost async page write
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 361766945, lost async page write
Jul 2 21:27:12 quad kernel: Buffer I/O error on dev sde1, logical block 0, lost sync page write
Jul 2 21:27:33 quad kernel: ata9.00: qc timeout (cmd 0xec)
Jul 2 21:27:33 quad kernel: ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jul 2 21:27:33 quad kernel: ata9.00: revalidation failed (errno=-5)
Jul 2 21:27:33 quad kernel: ata9.00: disabled
Jul 2 21:27:34 quad kernel: ata9: hard resetting link
Jul 2 21:27:35 quad kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Jul 2 21:27:35 quad kernel: ata9: EH complete
Jul 2 21:27:35 quad kernel: scsi_io_completion: 7 callbacks suppressed
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#2 CDB: Read(16) 88 00 00 00 00 00 9e a8 e5 38 00 00 00 40 00 00
Jul 2 21:27:35 quad kernel: print_req_error: 7 callbacks suppressed
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 2661868856
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#1 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#1 CDB: Read(16) 88 00 00 00 00 00 9e a8 e4 b8 00 00 00 08 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 2661868728
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#0 CDB: Write(16) 8a 00 00 00 00 00 1d ef 7b 50 00 00 02 08 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502233936
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#29 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#29 CDB: Write(16) 8a 00 00 00 00 00 1d ef 78 58 00 00 02 f8 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502233176
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#28 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#28 CDB: Write(16) 8a 00 00 00 00 00 1d ef 57 50 00 00 01 10 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502224720
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#27 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#27 CDB: Write(16) 8a 00 00 00 00 00 1d ef 21 60 00 00 00 18 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502210912
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#26 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#26 CDB: Write(16) 8a 00 00 00 00 00 1d ee e1 88 00 00 00 08 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502194568
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#25 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#25 CDB: Write(16) 8a 00 00 00 00 00 1d ee d8 b8 00 00 03 40 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502192312
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#24 CDB: Write(16) 8a 00 00 00 00 00 1d ee d4 b8 00 00 04 00 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502191288
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:35 quad kernel: sd 8:0:0:0: [sdf] tag#23 CDB: Write(16) 8a 00 00 00 00 00 1d ee d2 c0 00 00 01 f8 00 00
Jul 2 21:27:35 quad kernel: print_req_error: I/O error, dev sdf, sector 502190784
Jul 2 21:27:41 quad kernel: scsi_io_completion: 87 callbacks suppressed
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#15 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#15 CDB: Write(16) 8a 00 00 00 00 00 1d f3 a1 18 00 00 01 b8 00 00
Jul 2 21:27:41 quad kernel: print_req_error: 87 callbacks suppressed
Jul 2 21:27:41 quad kernel: print_req_error: I/O error, dev sdf, sector 502505752
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#16 CDB: Write(16) 8a 00 00 00 00 00 1d f3 a2 d0 00 00 04 00 00 00
Jul 2 21:27:41 quad kernel: print_req_error: I/O error, dev sdf, sector 502506192
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#17 CDB: Write(16) 8a 00 00 00 00 00 1d f3 a6 d0 00 00 00 08 00 00
Jul 2 21:27:41 quad kernel: print_req_error: I/O error, dev sdf, sector 502507216
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#18 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:41 quad kernel: sd 8:0:0:0: [sdf] tag#18 CDB: Write(16) 8a 00 00 00 00 00 1d f3 a6 d8 00 00 01 60 00 00
Jul 2 21:27:41 quad kernel: print_req_error: I/O error, dev sdf, sector 502507224
Jul 2 21:27:42 quad kernel: sd 8:0:0:0: [sdf] tag#19 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 2 21:27:42 quad kernel: sd 8:0:0:0: [sdf] tag#19 CDB: Write(16) 8a 00 00 00 00 00 1d f3 a8 38 00 00 04 00 00 00
...
|
Seems that "sde" and "sdf" had problems, but no other device.
I then rebooted the PC, set up for both sdf & sde different multiqueue schedulers but 5 minutes after restarting the usual workload some similar weirdness occurred again:
Quote: |
...
Jul 2 22:26:10 quad kernel: ata7.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x6 frozen
Jul 2 22:26:10 quad kernel: ata7.00: failed command: WRITE FPDMA QUEUED
Jul 2 22:26:10 quad kernel: ata7.00: cmd 61/60:d8:40:cd:85/00:00:02:00:00/40 tag 27 ncq dma 49152 out\x0a res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 2 22:26:10 quad kernel: ata7.00: status: { DRDY }
Jul 2 22:26:10 quad kernel: ata7: hard resetting link
Jul 2 22:26:10 quad kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 2 22:26:10 quad kernel: ata7.00: configured for UDMA/133
Jul 2 22:26:10 quad kernel: ata7: EH complete
...
|
2 DB-recoveries per day were enough so I immediately got rid of "scsi_mod.use_blk_mq=1" and since then I am using again the usual noop/deadline/cfq schedulers.
Since going back to the normal I/O schedulers everything works again perfectly, so I can exclude basic hardware problems.
Question
By looking at the output of my "lspci"...
Quote: |
# lspci -k
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 Processor Family DRAM Controller (rev 09)
Subsystem: ASUSTeK Computer Inc. Xeon E3-1200 Processor Family DRAM Controller
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
Kernel driver in use: pcieport
00:06.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
Kernel driver in use: pcieport
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
Subsystem: ASUSTeK Computer Inc. P8 series motherboard
Kernel driver in use: mei_me
Kernel modules: mei_me
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
Subsystem: ASUSTeK Computer Inc. P8 series motherboard
Kernel driver in use: ehci-pci
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
Subsystem: ASUSTeK Computer Inc. 6 Series/C200 Series Chipset Family High Definition Audio Controller
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
Kernel driver in use: pcieport
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
Kernel driver in use: pcieport
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b5)
Kernel driver in use: pcieport
00:1c.6 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 7 (rev b5)
Kernel driver in use: pcieport
00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 8 (rev b5)
Kernel driver in use: pcieport
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
Subsystem: ASUSTeK Computer Inc. P8 series motherboard
Kernel driver in use: ehci-pci
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation C206 Chipset Family LPC Controller (rev 05)
Subsystem: ASUSTeK Computer Inc. P8B WS Motherboard
Kernel driver in use: lpc_ich
Kernel modules: lpc_ich
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)
Subsystem: ASUSTeK Computer Inc. P8 series motherboard
Kernel driver in use: ahci
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
Subsystem: ASUSTeK Computer Inc. P8 series motherboard
Kernel driver in use: i801_smbus
Kernel modules: i2c_i801
01:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 760] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] GK104 [GeForce GTX 760]
Kernel driver in use: nvidia
Kernel modules: nvidia_drm, nvidia
01:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] GK104 HDMI Audio Controller
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
02:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller (rev 10)
Subsystem: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller
Kernel driver in use: ahci
05:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
Subsystem: ASUSTeK Computer Inc. Motherboard
Kernel driver in use: e1000e
Kernel modules: e1000e
06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
Subsystem: ASUSTeK Computer Inc. Motherboard
Kernel driver in use: e1000e
Kernel modules: e1000e
07:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
Subsystem: ASUSTeK Computer Inc. P8B WS Motherboard
Kernel driver in use: xhci_hcd
08:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)
Subsystem: ASUSTeK Computer Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller
|
...and by looking at the output of "ls -al /sys/block/sd*"...
Quote: |
# ls -al /sys/block/sd*
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sda -> ../devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sdb -> ../devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sdc -> ../devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sdc
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sdd -> ../devices/pci0000:00/0000:00:1f.2/ata6/host5/target5:0:0/5:0:0:0/block/sdd
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sde -> ../devices/pci0000:00/0000:00:06.0/0000:02:00.0/ata7/host6/target6:0:0/6:0:0:0/block/sde
lrwxrwxrwx 1 root root 0 Jul 5 20:02 /sys/block/sdf -> ../devices/pci0000:00/0000:00:06.0/0000:02:00.0/ata9/host8/target8:0:0/8:0:0:0/block/sdf
|
...would you agree that mooooooost probably the old Marvell 88SE9230 disk controller has a problem with the kind-of-new multiqueue schedulers?
Or maybe I'm overlooking something?
I did not take into consideration complications caused by "bcache" because:
- on one hand "sde" DID have a problem even if not involved in any way with "bcache"
- on the other hand "sda" DID NOT have a problem even if being involved in the "bcache" setup.
Even by taking into consideration the load (MBs or I/O activity), "sde" (logs, impacted) has a much lighter load than "sda" (read & write cache, not impacted) => if the issue would be caused by the "load" on the device then I would rather expect the opposite.
I do have a new disk controller ready to be installed (a serious one - "LSI SAS 9207-8i") which I will definitely test, but I wanted to at least post this as I could not find anything related to this in google/duckduckgo/bing.
Thx for reading
Cheers |
|