SAS HBA, random drive resets under load, I/O scheduler issue?

sdauth · l33t Joined: 19 Sep 2018 Posts: 650 Location: Ásgarðr

Hello,
I have a server with 8 HDDs attached to a SAS HBA (Dell PERC H310 with IT firmware). No RAID is used: each hard disk has its own filesystem and mountpoint.
When the system is under high load with a lot of reads and writes, one disk (not the same one each time!) resets itself for no apparent reason.
This time it was /dev/sdg. When it happened, I was running a long SMART test on 4 drives and had just started a copy to another disk. /dev/sdg was completely idle.
Other than that, the airflow is good, temperatures are OK, and I have already replaced the mini-SAS and power cables.

sdauth · l33t Joined: 19 Sep 2018 Posts: 650 Location: Ásgarðr

I recompiled my kernel with BFQ built in and added this udev rule:
/etc/udev/rules.d/60-ioschedulers.rules
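A rule of this kind commonly looks like the sketch below. It is an illustration, not necessarily the exact content of that file: the `sd[a-z]*` device match and the check for rotational disks are assumptions.

```
# Sketch only: select bfq for rotational sd* disks when they are added or changed
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
```

With a rule like this in place, the scheduler is set when the device appears, so it should survive reboots and hotplug events.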

sdauth · l33t Joined: 19 Sep 2018 Posts: 650 Location: Ásgarðr

Switching to the bfq I/O scheduler instead of mq-deadline seems to do wonders. Almost 48 hours of uptime with concurrent SMART tests (1 drive still not finished) and various rsync copies (local and remote) on different HDDs, and not a single drive reset yet.
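For anyone who wants to confirm which scheduler is actually in use on each disk, here is a minimal sketch. It assumes the disks show up as sd* under /sys/block; the helper name `active_scheduler` is made up for this example. The kernel marks the active scheduler with square brackets in the `queue/scheduler` file, and the helper just extracts that entry.

```shell
#!/bin/sh
# Extract the active scheduler (the bracketed entry) from a line such as
# "mq-deadline kyber [bfq] none". Reads from stdin.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Print the active scheduler for every sd* block device that is present.
for f in /sys/block/sd*/queue/scheduler; do
    [ -r "$f" ] || continue          # skip if the glob matched nothing
    dev=${f#/sys/block/}             # strip the leading path
    dev=${dev%%/*}                   # keep only the device name, e.g. sdg
    printf '%s: %s\n' "$dev" "$(active_scheduler < "$f")"
done
```

If a disk still reports mq-deadline after the rule is installed, the rule probably did not match that device.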