Gentoo Forums
IO Performance loss after kernel upgrades from 5.15 to 6.x
devsk
Advocate
Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Tue Jan 02, 2024 3:13 am    Post subject: IO Performance loss after kernel upgrades from 5.15 to 6.x

Has anybody tracked the IO performance across kernel upgrades?

I seem to have run into a situation where the performance is halved in the worst case. I am comparing three LTS kernels:

5.15.145
6.1.68
6.6.8 (purported next LTS)

I have a RAIDZ1 of 4 NVMe SSDs running ZFS 2.2.2. The only thing I change is the kernel version; everything else stays the same (same machine, same ZFS version, same pool, same userspace). The kernel config has been carried forward with 'make oldconfig', so the config options common to 5.15, 6.1 and 6.6 should be identical.
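For what it's worth, the carried-forward configs can be compared directly with the diffconfig helper that ships in the kernel source tree; the paths below are only examples of where the two .config files might live:

Code:
# print only the options that differ between the two configs
/usr/src/linux-6.6.8/scripts/diffconfig \
    /usr/src/linux-5.15.145/.config /usr/src/linux-6.6.8/.config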

Zpool scrub times:

5.15.145 - 10m
6.1.68 - 12m
6.6.8 - 20m
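(If anyone wants to reproduce the scrub comparison: the wall time can be captured with the -w wait flag of zpool scrub, which blocks until the scrub finishes; the pool name below is a placeholder.)

Code:
# start a scrub and block until it completes, printing the elapsed time
time zpool scrub -w mypool
zpool status mypool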

Attached are kdiskmark screenshots from 5.15.145 vs 6.6.8 (I did not run it for 6.1.68). I ran it on top of the same ZFS dataset, with compression disabled. All numbers are lower for 6.6.8.

5.15.145 https://www.phoronix.com/forums/filedata/fetch?id=1431236
6.6.8 https://www.phoronix.com/forums/filedata/fetch?id=1431237

I was expecting 6.6 to show higher numbers because of all the IO enhancement work that has gone in since 5.15.

Is there something obvious that I missed, or something I need to tune specifically, to get the sequential IO speed in 6.6.8 back to where it was?

devsk
Advocate
Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Tue Jan 02, 2024 3:14 am

BTW, it's not a ZFS regression, because the raw IO speed with 'dd' in non-cached mode has fallen as well.
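For reference, the kind of non-cached test I mean looks roughly like this; the device name is an example, iflag=direct bypasses the page cache, and reading into /dev/null is non-destructive:

Code:
# raw sequential read from one of the NVMe devices, bypassing the page cache
dd if=/dev/nvme0n1 of=/dev/null bs=1M count=8192 iflag=direct status=progress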

devsk
Advocate
Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Wed Jan 03, 2024 11:39 pm

Is it possibly because of this: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ca9a836ff53

Looks like we changed the CPU scheduler implementation completely in 6.6. I wonder how this affects the disk IO. Is there some sort of throttling going on because of this?

pietinger
Moderator
Joined: 17 Oct 2006
Posts: 5353
Location: Bavaria

PostPosted: Wed Jan 10, 2024 2:33 pm

Moderator note: I have split some posts and moved them to Discussion about CUDA -- pietinger
_________________
https://wiki.gentoo.org/wiki/User:Pietinger

pietinger
Moderator
Joined: 17 Oct 2006
Posts: 5353
Location: Bavaria

PostPosted: Wed Jan 10, 2024 2:45 pm

With kernel version 6.6 we have a new task scheduler: EEVDF
( https://kernelnewbies.org/LinuxChanges#Linux_6.6.New_task_scheduler:_EEVDF )
... but it's hard to believe this could be the cause of your IO problems ... ? ... maybe ZFS needs some adjustments for it?
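If CONFIG_SCHED_DEBUG is enabled, the scheduler feature flags can at least be inspected and toggled at runtime through debugfs, which allows a little experimenting without rebuilding the kernel. The flag name below is from 6.6 and may differ on other versions:

Code:
# list the scheduler feature flags currently in effect
cat /sys/kernel/debug/sched/features
# disable a feature by writing NO_<NAME>, re-enable it by writing the plain name
echo NO_PLACE_LAG > /sys/kernel/debug/sched/features
echo PLACE_LAG > /sys/kernel/debug/sched/features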
_________________
https://wiki.gentoo.org/wiki/User:Pietinger

gentoo_ram
Guru
Joined: 25 Oct 2007
Posts: 513
Location: San Diego, California USA

PostPosted: Thu Jan 11, 2024 1:22 am

I believe the default I/O schedulers have changed in the various kernel versions. That might have something to do with it. If you can boot your different kernels then maybe compare the I/O scheduler settings. Try:

Code:
lsblk -t

devsk
Advocate
Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Tue Jan 16, 2024 6:21 am

gentoo_ram wrote:
I believe the default I/O schedulers have changed in the various kernel versions. That might have something to do with it. If you can boot your different kernels then maybe compare the I/O scheduler settings. Try:

Code:
lsblk -t
Do you have the output from 6.6.x?

The NVMe drives use the "none" scheduler and the HDD drives use mq-deadline in 5.15. I don't think that has changed.
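For completeness, the active scheduler per device can also be read (and changed) straight from sysfs; the device names below are examples, and the entry in [brackets] is the active one:

Code:
# show the available schedulers per block device; the active one is in [brackets]
grep . /sys/block/*/queue/scheduler
# switch a device's scheduler at runtime if needed
echo mq-deadline > /sys/block/sda/queue/scheduler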

devsk
Advocate
Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Mon Feb 05, 2024 4:51 pm

This regression is present in 6.7.3 as well. The system boots in 45 seconds, while 5.15 boots in about 25 seconds. A zpool scrub over a pool of NVMe drives runs at half the speed compared to 5.15. So there are definitely some major issues with kernels beyond 5.15.

I am surprised that no one else is noticing any slowdowns across two LTS kernels (6.1 and 6.6). Maybe folks don't test performance that often.

logrusx
Advocate
Joined: 22 Feb 2018
Posts: 2660

PostPosted: Mon Feb 05, 2024 5:41 pm

I recently switched to 6.6 and it seems like there's some degradation in desktop responsiveness, which could be attributed to the CPU scheduler. I remember that at one time I used Con Kolivas' patches to overcome that kind of thing; later it was no longer necessary.

Best Regards,
Georgi

devsk
Advocate
Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Mon Feb 05, 2024 6:17 pm

The worst part here is that once IO starts to back up because of the slowness, the desktop starts to suffer: the mouse stutters, menus don't open, applications don't launch when you click on them. It's a complete shitshow.

I cannot reproduce any of that with 5.15.

I think they made that CPU scheduler change in a hurry and did not add a config option to stay with the older code. If they had, we could at least confirm whether that is indeed the issue.
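If someone wants to quantify the stalls while this happens, the kernel's pressure stall information should show them directly (this needs CONFIG_PSI=y, available since 4.20):

Code:
# share of wall time that tasks were stalled on IO (10s/60s/300s averages)
cat /proc/pressure/io
# refresh every second while reproducing the stutter
watch -n 1 cat /proc/pressure/io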

Saundersx
Apprentice
Joined: 11 Apr 2005
Posts: 294

PostPosted: Mon Feb 05, 2024 8:28 pm

Just for giggles, try this. I had a regression back in 5.19.6 and this made it tolerable.

Code:
echo 500 > /proc/sys/vm/dirty_expire_centisecs
echo $((1024*1024*256)) > /proc/sys/vm/dirty_background_bytes
echo $((1024*1024*512)) > /proc/sys/vm/dirty_bytes
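If that helps, the same values can be made persistent through sysctl; note that setting the *_bytes knobs overrides the corresponding *_ratio ones. The file path below is only a suggestion:

Code:
# persist the writeback settings across reboots (same values as the echoes above)
cat > /etc/sysctl.d/90-writeback.conf <<'EOF'
vm.dirty_expire_centisecs = 500
vm.dirty_background_bytes = 268435456
vm.dirty_bytes = 536870912
EOF
sysctl --system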

devsk
Advocate
Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Mon Feb 05, 2024 10:43 pm

Saundersx wrote:
Just for giggles, try this. I had a regression back in 5.19.6 and this made it tolerable.

Code:
echo 500 > /proc/sys/vm/dirty_expire_centisecs
echo $((1024*1024*256)) > /proc/sys/vm/dirty_background_bytes
echo $((1024*1024*512)) > /proc/sys/vm/dirty_bytes
Thanks, Saundersx! I know these settings very well, and they are already tuned for my system. I never use a Linux machine without the VM writeback settings tuned properly; otherwise the system becomes intolerable once heavy IO starts.

The difference in behavior in my case comes down to a single change: the kernel version.