Gentoo Forums
Massive slowdowns when handling many small files
Jimini
l33t
Joined: 31 Oct 2006
Posts: 610
Location: Germany

Posted: Sat Nov 23, 2024 4:33 am    Post subject: Massive slowdowns when handling many small files

Dear all,

when handling many small files with gentoo-sources-6.1.*, everything works fine.
With gentoo-sources-6.6.*, the transfer gets slower after a while. A reboot helps for a few moments, until the performance problem occurs again. I should add that a normal reboot does not work at that point, since one of the processes cannot be stopped or killed. Thus, I have to hard-reset the system, e.g. with a magic SysRq.

It does not make any difference how the files are transferred - for testing purposes, I transfer the extracted kernel sources (/usr/src/linux/) from A to B:
- local rsync from SSD to HDD RAID6
- rsync from a NFS share to a local folder
- cp within the same RAID6
- rsync within the same RAID6
- extracting a tarball on the RAID6
For now, a file transfer within the same local SSD seems to work fine.

I have tested a few kernel versions, with the following results:
gentoo-sources-6.1.19 -> OK
gentoo-sources-6.1.28 -> OK
gentoo-sources-6.1.67 -> OK
gentoo-sources-6.6.13 -> problems
gentoo-sources-6.6.62 -> problems
(I did not configure and compile every new kernel from scratch. For new kernel versions, I reuse the old config and, after a "make syncconfig", I compile the new kernel.)

The CPU load looks fine, and so does the memory consumption. I assume that something I/O-related may be the root cause here, but I honestly have no idea how to verify this.
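One way to check this could be to watch the Dirty and Writeback counters in /proc/meminfo while a transfer is running: if Dirty stays high and Writeback barely moves, that would suggest data is piling up in the page cache faster than it reaches the disks. The following is only a sketch, assuming a POSIX shell with awk; the function name dirty_writeback is made up for illustration, and its optional file argument exists only so the parsing can be tried on a saved copy.

```shell
#!/bin/sh
# Print the Dirty and Writeback counters (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; the file argument is only for offline testing.
dirty_writeback() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' "${1:-/proc/meminfo}"
}

# Example: sample once per second while the copy runs (stop with Ctrl-C).
# while sleep 1; do dirty_writeback; done
```

Comparing the numbers between a 6.1 and a 6.6 kernel during the same transfer may show whether writeback behaves differently.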

Since I have a workaround by simply using an older kernel, it does not look like a hardware (memory, CPU, RAID6) problem to me.

I would be thankful for any input for further testing.

Kind regards
Jimini
_________________
"The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents." (H.P. Lovecraft: The Call of Cthulhu)

pietinger
Moderator
Joined: 17 Oct 2006
Posts: 5403
Location: Bavaria

Posted: Sat Nov 23, 2024 9:26 am    Post subject: Re: Massive slowdowns when handling many small files

Jimini wrote:
[...](I did not configure and compile every new kernel from scratch. For new kernel versions, I reuse the old config and, after a "make syncconfig", I compile the new kernel.) [...]

Don't use "make syncconfig" ... use "make oldconfig". See also:
https://wiki.gentoo.org/wiki/User:Pietinger/Tutorials/Manual_kernel_configuration#What_is_.22make_oldconfig.22_.3F
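Carrying a config over to a new kernel tree with "make oldconfig" can be sketched as follows. This is only an illustration, assuming a POSIX shell and make; the function name carry_config and the example paths are hypothetical.

```shell
#!/bin/sh
# Copy the .config from the old kernel tree into the new one and let
# "make oldconfig" ask about every option that is new in that version.
carry_config() {
    old=$1
    new=$2
    cp "$old/.config" "$new/.config" || return 1
    make -C "$new" oldconfig
}

# Example (hypothetical paths):
# carry_config /usr/src/linux-6.1.67-gentoo /usr/src/linux-6.6.62-gentoo
```

Unlike "make syncconfig", this prompts for each new symbol instead of handling it silently.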
_________________
https://wiki.gentoo.org/wiki/User:Pietinger

pingtoo
Veteran
Joined: 10 Sep 2021
Posts: 1502
Location: Richmond Hill, Canada

Posted: Sat Nov 23, 2024 2:55 pm

So does the symptom show only when the RAID6 is the destination? Does it also show when the direction is reversed and the SSD is the destination?

It would be worth knowing whether this is write-bound or read-bound.

wjb
l33t
Joined: 10 Jul 2005
Posts: 645
Location: Fife, Scotland

Posted: Sat Nov 23, 2024 5:15 pm

Code:
$ grep -a3 syncconfig /usr/src/linux/scripts/kconfig/Makefile
...
#
# Note:
#  syncconfig has become an internal implementation detail and is now
#  deprecated for external use

Jimini
l33t
Joined: 31 Oct 2006
Posts: 610
Location: Germany

Posted: Sun Nov 24, 2024 2:58 am

pietinger, pingtoo & wjb, thank you for your replies.

Regarding syncconfig, it seems like I did not get the memo ;)
I have adapted my scripts and rebuilt the kernel on this particular machine, thank you.

I ran a few tests:
- unpacking a tarball on the SSD took 1m43s
- unpacking the same tarball on the RAID6: I canceled it after 75 min without any progress
- unpacking the same tarball from the RAID6 to the SSD took 1m10s
- unpacking the same tarball from the SSD to the RAID6: I canceled it after 38 min

Writing to the RAID6 seems to lead to problems.
This is not the case in general - I can copy a 25G file to the RAID6, which takes about 7min (about 60MB/s).

In addition, I had a look at the processes on the system, while I was trying to reboot. "kworker/u8:0+flush-253:0" was at about 100% CPU load.
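The suffix of that worker name normally identifies the device whose dirty pages are being written back, as MAJOR:MINOR, so "flush-253:0" should point at block device 253:0. A small sketch for mapping it back to a device name, assuming sysfs is mounted at /sys; the helper name flush_dev is hypothetical.

```shell
#!/bin/sh
# Extract the MAJOR:MINOR pair from a writeback worker name such as
# "kworker/u8:0+flush-253:0".
flush_dev() {
    printf '%s\n' "$1" | sed -n 's/.*flush-\([0-9][0-9]*:[0-9][0-9]*\).*/\1/p'
}

dev=$(flush_dev 'kworker/u8:0+flush-253:0')
echo "$dev"
# On the affected machine, resolve the pair to a kernel device name:
# basename "$(readlink -f "/sys/dev/block/$dev")"
```

Knowing which layer the busy worker belongs to (RAID, dm-crypt or filesystem device) may help narrow down where the time is spent.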

Maybe I should share some information on my setup:
- Intel Core i3-9100 CPU @ 3.60GHz, 4 GB memory
- software RAID6, consisting of 7 HDDs (Seagate Exos X16, Toshiba MG08ACA16TE)
- on it is a LUKS2 container with aes-xts-plain64 as cipher
- ext4 is the filesystem:
dumpe2fs 1.47.1 (20-May-2024)
Filesystem volume name: <none>
Last mounted on: /home/share
Filesystem UUID: 94c5bf33-25cb-4eb0-9081-6e638dfabe67
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
[...]

Please bear in mind that with gentoo-sources-6.1.*, it works without problems :)
Any idea what else I could test?

Kind regards
Jimini

pietinger
Moderator
Joined: 17 Oct 2006
Posts: 5403
Location: Bavaria

Posted: Sun Nov 24, 2024 11:21 am

I could take a look at your kernel configuration ... preferably together with all three pieces of information mentioned here:
https://wiki.gentoo.org/wiki/User:Pietinger/Overview_of_System_Information

Jimini
l33t
Joined: 31 Oct 2006
Posts: 610
Location: Germany

Posted: Sun Nov 24, 2024 11:52 am

pietinger, thanks a lot for your offer.

Here you are:
emerge --info: https://paste.gentoo.zip/BhPUxQ13
dmesg: https://paste.gentoo.zip/0e1IKBml
lspci -nnk: https://paste.gentoo.zip/fZ9hw8oR (the command throws "lspci: Unable to load libkmod resources: error -2", but I guess this should not be the root cause for my problem)
kernel config: https://paste.gentoo.zip/q2mJlIVk

Again, I just tried to unpack the kernel files right on the RAID6, just for having a look at the dmesg output. But the only new line is the following:
[ 525.705770] EXT4-fs (dm-0): mounted filesystem 94c5bf33-25cb-4eb0-9081-6e638dfabe67 r/w with ordered data mode. Quota mode: none.


From today on, I will be on a trip until next Sunday, so I will not be able to respond until then. Please take your time looking into the shared information.

Best regards and thanks in advance
Jimini

pietinger
Moderator
Joined: 17 Oct 2006
Posts: 5403
Location: Bavaria

Posted: Sun Nov 24, 2024 5:24 pm

Jimini,

you have selected the profile default/linux/amd64/23.0/split-usr/no-multilib/hardened
... but you don't have a hardened kernel, you know that?

Your dmesg is not complete: I am missing the beginning; it starts at timestamp [ 1.477459]

I see you do not have much RAM: 3,685,320 kB (about 3.5 GiB)

It is an (old) Intel machine: Intel(R) Core(TM) i3-9100, and CONFIG_MCORE2=y

You have disabled module support (like me) and get the harmless error from lspci (which I also get) because the package sys-apps/pciutils (which contains lspci) was emerged with the USE flag "kmod". If you want to get rid of the error message, you have to switch off this USE flag for this package (but as I said, this is absolutely harmless).

Some important options are missing from your kernel, e.g. CONFIG_X86_X2APIC=y, because CONFIG_IRQ_REMAP is missing ... because you switched the Intel IOMMU support off completely:
Code:
# CONFIG_INTEL_IOMMU is not set

(see more here: https://wiki.gentoo.org/wiki/User:Pietinger/Experimental/Manual_Configuring_Current_Kernel#IOMMU )

You may also need:
Code:
# CONFIG_X86_INTEL_LPSS is not set
... AND one of them:
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_INTEL_LPSS_ACPI is not set
# CONFIG_MFD_INTEL_LPSS_PCI is not set

(see more here: https://wiki.gentoo.org/wiki/User:Pietinger/Experimental/Manual_Configuring_Current_Kernel#Platform_support )

I always recommend booting our Gentoo LiveCD, running "lsmod" and noting all loaded modules. Maybe you will then see that:
Code:
# CONFIG_INTEL_IDMA64 is not set

is loaded instead of
Code:
CONFIG_INTEL_IOATDMA=y


Some additional notes:
Code:
1.
CONFIG_PREEMPT_NONE=y
2.
# CONFIG_SCHED_CORE is not set
3.
CONFIG_TRACEPOINTS=y
4.
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
3.
CONFIG_PM_DEBUG=y
CONFIG_PM_ADVANCED_DEBUG=y
5.
# CONFIG_ACPI_PROCESSOR_AGGREGATOR is not set
6.
# CONFIG_INTEL_IDLE is not set
7.
CONFIG_IA32_EMULATION=y
8.
CONFIG_BLK_DEV_THROTTLING=y
3.
CONFIG_BLK_DEBUG_FS=y
9.
CONFIG_EXTRA_FIRMWARE=""
10.
CONFIG_I2C_ALI1535=y
CONFIG_I2C_ALI1563=y
CONFIG_I2C_ALI15X3=y
CONFIG_I2C_AMD756=y
CONFIG_I2C_AMD756_S4882=y
CONFIG_I2C_AMD8111=y
CONFIG_I2C_I801=y
CONFIG_I2C_PIIX4=y
CONFIG_I2C_NFORCE2=y
CONFIG_I2C_SIS5595=y
CONFIG_I2C_SIS630=y
CONFIG_I2C_SIS96X=y
CONFIG_I2C_VIA=y
CONFIG_I2C_VIAPRO=y

CONFIG_I2C_OCORES=y
CONFIG_I2C_SIMTEC=y
CONFIG_I2C_TAOS_EVM=y
CONFIG_I2C_TINY_USB=y

# CONFIG_PINCTRL is not set
11.
CONFIG_SECURITY_SELINUX=y
CONFIG_INTEGRITY=y
3.
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
CONFIG_BLK_DEV_IO_TRACE=y

1. Tells me it was configured as a server machine - okay.
2. I recommend enabling it.
3. All debugging and tracing costs performance. I always recommend switching it off (and using it only when investigating a specific problem).
4. Maybe you need it (the default value is: enabled).
5. I recommend enabling it.
6. I recommend enabling it.
7. You have a no-multilib system (like me) and can/should disable it.
8. Really? I would disable it.
9. I see you are using an initramfs (probably even built by yourself) - are you sure that the microcode and all firmware files are included? If not, include them in your kernel directly (https://wiki.gentoo.org/wiki/User:Pietinger/Experimental/Manual_Configuring_Current_Kernel#CPU_Microcode ).
10. I think you will need only CONFIG_I2C_I801=y ... if at all ... because i2c requires pinctrl, which you have not activated. Do you have i2c hardware in this machine?
11. If you don't use it ... disable it.

Jimini
l33t
Joined: 31 Oct 2006
Posts: 610
Location: Germany

Posted: Mon Feb 10, 2025 6:15 am

Good morning all,

first of all, thanks a lot @pietinger for your helpful input. Unfortunately, the system had some hardware issues and I had to replace the core components. Since then, I have not been able to reproduce the problem - but I will test it from time to time.

I will definitely have a look at your hints regarding the kernel config! Some of my configs are more than 15 years old, it is definitely possible that there is some old "misconfig heritage" ;)

Best regards
Jimini

pietinger
Moderator
Joined: 17 Oct 2006
Posts: 5403
Location: Bavaria

Posted: Mon Feb 10, 2025 1:49 pm

Jimini wrote:
first of all, thanks a lot @pietinger for your helpful input. [...]

You are very welcome! :D

Ralphred
l33t
Joined: 31 Dec 2013
Posts: 721

Posted: Mon Feb 10, 2025 5:44 pm

Jimini wrote:
Some of my configs are more than 15 years old, it is definitely possible that there is some old "misconfig heritage" ;)
I know that feeling, if pietinger.net offered a "drop your lspci output and kernel .config here for expert feedback, only €5.99 a year" I'd subscribe :P

pietinger
Moderator
Joined: 17 Oct 2006
Posts: 5403
Location: Bavaria

Posted: Mon Feb 10, 2025 6:49 pm

Ralphred wrote:
I know that feeling, if pietinger.net offered a "drop your lspci output and kernel .config here for expert feedback, only €5.99 a year" I'd subscribe :P

:lol:

Thanks a lot, Ralphred ... but I do it free of charge ... especially for people who translate a German text into English (I have not forgotten that) ;-)

Jimini
l33t
Joined: 31 Oct 2006
Posts: 610
Location: Germany

Posted: Sun Feb 16, 2025 10:19 am

Good morning,

as written last week, I have moved the system to newer hardware (AMD Ryzen 3 PRO 4350G, 16G RAM).

pietinger, I have now reconfigured my kernel (6.6.74) according to your hints - except for the Intel CPU related options, of course.
EXTRA_FIRMWARE is set to "amd-ucode/microcode_amd_fam17h.bin" (according to https://wiki.gentoo.org/wiki/Ryzen#Firmware).

I have now run a few tests again. For these, I unpacked a tar.gz archive of /usr/src/linux/ - once on the SSD and once on the RAID6.
SSD: 0m13.652s
RAID6: 0m15.606s

Looks good so far, so I repeated the test:
SSD: 0m11.276s
RAID6: the process slows down drastically after a few seconds. From then on, unpacking on the RAID6 takes about one second per file, while the result on the SSD is practically unchanged at 11.282s.

Looking into "top", the output is interesting:

Code:
top - 11:17:38 up 6 min,  2 users,  load average: 2.16, 1.52, 0.69
Tasks: 180 total,   2 running, 178 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 11.9 sy,  0.0 ni, 75.0 id, 13.1 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15373.5 total,   8416.2 free,   1284.8 used,   5847.0 buff/cache
MiB Swap:   6144.0 total,   6144.0 free,      0.0 used.  14088.7 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  124 root      20   0       0      0      0 R 100.0   0.0   4:23.74 kworker/u+
    1 root      20   0    2472   1536   1536 S   0.0   0.0   0:01.00 init
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
[...]
 2277 root      20   0    5876   2560   2432 D   0.0   0.0   0:00.00 tar


Conclusions:
- We obviously have no CPU bottleneck here.
- According to "iotop", the RAID6 is accessed only by the tar process, at a few kilobytes per second.
- We have a "kworker" process, which fully loads one CPU core.
- On the old hardware, I could reproduce the problem only on kernels newer than 6.1. With a 6.1 kernel, the problem did not occur. On this hardware, I have no bootable 6.1 kernel at hand; maybe I can test this later on.
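To see which tasks are stuck in uninterruptible sleep (state D, like tar in the top output above), the process list can be filtered by state. A minimal sketch, assuming procps "ps" and awk; the function name d_state is hypothetical.

```shell
#!/bin/sh
# Keep only rows whose STAT column (2nd field) starts with "D"
# (uninterruptible sleep); the header row is passed through.
d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on the affected machine; wchan shows the kernel function a
# sleeping task is waiting in:
# ps -eo pid,stat,wchan:32,comm | d_state
```

Running this a few times while the extraction is stalled may show whether tar always waits in the same place.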

dmesg: https://paste.gentoo.zip/MINvlkpL
emerge --info: https://paste.gentoo.zip/uVonOhJD
lspci -nnk: https://paste.gentoo.zip/Ke01mXdR
kernel config: https://paste.gentoo.zip/OdLLzFfD

I just stumbled upon https://forums.gentoo.org/viewtopic-p-8812952.html?sid=36f52eac3e35466d194085fe02a0b89c - but the mentioned VM settings seem not to have any effect on my system, the problem persists.

...any ideas, what else to try?

Best regards
Jimini

pietinger
Moderator
Joined: 17 Oct 2006
Posts: 5403
Location: Bavaria

Posted: Sun Feb 16, 2025 5:20 pm

Jimini wrote:
[...] ...any ideas, what else to try?

Hmm ... frankly, I don't know what the problem is ... if all the following doesn't help, then my last idea would be to use a different kernel version (it doesn't have to be the old 6.1. but take 6.12 - because that will soon be stable for Gentoo anyway) ... I would like to mention the following:

1) I'm sure you're aware of this yourself:
Code:
[   10.742056] Warning: unable to open an initial console.
<=>
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1636] (rev da)
   Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1636]
<=>
# CONFIG_DRM_AMDGPU is not set

=> https://wiki.gentoo.org/wiki/User:Pietinger/Tutorials/Manual_kernel_configuration#Driver_needs_Firmware (because you have a monolithic kernel) => https://wiki.gentoo.org/wiki/AMDGPU

2) I guess this is/was the result of a hard reset:
Code:
[   12.327137] EXT4-fs (sdh2): INFO: recovery required on readonly filesystem
[   13.732730] md/raid:md0: not clean -- starting background reconstruction


3) Please try enabling both of these:
Code:
# CONFIG_CHR_DEV_SG is not set
# CONFIG_SCSI_CONSTANTS is not set

(This is not relevant for your problem but should be enabled also: # CONFIG_I2C_PIIX4 is not set)

4) Do you run this kernel on bare metal or in a VM? If it is a VM then enable:
Code:
# CONFIG_UDMABUF is not set


5) This is a shot in the dark ... change it from 0 to 1:
Code:
CONFIG_SATA_MOBILE_LPM_POLICY=0



Internal note:

Code:
[    0.000000] Linux version 6.6.74-gentoo (root@share2) (gcc (Gentoo Hardened 14.2.1_p20241221 p7) 14.2.1 20241221, GNU ld (Gentoo 2.43 p3) 2.43.1) #3 SMP PREEMPT_DYNAMIC Sun Feb 16 10:04:59 CET 2025
[    1.901510] Memory: 15729892K/16121688K available (16384K kernel code, 847K rwdata, 3828K rodata, 11804K init, 2124K bss, 391536K reserved, 0K cma-reserved)
[    2.100062] smpboot: CPU0: AMD Ryzen 3 PRO 4350G with Radeon Graphics (family: 0x17, model: 0x60, stepping: 0x1)
[    2.107025] smp: Brought up 1 node, 8 CPUs

[    2.885892] smapi::smapi_init, ERROR invalid usSmapiID
[    2.885959] mwave: tp3780i::tp3780I_InitializeBoardData: Error: SMAPI is not available on this machine
[    2.886051] mwave: mwavedd::mwave_init: Error: Failed to initialize board data
[    2.886127] mwave: mwavedd::mwave_init: Error: Failed to initialize

[    2.886557] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.

[    2.893957] megasas: 07.725.01.00-rc1
[    2.894039] mpt3sas version 43.100.00.00 loaded
[    2.894293] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (15730200 kB)

[    3.019930] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[    3.020039] mpt2sas_cm0: MSI-X vectors supported: 1
[    3.020116]     no of cores: 8, max_msix_vectors: -1
[    3.020193] mpt2sas_cm0:  0 1 1
[    3.020353] mpt2sas_cm0: High IOPs queues : disabled
[    3.020431] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 32
[    3.020510] mpt2sas_cm0: iomem(0x00000000fcf40000), mapped(0x000000006fcf1c8f), size(65536)
[    3.020611] mpt2sas_cm0: ioport(0x000000000000f000), size(256)
[    3.075080] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[    3.102826] mpt2sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(9), sge_per_io(128), chains_per_io(15)
[    3.103033] mpt2sas_cm0: request pool(0x00000000e7d4f7f1) - dma(0x103300000): depth(3492), frame_size(128), pool_size(436 kB)
[    3.108544] mpt2sas_cm0: sense pool(0x00000000f8b86087) - dma(0x103a80000): depth(3367), element_size(96), pool_size (315 kB)
[    3.108721] mpt2sas_cm0: reply pool(0x0000000058ad46ef) - dma(0x103b00000): depth(3556), frame_size(128), pool_size(444 kB)
[    3.108827] mpt2sas_cm0: config page(0x000000004fb93642) - dma(0x103a40000): size(512)
[    3.108908] mpt2sas_cm0: Allocated physical memory: size(7579 kB)
[    3.108979] mpt2sas_cm0: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
[    3.109068] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[    3.153821] mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03)
[    3.153919] mpt2sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[    3.154131] scsi host0: Fusion MPT SAS Host
[    3.154765] mpt2sas_cm0: sending port enable !!

[   10.730207] pktgen: Packet Generator for packet performance testing. Version: 2.75

[   10.730525] GACT probability NOT on
[   10.730598] Mirror/redirect action on

01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
   Subsystem: Dell 6Gbps SAS HBA Adapter [1028:1f1c]
   Kernel driver in use: mpt3sas

CPU microcode update is correct/working:
Code:
[   10.733824] microcode: microcode updated early to new patch_level=0x0860010d
[   10.733930] microcode: CPU2: patch_level=0x0860010d
[   10.733931] microcode: CPU0: patch_level=0x0860010d
...

Jimini
l33t
Joined: 31 Oct 2006
Posts: 610
Location: Germany

Posted: Mon Feb 17, 2025 12:41 pm

pietinger wrote:
1) I'm sure you're aware of this yourself:
Code:
[   10.742056] Warning: unable to open an initial console.
<=>
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1636] (rev da)
   Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1636]
<=>
# CONFIG_DRM_AMDGPU is not set

=> https://wiki.gentoo.org/wiki/User:Pietinger/Tutorials/Manual_kernel_configuration#Driver_needs_Firmware (because you have a monolithic kernel) => https://wiki.gentoo.org/wiki/AMDGPU


Yeah, but this should not be a problem here, since the system is headless :)

Quote:
2) I guess this is/was the result of a hard reset:
Code:
[   12.327137] EXT4-fs (sdh2): INFO: recovery required on readonly filesystem
[   13.732730] md/raid:md0: not clean -- starting background reconstruction

Exactly - when I reproduce the problem, I have a kworker process which prevents the system from shutting down. I then need to force a reboot, e.g. with a sysrq (sh -c "echo b > /proc/sysrq-trigger").

Quote:
3) Please try enabling both of these:
Code:
# CONFIG_CHR_DEV_SG is not set
# CONFIG_SCSI_CONSTANTS is not set

(This is not relevant for your problem but should be enabled also: # CONFIG_I2C_PIIX4 is not set)

Done - unfortunately without effect.

Quote:
4) Do you run this kernel on bare metal or in a VM? If it is a VM then enable:
Code:
# CONFIG_UDMABUF is not set

The system is running on bare metal.

Quote:
5) This is a shot in the dark ... change it from 0 to 1:
Code:
CONFIG_SATA_MOBILE_LPM_POLICY=0

Done - unfortunately without effect.




I ran my tests again; for comparison, I also used the 6.1.127 kernel. I took my current config from 6.6.74 and adapted it with "make oldconfig". Then I changed my little script a bit - now it deletes the files after they have been decompressed.
Here are the results:

6.1.127:
Code:
test 1 - decompressing tar.gz on SSD:

real    0m11.678s
user    0m8.851s
sys     0m8.750s

test 2 - decompressing tar.gz on RAID6

real    2m54.919s
user    0m11.804s
sys     0m12.612s

test 1 - decompressing tar.gz on SSD:

real    0m9.814s
user    0m8.406s
sys     0m6.630s

test 2 - decompressing tar.gz on RAID6

real    0m9.547s
user    0m8.637s
sys     0m6.358s

test 1 - decompressing tar.gz on SSD:

real    0m9.641s
user    0m8.519s
sys     0m6.517s

test 2 - decompressing tar.gz on RAID6

real    0m9.412s
user    0m8.563s
sys     0m6.311s


6.6.74:
Code:
test 1 - decompressing tar.gz on SSD:

real    0m12.832s
user    0m8.466s
sys     0m5.534s

test 2 - decompressing tar.gz on RAID6

real    0m10.045s
user    0m8.492s
sys     0m5.479s

test 1 - decompressing tar.gz on SSD:

real    0m9.347s
user    0m8.394s
sys     0m5.407s

test 2 - decompressing tar.gz on RAID6

real    0m13.264s
user    0m8.365s
sys     0m5.512s

test 1 - decompressing tar.gz on SSD:

real    0m9.314s
user    0m8.519s
sys     0m5.315s

test 2 - decompressing tar.gz on RAID6

real    0m9.812s
user    0m8.445s
sys     0m5.316s


As you can see, I could not reproduce the problem anymore. On both kernels, the decompression of the archive worked without problems. In total, I ran 10 tests on the 6.6.74 kernel and the problem did not occur anymore.
The only difference was that in the earlier tests the target directory was overwritten. This time the extracted directory was deleted after each run, so the process had to create all files from scratch every time.

So I let my script overwrite the target directory again:
- It does not matter which kernel I boot - just decompressing and writing the files into an empty directory works fine.
- When the target directory with all of its contents already exists, the results look like this:

6.1.127 & SSD: fast as expected
6.1.127 & RAID6: takes a lot longer (5mins and more)
6.6.74 & SSD: fast as expected
6.6.74 & RAID6: takes forever (I interrupted the task after ~40mins)

I cannot tell, where the difference between "write data to an empty dir" and "overwrite existing data" exactly is, but perhaps I am now one little step closer to the root cause...
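The two cases can be separated with a small script that extracts the same archive first into an empty directory and then over the existing files. This is only a sketch, assuming a POSIX shell, a tar that supports -z and a date that supports +%s; the function name extract_twice and the example paths are hypothetical.

```shell
#!/bin/sh
# Extract ARCHIVE into DIR twice: pass 1 writes into an empty directory,
# pass 2 overwrites the files left behind by pass 1.
extract_twice() {
    archive=$1
    dir=$2
    [ -n "$dir" ] || return 1          # never run rm -rf on an empty path
    rm -rf "$dir" && mkdir -p "$dir" || return 1
    for pass in 1 2; do
        start=$(date +%s)
        tar -xzf "$archive" -C "$dir" || return 1
        sync   # include the time needed to flush dirty pages to disk
        end=$(date +%s)
        echo "pass $pass: $((end - start))s"
    done
}

# Example (hypothetical paths):
# extract_twice /root/linux.tar.gz /home/share/extract-test
```

The sync call makes each timing include writeback; dropping it would time only the tar call itself.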

Kind regards
Jimini

pietinger
Moderator
Joined: 17 Oct 2006
Posts: 5403
Location: Bavaria

Posted: Mon Feb 17, 2025 3:02 pm

Jimini wrote:
[...] I ran my tests again, just for comparison [...]
I cannot tell, where the difference between "write data to an empty dir" and "overwrite existing data" exactly is, but perhaps I am now one little step closer to the root cause...

I would be very interested to see what happens with 6.12.14 (*). Only one test would be necessary - this one:
Jimini wrote:
6.6.74 & RAID6: takes forever (I interrupted the task after ~40mins)

(Of course with the same config as before; updated with “make oldconfig”; answers to the questions are here: https://wiki.gentoo.org/wiki/User:Pietinger/Experimental )

*) If ... IF ... there is no problem with 6.12 ... then I would think there is a kernel regression in 6.6 8O


P.S.: If you test 6.12, please don't forget to update the package "linux-headers" as well ;-)

All times are GMT