View previous topic :: View next topic |
Author |
Message |
ipic Guru
Joined: 29 Dec 2003 Posts: 430 Location: UK
|
Posted: Sat Jan 18, 2025 8:11 am Post subject: DMESG: silent gap in boot process |
|
|
I have upgraded my cpu from Ryzen 7 200 (8 cores) to Ryzen 9 5090X (16 cores).
On the old cpu, there was a 20 second 'silent' gap in the boot process. With the new cpu the gap has grown to around 50 seconds.
The relevant part of dmesg output is below. I can see that the initramfs has not started in this gap - since it appears in dmesg output further on.
It's not a problem, since the machine is not booted often. This post is really just to see if someone can help me understand what is happening in the silent gap.
Code: |
[ 2.972511] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2.973476] ata6.00: supports DRM functions and may not be fully accessible
[ 2.973546] ata6.00: ATA-10: CT4000MX500SSD1, M3CR046, max UDMA/133
[ 2.973716] ata6.00: 7814037168 sectors, multi 1: LBA48 NCQ (depth 32), AA
[ 2.978818] ata6.00: Features: Trust Dev-Sleep
[ 2.982669] ata6.00: supports DRM functions and may not be fully accessible
[ 2.987316] ata6.00: configured for UDMA/133
[ 2.991695] scsi 5:0:0:0: Direct-Access ATA CT4000MX500SSD1 046 PQ: 0 ANSI: 5
[ 2.996807] sd 5:0:0:0: Attached scsi generic sg3 type 0
[ 2.996924] sd 5:0:0:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[ 3.002163] scsi 12:0:0:0: CD-ROM ASUS DRW-24D5MT 1.00 PQ: 0 ANSI: 5
[ 3.003399] sd 5:0:0:0: [sdd] 4096-byte physical blocks
[ 3.010024] sd 5:0:0:0: [sdd] Write Protect is off
[ 3.013191] sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 3.013277] sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 3.016601] sd 5:0:0:0: [sdd] Preferred minimum I/O size 4096 bytes
[ 3.035962] sdd: sdd1 sdd2 sdd3 sdd4 sdd5 sdd6 sdd7
[ 3.041056] sd 5:0:0:0: [sdd] Attached SCSI disk
[ 3.044496] sr 12:0:0:0: [sr0] scsi3-mmc drive: 48x/12x writer dvd-ram cd/rw xa/form2 cdda tray
[ 3.047986] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 3.064101] sr 12:0:0:0: Attached scsi CD-ROM sr0
[ 3.064493] sr 12:0:0:0: Attached scsi generic sg4 type 5
[ 59.089841] Freeing initrd memory: 152336K
[ 59.112550] Segment Routing with IPv6
[ 59.115785] RPL Segment Routing with IPv6
[ 59.118994] In-situ OAM (IOAM) with IPv6
[ 59.122249] NET: Registered PF_PACKET protocol family
[ 59.125424] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 59.128860] Key type dns_resolver registered
[ 59.148008] microcode: Current revision: 0x0a201210
[ 59.151574] microcode: Updated early from: 0x0a201210
[ 59.162556] resctrl: L3 allocation detected
[ 59.166044] resctrl: MB allocation detected
[ 59.169468] resctrl: L3 monitoring detected
[ 59.173113] IPI shorthand broadcast: enabled
[ 59.199301] sched_clock: Marking stable (59198008329, 365972)->(59357103256, -158728955)
[ 59.204711] registered taskstats version 1
[ 59.208546] Loading compiled-in X.509 certificates
[ 59.222677] Loaded X.509 cert 'Build time autogenerated kernel key: 45c1a1b80a9ce23b52b48afddf797ce09e19d119'
[ 59.248983] Demotion targets for Node 0: null
[ 59.252526] Key type .fscrypt registered
[ 59.255781] Key type fscrypt-provisioning registered
[ 59.283140] cryptd: max_cpu_qlen set to 1000
[ 59.310134] AES CTR mode by8 optimization enabled
[ 59.388409] Key type encrypted registered
[ 59.391683] ima: Allocated hash algorithm: sha1
[ 59.511534] ima: No architecture policies found
[ 59.516050] clk: Disabling unused clocks
[ 59.519229] PM: genpd: Disabling unused power domains
[ 59.524946] Freeing unused decrypted memory: 2036K
[ 59.530199] Freeing unused kernel image (initmem) memory: 4272K
[ 59.533602] Write protecting the kernel read-only data: 28672k
[ 59.539931] Freeing unused kernel image (rodata/data gap) memory: 544K
[ 59.543090] Run /init as init process
[ 59.546262] with arguments:
[ 59.546268] /init
[ 59.546274] with environment:
[ 59.546280] HOME=/
[ 59.546286] TERM=linux
[ 59.546292] BOOT_IMAGE=/kernel-genkernel-x86_64-6.12.10-gentoo
[ 59.767871] dracut: Gentoo-2.17
[ 61.295477] dca service started, version 1.12.1
[ 61.331996] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
[ 61.345566] sp5100-tco sp5100-tco: Using 0xfeb00000 for watchdog MMIO address
[ 61.353116] sp5100-tco sp5100-tco: initialized. heartbeat=60 sec (nowayout=0)
[ 61.383704] r8169 0000:07:00.0: enabling device (0000 -> 0003)
[ 61.389073] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 61.389910] ccp 0000:0a:00.1: enabling device (0000 -> 0002)
[ 61.394499] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[ 61.401847] ccp 0000:0a:00.1: ccp: unable to access the device: you might be running a broken BIOS.
[ 61.406511] ccp 0000:0a:00.1: psp enabled
[ 61.411685] r8169 0000:07:00.0 eth0: RTL8168h/8111h, 98:25:4a:5e:dc:04, XID 541, IRQ 42
[ 61.416389] r8169 0000:07:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[ 61.449563] igb: Intel(R) Gigabit Ethernet Network Driver
[ 61.453057] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 61.456233] xhci_hcd 0000:01:00.0: hcc params 0x0200ef81 hci version 0x110 quirks 0x0000000000000010
[ 61.465594] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 61.469556] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[ 61.473741] xhci_hcd 0000:01:00.0: Host supports USB 3.1 Enhanced SuperSpeed
[ 61.477663] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.12
|
|
|
Back to top |
|
|
Banana Moderator
Joined: 21 May 2004 Posts: 1859 Location: Germany
|
|
Back to top |
|
|
ipic Guru
Joined: 29 Dec 2003 Posts: 430 Location: UK
|
Posted: Sun Jan 19, 2025 11:03 am Post subject: |
|
|
Thanks for the response, appreciated.
I *think* I can rule out random number seeding. rngd is started by the system init, which is a lot later in the process. Also, as an additional rabbit hole, I saw that the motherboard TPM entropy source was disabled.
Code: | [ 0.894947] tpm_crb MSFT0101:00: Disabling hwrng |
Apparently some ASUS issue that the kernel "fixed" by specifically disabling this ASUS TPM. (There is a kernel mail list thread: https://lore.kernel.org/lkml/CAHk-=wgK0Z-LrJGExwG=e=oxjD93LJhY3jMmi_2O2_Pkjf8Tsg@mail.gmail.com/) But as I say - a different rabbit hole.
The forum thread you referenced gave me some reading, and lead me to have a look at my initramfs.
Code: | /boot # ls -lh initramfs-genkernel-x86_64-6.12.10-gentoo
-rw------- 1 root root 149M Jan 17 15:59 initramfs-genkernel-x86_64-6.12.10-gentoo |
Code: | ian2 /boot # lsinitrd initramfs-genkernel-x86_64-6.12.10-gentoo
Image: initramfs-genkernel-x86_64-6.12.10-gentoo: 150M
========================================================================
Early CPIO image
========================================================================
drwxr-xr-x 3 root root 0 Jan 17 15:58 .
-rw-r--r-- 1 root root 2 Jan 17 15:58 early_cpio
drwxr-xr-x 3 root root 0 Jan 17 15:58 kernel
drwxr-xr-x 3 root root 0 Jan 17 15:58 kernel/x86
drwxr-xr-x 2 root root 0 Jan 17 15:58 kernel/x86/microcode
-rw-r--r-- 1 root root 147350 Jan 17 15:58 kernel/x86/microcode/AuthenticAMD.bin
========================================================================
Version: dracut-103
..etc.. |
From dmesg: Code: | [ 57.475087] Freeing initrd memory: 152336K |
Since the 'silent gap' in dmesg went from around 20 seconds to around 50 seconds when the only change was to a new CPU with double the number of cores, my hypothesis is that it is the early_cpio that is being discarded when this message is displayed.
The rest of the dracut initramfs is still in memory - because it gets executed a bit later on.
I'm going to have a look at the kernel config to see if I have enabled anything that isn't necessary. |
|
Back to top |
|
|
ipic Guru
Joined: 29 Dec 2003 Posts: 430 Location: UK
|
Posted: Mon Jan 20, 2025 6:53 pm Post subject: |
|
|
Upgraded to kernel 6.13.0, and the dmesg 'silent gap' has gone back down to around the time it took on the previous cpu, as below.
I had to add configuration options DRM_SIMPLEDRM and SYSFB_SIMPLEFB to get the kernel to compile.
That, and any changes brought in by 6.13.0 itself were the only changes to my previous config.
I'm happier now - at least the boot is back to what it was
I'd still like to understand what is going on in that gap though.
Code: | [ 2.946411] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2.947231] ata6.00: supports DRM functions and may not be fully accessible
[ 2.947263] ata6.00: ATA-10: CT4000MX500SSD1, M3CR046, max UDMA/133
[ 2.947343] ata6.00: 7814037168 sectors, multi 1: LBA48 NCQ (depth 32), AA
[ 2.949906] ata6.00: Features: Trust Dev-Sleep
[ 2.952003] ata6.00: supports DRM functions and may not be fully accessible
[ 2.954760] ata6.00: configured for UDMA/133
[ 2.956607] scsi 5:0:0:0: Direct-Access ATA CT4000MX500SSD1 046 PQ: 0 ANSI: 5
[ 2.959044] sd 5:0:0:0: Attached scsi generic sg3 type 0
[ 2.959107] sd 5:0:0:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[ 2.962027] scsi 12:0:0:0: CD-ROM ASUS DRW-24D5MT 1.00 PQ: 0 ANSI: 5
[ 2.962804] sd 5:0:0:0: [sdd] 4096-byte physical blocks
[ 2.966553] sd 5:0:0:0: [sdd] Write Protect is off
[ 2.968408] sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 2.968442] sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.970387] sd 5:0:0:0: [sdd] Preferred minimum I/O size 4096 bytes
[ 3.000323] sr 12:0:0:0: [sr0] scsi3-mmc drive: 48x/12x writer dvd-ram cd/rw xa/form2 cdda tray
[ 3.002184] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 3.016586] sr 12:0:0:0: Attached scsi CD-ROM sr0
[ 3.016703] sr 12:0:0:0: Attached scsi generic sg4 type 5
[ 3.016760] sdd: sdd1 sdd2 sdd3 sdd4 sdd5 sdd6 sdd7
[ 3.020231] sd 5:0:0:0: [sdd] Attached SCSI disk
[ 26.479960] Freeing initrd memory: 154028K
[ 26.489890] Segment Routing with IPv6
[ 26.491378] RPL Segment Routing with IPv6
[ 26.492713] In-situ OAM (IOAM) with IPv6
[ 26.494137] NET: Registered PF_PACKET protocol family
[ 26.495933] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 26.497972] Key type dns_resolver registered
[ 26.506149] microcode: Current revision: 0x0a201210
[ 26.508459] microcode: Updated early from: 0x0a201210
[ 26.513150] resctrl: L3 allocation detected
[ 26.514536] resctrl: MB allocation detected
[ 26.515901] resctrl: L3 monitoring detected |
|
|
Back to top |
|
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1472 Location: Richmond Hill, Canada
|
Posted: Mon Jan 20, 2025 7:17 pm Post subject: |
|
|
ipic,
print out to dmesg buffer for kernel is controlled by printk kernel function which can be configured either at kernel command line level or change to /proc/sys/kernel/printk.
It is unlikely you have kernel command line setting to change printk, however it is likely you have initrd that does that.
So if you have initrd/initramfs, check the "init" script, see if it use dmesg -n 1 or sysctl -w kernel.printk=x x x x or did something direct to /proc/sys/kernel/printk |
|
Back to top |
|
|
ipic Guru
Joined: 29 Dec 2003 Posts: 430 Location: UK
|
Posted: Mon Jan 20, 2025 7:34 pm Post subject: |
|
|
I may be wrong, but dracut's init process is run further down the dmesg (from the original post): Code: | [ 59.539931] Freeing unused kernel image (rodata/data gap) memory: 544K
[ 59.543090] Run /init as init process
[ 59.546262] with arguments:
[ 59.546268] /init
[ 59.546274] with environment:
[ 59.546280] HOME=/
[ 59.546286] TERM=linux
[ 59.546292] BOOT_IMAGE=/kernel-genkernel-x86_64-6.12.10-gentoo
[ 59.767871] dracut: Gentoo-2.17 |
I would assume that would not effect printk at the 2-3 second mark - am I wrong? |
|
Back to top |
|
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1472 Location: Richmond Hill, Canada
|
Posted: Mon Jan 20, 2025 7:40 pm Post subject: |
|
|
Don't know. Then my guess is wrong. But in principle there is something disable printk function at that moment. You can try to enable early console print and make kernel logging at debug to see when that got disale. |
|
Back to top |
|
|
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5358 Location: Bavaria
|
|
Back to top |
|
|
ipic Guru
Joined: 29 Dec 2003 Posts: 430 Location: UK
|
Posted: Tue Jan 21, 2025 7:45 am Post subject: |
|
|
I checked.
The init script has no reference to /proc/sys/kernel/printk, dmesg or sysctl.
And it is run well after the gap - which is at 2-3 seconds, init script announces itself at 27 seconds plus. |
|
Back to top |
|
|
ipic Guru
Joined: 29 Dec 2003 Posts: 430 Location: UK
|
Posted: Tue Jan 21, 2025 7:51 am Post subject: |
|
|
I changed my kernel boot command to this: Code: | [ 0.000000] Command line: BOOT_IMAGE=/kernel-genkernel-x86_64-6.13.0-gentoo root=/dev/mapper/vg01-root64 ro rd.shell rd.auto=1 rd.skipfsck rd.plymouth=0 plymouth.enable=0 loop.max_part=15 amd_pstate=passive loglevel=7 |
It did a bit more logging, but nothing that shows logging being disabled.
My current hypothesis is that the kernel is waiting for something in that gap, and times out. The motherboard has M2 'disk' slots that are empty, and has no way for me to disable the SATA links they present. |
|
Back to top |
|
|
ipic Guru
Joined: 29 Dec 2003 Posts: 430 Location: UK
|
Posted: Wed Jan 22, 2025 5:17 pm Post subject: |
|
|
I worked out that one of the SSD drives I have was failing. Not enough to fail outright, but enough that the BIOS was complaining about it at boot time. Also occasional drops to 1.5 Gig/second on the sata link.
While finding that out I was experimenting with different boot parameters and loading sequences, which got me to a kernel command line that looks like this:
Code: | [ 0.000000] Linux version 6.13.0-gentoo (root@ian2) (gcc (Gentoo 14.2.1_p20241221 p7) 14.2.1 20241221, GNU ld (Gentoo 2.43 p3) 2.43.1) #1 SMP PREEMPT_DYNAMIC Mon Jan 20 17:59:06 GMT 2025
[ 0.000000] Command line: BOOT_IMAGE=/kernel-genkernel-x86_64-6.13.0-gentoo root=/dev/mapper/vg01-root64 ro rd.shell rd.auto=1 rd.skipfsck rd.plymouth=0 plymouth.enable=0 rd.driver.pre=amdgpu loop.max_part=15 amd_pstate=active loglevel=7 |
Substantive changes are amd_pstate=active (from amd_pstate=passive) and rd.driver.pre=amdgpu - to get dracut to load amdgpu as soon as it starts.
Before I swapped the failing drive, these changes did not have a material effect.
After changing the failing drive for a new one, the dmesg gap now looks like this:
Code: | [ 2.368441] sd 4:0:0:0: [sdc] Attached SCSI disk
[ 2.810375] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2.811125] ata6.00: supports DRM functions and may not be fully accessible
[ 2.811140] ata6.00: ATA-10: CT4000MX500SSD1, M3CR046, max UDMA/133
[ 2.811185] ata6.00: 7814037168 sectors, multi 1: LBA48 NCQ (depth 32), AA
[ 2.812060] ata6.00: Features: Trust Dev-Sleep
[ 2.812195] ata6.00: supports DRM functions and may not be fully accessible
[ 2.813521] ata6.00: configured for UDMA/133
[ 2.815103] scsi 5:0:0:0: Direct-Access ATA CT4000MX500SSD1 046 PQ: 0 ANSI: 5
[ 2.815933] sd 5:0:0:0: Attached scsi generic sg3 type 0
[ 2.815944] sd 5:0:0:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[ 2.817097] sd 5:0:0:0: [sdd] 4096-byte physical blocks
[ 2.817595] sd 5:0:0:0: [sdd] Write Protect is off
[ 2.817602] scsi 12:0:0:0: CD-ROM ASUS DRW-24D5MT 1.00 PQ: 0 ANSI: 5
[ 2.818053] sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 2.818532] sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.819025] sd 5:0:0:0: [sdd] Preferred minimum I/O size 4096 bytes
[ 2.841759] sdd: sdd1 sdd2 sdd3 sdd4 sdd5 sdd6 sdd7
[ 2.842607] sd 5:0:0:0: [sdd] Attached SCSI disk
[ 2.853226] sr 12:0:0:0: [sr0] scsi3-mmc drive: 48x/12x writer dvd-ram cd/rw xa/form2 cdda tray
[ 2.853846] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 2.869435] sr 12:0:0:0: Attached scsi CD-ROM sr0
[ 2.869486] sr 12:0:0:0: Attached scsi generic sg4 type 5
[ 14.806229] Freeing initrd memory: 259804K
[ 14.810925] Segment Routing with IPv6
[ 14.811675] RPL Segment Routing with IPv6
[ 14.812167] In-situ OAM (IOAM) with IPv6
[ 14.812649] NET: Registered PF_PACKET protocol family
[ 14.813117] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 14.815374] Key type dns_resolver registered
[ 14.818936] microcode: Current revision: 0x0a201210
[ 14.819526] microcode: Updated early from: 0x0a201210
[ 14.822877] resctrl: L3 allocation detected
[ 14.823399] resctrl: MB allocation detected
[ 14.823854] resctrl: L3 monitoring detected
[ 14.824348] IPI shorthand broadcast: enabled
[ 14.828026] sched_clock: Marking stable (14827001329, 342967)->(14983445236, -156100940)
[ 14.828823] registered taskstats version 1
[ 14.829514] Loading compiled-in X.509 certificates
[ 14.831982] Loaded X.509 cert 'Build time autogenerated kernel key: 1bf47cadc31d989dcde14b9718ee7bde52948bfb'
[ 14.836369] Demotion targets for Node 0: null
[ 14.836885] Key type .fscrypt registered
[ 14.837341] Key type fscrypt-provisioning registered
[ 14.842092] cryptd: max_cpu_qlen set to 1000
[ 14.846706] AES CTR mode by8 optimization enabled
[ 14.859826] Key type encrypted registered
[ 14.860873] ima: Allocated hash algorithm: sha1
[ 14.951401] ima: No architecture policies found
[ 14.952536] clk: Disabling unused clocks
[ 14.953159] PM: genpd: Disabling unused power domains
[ 14.954234] Freeing unused decrypted memory: 2036K
[ 14.955473] Freeing unused kernel image (initmem) memory: 4288K
[ 14.956043] Write protecting the kernel read-only data: 28672k
[ 14.957026] Freeing unused kernel image (rodata/data gap) memory: 820K
[ 14.957587] Run /init as init process
[ 14.958128] with arguments:
[ 14.958129] /init
[ 14.958130] with environment:
[ 14.958131] HOME=/
[ 14.958132] TERM=linux
[ 14.958133] BOOT_IMAGE=/kernel-genkernel-x86_64-6.13.0-gentoo
[ 14.994135] dracut: Gentoo-2.17
[ 16.351543] [drm] amdgpu kernel modesetting enabled.
|
So 12 seconds of silence (before it was around 23 seconds) and a little bit faster to that point.
Since printk was raised. here are all the mentions of it in dmesg:
Code: | # dmesg | grep printk
[ 0.010511] printk: log_buf_len individual max cpu contribution: 131072 bytes
[ 0.010512] printk: log_buf_len total cpu_extra contributions: 4063232 bytes
[ 0.010512] printk: log_buf_len min size: 131072 bytes
[ 0.011962] printk: log buffer data + meta data: 4194304 + 14680064 = 18874368 bytes
[ 0.011963] printk: early log buf free: 115072(87%)
[ 0.034636] printk: legacy console [tty0] enabled
#
|
I think I've made it clear in my posts that this is not a problem. I am just trying to understand what is going on in that part of the boot process.
Since I can live with the current boot timings, I'll leave it at that until I find some more information.
My hypothesis remains that the kernel is 'waiting' for something (or several somethings) in the gap. All my SATA devices are hot-swappable, and some are not populated at boot. Along with the 'missing' M2, I think that may have a contribution.
Thanks for all the suggestions, they helped me dig into things in a more targetted way. |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|