Gentoo Forums
System randomly fails to boot (btrfs, dracut, race cond.)
Chris.B.
n00b


Joined: 17 Aug 2010
Posts: 48

Posted: Sat Mar 28, 2020 11:57 am    Post subject: System randomly fails to boot (btrfs, dracut, race cond.)

Hi,

Some time ago I decided to go modern with BTRFS. The setup currently has two 1 TB SSDs in RAID1.
Code:
vetar ~ # btrfs filesystem show
Label: 'BTROOT'  uuid: 25ed3527-8c3d-4d87-88ab-f6fabf078899
   Total devices 2 FS bytes used 557.10GiB
   devid    1 size 953.87GiB used 561.01GiB path /dev/sda2
   devid    2 size 953.87GiB used 561.01GiB path /dev/sdd2


It works most of the time. I'd really love it to work all of the time ;)

The issue is a race condition during boot. Sometimes one of the disks doesn't come up fast enough, the system decides that it's dead and drops me to the dracut console with a message that it will try to mount the filesystem when I exit the console. So I exit (Ctrl-D) without touching anything at the console, and it boots correctly, 100% of the time.

I tried adding rootdelay=10 to /etc/default/grub and to the initramfs via
Code:
dracut --kernel-cmdline rootdelay=10
- neither seems to have any effect. It's not just that it doesn't fix the disk issue: there is no delay at all. 10 seconds is a long time, so I would notice if it worked.

Any idea how to solve this issue? I would prefer the system to wait for the disks instead of just sleeping for a fixed delay, but if there is no way to wait, a sleep will do.
I will gladly post any logs or additional info that might help.
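
One way to wait for the disks rather than sleep for a fixed time is to poll until btrfs reports that every member device of the filesystem has been registered. What follows is a minimal sketch, not a tested fix for this system. It assumes that `btrfs device ready` (from btrfs-progs) exits non-zero while a member device is still missing. `wait_until` is a hypothetical helper name, and hooking it into the initramfs (for example from a dracut pre-mount hook) is also an assumption.

```shell
#!/bin/sh
# Poll until a readiness check succeeds or a timeout expires.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never does in time.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Intended use (assumption: run as root before the root mount is attempted):
#   wait_until 60 btrfs device ready \
#       /dev/disk/by-uuid/25ed3527-8c3d-4d87-88ab-f6fabf078899
```

The check command is passed in as arguments, so the same loop can be tried by hand from the dracut emergency shell before wiring it into anything.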

relevant part of dmesg:
Code:

[    4.971500] sd 6:0:0:0: [sdd] Write Protect is off
[    4.977156] sd 1:0:0:0: [sdb] Attached SCSI disk
[    4.982193] sd 6:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[    4.982217] sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    4.994933]  sdd: sdd1 sdd2
[    5.000459] sd 6:0:0:0: [sdd] Attached SCSI disk
[    5.276715] ata8: SATA link down (SStatus 0 SControl 300)
[    5.282748] Freeing unused kernel image memory: 1180K
[    5.288152] Write protecting the kernel read-only data: 18432k
[    5.294174] Freeing unused kernel image memory: 2008K
[    5.299748] Freeing unused kernel image memory: 540K
[    5.305090] rodata_test: all tests were successful
[    5.310392] Run /init as init process
[    5.356821] dracut: dracut-049
[    5.466948] systemd-udevd[311]: starting version 3.2.9
[    5.472281] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    5.477560] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    5.482765] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    5.488817] udevd[312]: starting eudev-3.2.9
[    5.527433] r8169 0000:09:00.0 enp9s0: renamed from eth0
[    5.547970] BTRFS: device label BTROOT devid 1 transid 81802 /dev/sda2
[    5.586986] random: fast init done
[    6.558175] BTRFS info (device sda2): disk space caching is enabled
[    6.563263] BTRFS info (device sda2): has skinny extents
[    6.568939] BTRFS error (device sda2): devid 2 uuid e1280267-94f4-4fb3-8dfb-6c93c4b63d6f is missing
[    6.574053] BTRFS error (device sda2): failed to read the system array: -2
[    6.587834] BTRFS error (device sda2): open_ctree failed
[    6.598798] dracut Warning: Failed to mount -t btrfs -o subvol=systemroot,ro,ro /dev/disk/by-uuid/25ed3527-8c3d-4d87-88ab-f6fabf078899 /sysroot
[    6.609558] dracut Warning: *** An error occurred during the file system check.
[    6.620167] dracut Warning: *** Dropping you to a shell; the system will try
[    6.630534] dracut Warning: *** to mount the filesystem(s), when you leave the shell.
[    6.650717] dracut Warning:
---<here I get out of the console>---
[    6.872963] random: crng init done
[    6.877619] random: 5 urandom warning(s) missed due to ratelimiting
[   36.190867] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   36.196874] ata7.00: configured for UDMA/133
[   36.217538] BTRFS: device label BTROOT devid 2 transid 81802 /dev/sdd2
[   38.846390] BTRFS info (device sda2): disk space caching is enabled
[   38.850609] BTRFS info (device sda2): has skinny extents
[   38.980517] BTRFS info (device sda2): enabling ssd optimizations
[   39.032682] dracut: Mounted root filesystem /dev/sda2
[   39.083562] dracut: Switching root
[   39.970220] udevd[1031]: starting version 3.2.9
[   39.995203] udevd[1031]: starting eudev-3.2.9
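
In the log above, devid 1 is registered at about 5.5 s, the mount fails at about 6.6 s, and devid 2 is only registered at about 36.2 s. To pull just those scan events out of a longer dmesg, something like the following sketch could help. `btrfs_scan_times` is a hypothetical helper name, and the pattern assumes the exact message format shown in this log; other kernel versions may append extra text to the line.

```shell
#!/bin/sh
# Print "<seconds> <devid> <path>" for every btrfs device-scan line on stdin.
# Assumes lines of the form:
#   [    5.547970] BTRFS: device label BTROOT devid 1 transid 81802 /dev/sda2
btrfs_scan_times() {
    sed -n 's/^\[ *\([0-9.]*\)] BTRFS: device label .* devid \([0-9]*\) transid [0-9]* \(.*\)$/\1 \2 \3/p'
}

# Usage:
#   dmesg | btrfs_scan_times
```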
toralf
Developer


Joined: 01 Feb 2004
Posts: 3943
Location: Hamburg

Posted: Sat Mar 28, 2020 12:41 pm

If it is racy: you made sure that the appropriate file system and device drivers are compiled into the kernel rather than built as modules, right?
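
A quick way to check this is to look the relevant symbols up in the kernel config. This is a rough sketch: `config_state` is a hypothetical helper name, and `CONFIG_SATA_AHCI` is only an assumption about the disk controller driver, since the thread does not say which controller is in use. `/proc/config.gz` exists only if the kernel was built with `CONFIG_IKCONFIG_PROC`; otherwise the `.config` in the kernel source tree can be piped in instead.

```shell
#!/bin/sh
# Report how a kernel config symbol is set: "builtin", "module" or "unset".
# Reads a kernel .config from stdin; the symbol name is the first argument.
config_state() {
    symbol=$1
    line=$(grep -E "^${symbol}=" | head -n 1)
    case "$line" in
        *=y) echo builtin ;;
        *=m) echo module ;;
        *)   echo unset ;;
    esac
}

# Usage:
#   zcat /proc/config.gz | config_state CONFIG_BTRFS_FS
#   zcat /proc/config.gz | config_state CONFIG_SATA_AHCI   # assumed driver
```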
Chris.B.
n00b


Joined: 17 Aug 2010
Posts: 48

Posted: Sat Mar 28, 2020 12:48 pm

Yes, it's all in the kernel. I don't use modules much:
Code:
tuvok@vetar /lib/modules/4.9.34-gentoo $ find ./ | grep ko
./kernel/arch/x86/crypto/glue_helper.ko
./kernel/arch/x86/crypto/sha512-ssse3.ko
./kernel/arch/x86/crypto/twofish-avx-x86_64.ko
./kernel/arch/x86/crypto/twofish-x86_64-3way.ko
./kernel/arch/x86/crypto/twofish-x86_64.ko
./kernel/arch/x86/platform/intel/iosf_mbi.ko
./kernel/crypto/ablk_helper.ko
./kernel/crypto/cryptd.ko
./kernel/crypto/echainiv.ko
./kernel/crypto/lrw.ko
./kernel/crypto/pcbc.ko
./kernel/crypto/sha512_generic.ko
./kernel/crypto/twofish_common.ko
./kernel/crypto/twofish_generic.ko
./kernel/drivers/hwmon/asb100.ko
./kernel/drivers/thermal/x86_pkg_temp_thermal.ko
./kernel/drivers/video/backlight/lcd.ko
./kernel/fs/fuse/fuse.ko
./kernel/sound/usb/6fire/snd-usb-6fire.ko
./kernel/sound/usb/bcd2000/snd-bcd2000.ko
./kernel/sound/usb/caiaq/snd-usb-caiaq.ko
./kernel/sound/usb/hiface/snd-usb-hiface.ko
./kernel/sound/usb/line6/snd-usb-line6.ko
./kernel/sound/usb/line6/snd-usb-pod.ko
./kernel/sound/usb/line6/snd-usb-podhd.ko
./kernel/sound/usb/line6/snd-usb-toneport.ko
./kernel/sound/usb/line6/snd-usb-variax.ko
./kernel/sound/usb/misc/snd-ua101.ko
./kernel/sound/usb/usx2y/snd-usb-us122l.ko
./kernel/sound/usb/usx2y/snd-usb-usx2y.ko
./misc/vboxdrv.ko
./misc/vboxnetadp.ko
./misc/vboxnetflt.ko
./misc/vboxpci.ko
./video/nvidia-drm.ko
./video/nvidia-modeset.ko
./video/nvidia-uvm.ko
./video/nvidia.ko
tuvok@vetar /lib/modules/4.9.34-gentoo $ lsmod
Module                  Size  Used by
nvidia_drm             40960  4
nvidia_modeset       1077248  9 nvidia_drm
vboxpci                24576  0
vboxnetadp             28672  0
vboxnetflt             28672  0
vboxdrv               368640  4 vboxpci,vboxnetadp,vboxnetflt
nvidia              19939328  446 nvidia_modeset
freke
Veteran


Joined: 23 Jan 2003
Posts: 1051
Location: Somewhere in Denmark

Posted: Sat Mar 28, 2020 1:50 pm

I *think* rootwait and rootdelay only work for the first stage of booting (i.e. for the initramfs - dracut?)

Similar (kernel rootwait/rootdelay ignored) problem here - https://stackoverflow.com/questions/14806294/linux-kernel-parameter-rootwait-being-ignored


I found this - don't know if it could be relevant (don't know dracut at all):
Code:
       rd.retry=<seconds>
           specify how long dracut should retry the initqueue to configure
           devices. The default is 30 seconds. After 2/3 of the time,
           degraded raids are force started. If you have hardware, which
           takes a very long time to announce its drives, you might want to
           extend this value.


http://man7.org/linux/man-pages/man7/dracut.cmdline.7.html
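
For reference, a sketch of how such a dracut option could be put on the kernel command line through /etc/default/grub, plus a small check that a parameter actually arrived there after a reboot. The value 60 is an arbitrary example, not a recommendation from the man page. `has_param` is a hypothetical helper name, and the grub.cfg path may differ per system.

```shell
#!/bin/sh
# /etc/default/grub is sourced as shell; the dracut option goes here.
# (60 is an arbitrary example value.)
GRUB_CMDLINE_LINUX="rd.retry=60"

# After editing, regenerate the GRUB config (path may differ per system):
#   grub-mkconfig -o /boot/grub/grub.cfg

# Check whether a parameter name is present on a kernel command line.
# Matches both bare flags ("ro") and key=value pairs ("rd.retry=60").
# Usage: has_param NAME "$(cat /proc/cmdline)"
has_param() {
    case " $2 " in
        *" $1 "*|*" $1="*) return 0 ;;
        *) return 1 ;;
    esac
}
```

If `has_param rootdelay "$(cat /proc/cmdline)"` fails after a reboot, the parameter never reached the kernel. That is a different problem from the parameter being present but ignored.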
Chris.B.
n00b


Joined: 17 Aug 2010
Posts: 48

Posted: Sat Mar 28, 2020 3:31 pm

freke wrote:
I *think* rootwait and rootdelay only work for the first stage of booting (i.e. for the initramfs - dracut?)

Similar (kernel rootwait/rootdelay ignored) problem here - https://stackoverflow.com/questions/14806294/linux-kernel-parameter-rootwait-being-ignored


I found this - don't know if it could be relevant (don't know dracut at all):
Code:
       rd.retry=<seconds>
           specify how long dracut should retry the initqueue to configure
           devices. The default is 30 seconds. After 2/3 of the time,
           degraded raids are force started. If you have hardware, which
           takes a very long time to announce its drives, you might want to
           extend this value.


http://man7.org/linux/man-pages/man7/dracut.cmdline.7.html


I'll try that in the evening (I can't reboot right now), but I doubt it will help: according to the quote above, dracut should already spend at least 20 seconds initializing devices, and it doesn't look like the boot process waits that long for anything. But I'll try anyway and report back.
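
The numbers behind that doubt can be worked out from the man page quote and the dmesg timestamps in the first post. This is only a small arithmetic sketch; `force_start_after` and `dmesg_delta` are hypothetical helper names. It does not show whether rd.retry applies to this particular mount failure at all.

```shell
#!/bin/sh
# Seconds after which, per the quoted man page text, degraded RAIDs are
# force started: 2/3 of the rd.retry value (integer arithmetic).
force_start_after() {
    echo $(( $1 * 2 / 3 ))
}

# Difference between two dmesg timestamps in seconds, two decimals.
dmesg_delta() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b - a }'
}

# Usage, with timestamps taken from the log in the first post:
#   force_start_after 30                 # default rd.retry
#   dmesg_delta 5.356821 6.598798        # dracut start -> open_ctree failed
#   dmesg_delta 5.356821 36.217538       # dracut start -> devid 2 registered
```

With the default of 30 this gives 20 seconds. The log shows the mount failing about 1.24 seconds after dracut starts, and devid 2 appearing about 30.86 seconds after dracut starts.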
All times are GMT