Gentoo Forums
How do I debug those openrc hangs on shutdown?
klammerj
n00b


Joined: 19 Jan 2023
Posts: 16

Posted: Thu May 09, 2024 6:58 pm    Post subject: How do I debug those openrc hangs on shutdown?

Good evening,
Typically there's about a 10 percent chance the box locks up on shutdown.
Often it's at or after localmount or sysklogd, so the kernel logs won't be much help.
I'd need console output. However, occasional modesets during shutdown make it difficult, if not impossible,
to tell which services are still running and which are not.
I've done a fair bit of guessing and changing the need <whatever> clauses,
so far without much effect.
I seem to have version 0.48 of openrc installed.
ian.au
l33t


Joined: 07 Apr 2011
Posts: 600
Location: Australia

Posted: Fri May 10, 2024 1:25 am

klammerj,

openrc stable is at 0.54 and 0.48 was replaced in January here, so I assume you're holding off updating your box? Why is that?

There isn't any useful info beyond that in your post, so everybody would be playing the same guessing game as you; your problem could be almost anything. If you've set your kernel for parallel execution, the observed location of the hang might be quite misleading.
Code:
gw-01 ~ # grep -i parallel /usr/src/linux/.config
CONFIG_HOTPLUG_PARALLEL=y

I'd say the likely culprit is either network or ACPI configuration - if it's the latter there should be some failed services in dmesg. You can try
Code:
 gw-01 ~ # dmesg | grep -i error
and see if that helps.

If network, it will depend on your network management setup. If you're running any sort of server or connecting to remote fs via NFS or similar you should mention those.

You should try posting a description of your hardware, lspci etc, emerge --info, dmesg, network management, any remote fs services and kernel configs here for someone to take a look at if you really want help with this.
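
If the services are being stopped in parallel, OpenRC's own settings are probably the quickest way to make the console output readable. A minimal sketch for /etc/rc.conf, assuming a stock OpenRC install; the values are suggestions for debugging and have not been tested on the box in question:

```shell
# /etc/rc.conf (debugging fragment; revert once the hang is found)

# Stop services one at a time, so the last line on screen belongs to
# the service that is actually hanging.
rc_parallel="NO"

# Ask OpenRC to log the rc process to a file ...
rc_logger="YES"
# ... at this path (the stock default location).
rc_log_path="/var/log/rc.log"

# Print what each service script is doing.
rc_verbose="yes"
```

The log file lives on a local filesystem, so the last lines before a hang may still not reach disk; the serial ordering on the console is the more dependable part.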
klammerj
n00b


Joined: 19 Jan 2023
Posts: 16

Posted: Fri May 10, 2024 8:18 am

ian.au wrote:
klammerj,

openrc stable is at 0.54 and 0.48 was replaced in January here, so I assume you're holding off updating your box? Why is that?

Last time I did a larger emerge it took a week or so to fix everything that got hosed in the process.
I'll update that pkg and see what happens...

ian.au wrote:

There isn't any useful info beyond that in your post, so everybody would be playing the same guessing game as you; your problem could be almost anything. If you've set your kernel for parallel execution, the observed location of the hang might be quite misleading.
Code:
gw-01 ~ # grep -i parallel /usr/src/linux/.config
CONFIG_HOTPLUG_PARALLEL=y


I'm using the distribution kernel 6.1.60-gentoo-dist:
Code:
grep -i parallel /usr/src/linux/.config
# Raw/parallel NAND flash controllers
# CONFIG_AD7606_IFACE_PARALLEL is not set


ian.au wrote:

I'd say the likely culprit is either network or ACPI configuration - if it's the latter there should be some failed services in dmesg. You can try
Code:
 gw-01 ~ # dmesg | grep -i error
and see if that helps.


Code:
dmesg | grep -i error
[ 0.268797] acpi PNP0A08:00: _OSC: platform retains control of PCIe features (AE_ERROR)
[ 1.013014] RAS: Correctable Errors collector initialized.
[ 22.396100] dracut: Mounting /dev/sda2 with -o defaults,noatime,lazytime,noiversion,inode_readahead_blks=2,delalloc,errors=remount-ro
[ 45.335372] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
ian.au wrote:


If network, it will depend on your network management setup. If you're running any sort of server or connecting to remote fs via NFS or similar you should mention those.

You should try posting a description of your hardware, lspci etc, emerge --info, dmesg, network management, any remote fs services and kernel configs here for someone to take a look at if you really want help with this.


Yes, there are network shares involved, along with Kerberos and gssd, but
I'd rather not divulge too much about my network setup.
I know it says `n00b' there on the left, but that's just because I rarely post to fora. I've been using this stuff since about 1990.

What I'd like is just some way to print something like [123689] before every shutdown script to indicate which
services are still running (and a way to assign arbitrary numbers
to the individual service scripts). I'd be done debugging this in an hour,
instead of messing around for months as I have now.
klammerj
n00b


Joined: 19 Jan 2023
Posts: 16

Posted: Fri Jun 14, 2024 4:09 am

It worked well for a while, but today it got stuck again.
The rc.log has not recorded the event (as usual).
These were the lines visible on the screen:
Code:

display-manager           | * Sending signal 15 to PID 2541 ... [ ok ]
acpid                     | * Will stop /usr/sbin/acpid
acpid                     | * Will stop processes of `/usr/sbin/acpid'
termencoding              |termencoding              | * Executing: /lib/rc/sh/openrc-run.sh /lib/rc/sh/openrc-run.sh /etc/init.d/termencoding stop
acpid                     | * Sending signal 15 to PID 2027 ...
display-manager-setup     |display-manager-setup     | * Executing: /lib/rc/sh/openrc-run.sh /lib/rc/sh/openrc-run.sh /etc/init.d/display-manager-setup stop
sysklogd                  |sysklogd                  | * Executing: /lib/rc/sh/openrc-run.sh /lib/rc/sh/openrc-run.sh /etc/init.d/sysklogd stop
sysklogd                  | * Stopping sysklogd ...
sysklogd                  | * Will stop /usr/sbin/syslogd
sysklogd                  | * Will stop PID 1894
netmount                  | * Failed to simply unmount filesystems
nfsclient                 |nfsclient                 | * Executing: /lib/rc/sh/openrc-run.sh /lib/rc/sh/openrc-run.sh /etc/init.d/nfsclient stop
rpc.gssd                  |rpc.gssd                  | * Executing: /lib/rc/sh/openrc-run.sh /lib/rc/sh/openrc-run.sh /etc/init.d/rpc.gssd stop
rpc.idmapd                |rpc.idmapd                | * Executing: /lib/rc/sh/openrc-run.sh /lib/rc/sh/openrc-run.sh /etc/init.d/rpc.idmapd stop
rpc.gssd                  | * Stopping gssd ...
rpc.gssd                  | * Will stop /usr/sbin/rpc.gssd
rpc.gssd                  | * Will stop processes of `/usr/sbin/rpc.gssd'
rpc.gssd                  | * Sending signal 15 to PID 2378 ...
rpc.idmapd                | * Stopping idmapd ...
rpc.idmapd                | * Will stop /usr/sbin/rpc.idmapd
rpc.idmapd                | * Will stop processes of `/usr/sbin/rpc.idmapd'
rpc.idmapd                | * Sending signal 15 to PID 2377 ...
rpcbind                   |rpcbind                   | * Executing: /lib/rc/sh/openrc-run.sh /lib/rc/sh/openrc-run.sh /etc/init.d/rpcbind stop
rpc.pipefs                |rpc.pipefs                | * Executing: /lib/rc/sh/openrc-run.sh /lib/rc/sh/openrc-run.sh /etc/init.d/rpc.pipefs stop
rpcbind                   | * Stopping rpcbind ...
rpcbind                   | * Will stop /sbin/rpcbind
rpc.pipefs                | * Unmounting RPC pipefs ...

(then nothing)

[Administrator edit: changed [quote] tags to [code] tags to preserve output layout. -Hu]
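
With interleaved output like this, it can help to list which services announced an action but never printed a result. A rough sketch, assuming the console text has been copied into a file in the `name | * message` form shown above; `pending_services` is a made-up helper name, not an OpenRC tool:

```shell
#!/bin/sh
# pending_services [FILE...]
# Reads OpenRC console lines of the form "service | * message" and prints
# the services whose latest "... " action was never followed by a
# "[ ok ]" or "[ !! ]" status on one of their own lines.
pending_services() {
    awk -F'|' '
        {
            if (NF < 2) next                    # not a "name | message" line
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
            if (name == "") next
            msg = $NF
            if (msg ~ /\.\.\.[ \t]*$/)
                pending[name] = 1               # action started, no result yet
            else if (msg ~ /\[ (ok|!!) \][ \t]*$/)
                delete pending[name]            # result reported
        }
        END { for (n in pending) print n }
    ' "$@" | sort
}
```

Called as `pending_services shutdown.txt` on the transcript above, it would name acpid, sysklogd, rpc.gssd, rpc.idmapd, rpcbind and rpc.pipefs, and leave out display-manager, whose line ends in `[ ok ]`. That narrows the field but does not by itself say which of them is the one blocking.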
C5ace
Guru


Joined: 23 Dec 2013
Posts: 475
Location: Brisbane, Australia

Posted: Fri Jun 14, 2024 9:49 am

Maybe reading this helps:

https://wiki.gentoo.org/wiki/Nfs-utils#OpenRC

Quote:
Unresponsiveness of the system

The system may become unresponsive during shutdown when the NFS client attempts to unmount exported directories after udev has stopped. To prevent this a local.d script can be used to forcibly unmount the exported directories during shutdown.

Create the file nfs.stop:
FILE /etc/local.d/nfs.stop

/bin/umount -a -f -t nfs,nfs4

Set the appropriate file permission bits:
root # chmod a+x /etc/local.d/nfs.stop
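
The one-liner above can be made a little more forgiving. The following is a sketch, not the wiki's script: it assumes util-linux `umount` (for `-f` and `-l`) and mount points without spaces, and the function names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical variant of /etc/local.d/nfs.stop.

# Print the mount point of every nfs/nfs4 entry in a mounts table
# (default: /proc/mounts), reverse-sorted so nested mounts come first.
# Limitation: /proc/mounts escapes spaces as \040; those are not decoded.
nfs_mountpoints() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}" | sort -r
}

# Force-unmount each NFS mount point; if that fails, fall back to a
# lazy unmount so shutdown is not left waiting on a dead server.
unmount_nfs() {
    nfs_mountpoints "$@" | while IFS= read -r mp; do
        umount -f "$mp" 2>/dev/null || umount -l "$mp"
    done
}

# Only act when installed under the name nfs.stop, so the functions can
# be sourced and inspected without unmounting anything.
case "${0##*/}" in
    nfs.stop) unmount_nfs ;;
esac
```

Unmounting one mount point at a time means a single stuck share does not stop the others from being released, which `umount -a` does not make obvious from its output.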

_________________
Observation after 30 years working with computers:
All software has known and unknown bugs and vulnerabilities. Especially software written in complex, unstable and object oriented languages such as perl, python, C++, C#, Rust and the likes.
klammerj
n00b


Joined: 19 Jan 2023
Posts: 16

Posted: Sun Jun 16, 2024 5:05 pm

Thank you. I had not done that yet...