Gentoo Forums
A new issue with gentoo-sources-6.2.7 not waiting for nvme

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Mon Mar 20, 2023 5:50 pm    Post subject: A new issue with gentoo-sources-6.2.7 not waiting for nvme

With the advent of 6.2.7, I have noticed that often the kernel panics complaining that it can't find the boot drive.

But if I wait a few extra seconds in grub, it finds the drive just fine.

This was not a problem with 6.2.2.

A quick Google search did not turn up anything. I guess I should scour the bug reports at kernel.org.

Anyone else seen this?
_________________
Some day there will only be free software.

CooSee
Veteran
Joined: 20 Nov 2004
Posts: 1477
Location: Earth

Posted: Mon Mar 20, 2023 8:20 pm

Quote:
But if I wait a few extra seconds in grub, it finds the drive just fine.

More info is needed. Please wgetpaste your dmesg output from both a failed and a successful boot, and also your fstab. Are you using UUIDs?

Have you compared the kernel .configs?

8)
_________________
" Reality is an illusion that arises from a lack of honest communication "
---
" Man is curious by nature; what remains in the end is greed "

Hu
Administrator
Joined: 06 Mar 2007
Posts: 22928

Posted: Tue Mar 21, 2023 12:55 am

Which patch in v6.2.2..v6.2.7 introduced this problem?

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Wed Mar 22, 2023 6:14 am

This might have been an overheating problem. Maybe. If it happens again, I'll try to get more data. dmesg has nothing because the nvme doesn't work... I'll try to capture the screen with my cell phone.
_________________
Some day there will only be free software.

grknight
Retired Dev
Joined: 20 Feb 2015
Posts: 1969

Posted: Wed Mar 22, 2023 12:47 pm

As the kernel assumes the root device is always ready when neither the rootwait nor the rootdelay command-line option is present, this could simply be that NVMe initialization, which is asynchronous, has not finished by the time an optimized kernel looks for the root device.

rootwait will wait forever until the device appears
rootdelay=<seconds> will wait the specified time before checking for the root device
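
A minimal sketch of how one might check whether the running kernel was booted with either option. The function name has_root_wait_option and its optional file argument are hypothetical and not from this thread; by default it reads /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed if the kernel command
# line contains rootwait or rootdelay=<seconds>.
# $1 is an optional file to read instead of /proc/cmdline.
has_root_wait_option() {
    cmdline_file=${1:-/proc/cmdline}
    # Put one option per line, then require an exact whole-line match.
    tr ' ' '\n' < "$cmdline_file" | grep -Eqx 'rootwait|rootdelay=[0-9]+'
}

# Usage:
#   has_root_wait_option && echo "kernel will wait for the root device"
```

With GRUB, such an option would normally go into GRUB_CMDLINE_LINUX in /etc/default/grub, followed by rerunning grub-mkconfig.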

logrusx
Advocate
Joined: 22 Feb 2018
Posts: 2556

Posted: Wed Mar 22, 2023 12:54 pm

grknight wrote:
As the kernel assumes the root device is always ready when neither the rootwait nor the rootdelay command-line option is present, this could simply be that NVMe initialization, which is asynchronous, has not finished by the time an optimized kernel looks for the root device.

rootwait will wait forever until the device appears
rootdelay=<seconds> will wait the specified time before checking for the root device


I don't understand how grub is already loaded, but the nvme is not initialized. Where was grub loaded from?

p.s. 6.2.x is not stable so expect it to have glitches.

Best Regards,
Georgi

Hu
Administrator
Joined: 06 Mar 2007
Posts: 22928

Posted: Wed Mar 22, 2023 3:00 pm

Grub was loaded from the device that the firmware picked as the bootloader, which may or may not be the NVMe drive in question.

6.2.x is keyworded testing in Gentoo because upstream has not declared it a long term support series kernel (and likely will not do so, since 6.1.x was declared long term). That does not mean that the 6.2.x kernel is generally unstable and has glitches. As OP is now having difficulty reproducing this on demand, this is likely a race condition of some sort. It may be a latent bug that has been present for a long time and was made worse by an innocuous patch in v6.2.2..v6.2.7 or it may be a regression. Knowing the specific bad commit might help us determine which is the case, which is why I asked for the specific guilty patch.

papu
l33t
Joined: 25 Jan 2008
Posts: 735
Location: Under some pine or holm oak...

Posted: Wed Mar 22, 2023 3:18 pm

Quote:
~]$ eselect kernel list
Available kernel symlink targets:
[1] linux-6.2.7-gentoo
[2] linux-6.2.7_p1-zen *


I am using NVMe (btrfs) and have no problems with 6.2.7.
_________________
~amd64 & openrc --cpu 7700 non-x --ram 2x16GB --gpu RX 6600

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Wed Mar 22, 2023 5:47 pm

logrusx wrote:
I don't understand how grub is already loaded, but the nvme is not initialized. Where was grub loaded from?


This is a good point. In my case it loaded grub from the NVMe. How could it not be ready?
_________________
Some day there will only be free software.

grknight
Retired Dev
Joined: 20 Feb 2015
Posts: 1969

Posted: Wed Mar 22, 2023 5:56 pm

RayDude wrote:
This is a good point. In my case it loaded grub from the NVMe. How could it not be ready?


Once grub starts the kernel, everything that grub had set up is forgotten. It is up to the kernel to bring the hardware up again after that.

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Wed Mar 22, 2023 9:55 pm

I wonder if this is related to the other problem I have.

If power shuts off the server, nine times out of ten the EFI partition becomes corrupted and unrecognized by the BIOS, forcing me to boot a USB flash drive, chroot in, and redo the boot sector and grub settings.

But I checked and the boot sector and EFI directories do not actually change.

I suspect it's the BIOS, but this makes me wonder.
_________________
Some day there will only be free software.

logrusx
Advocate
Joined: 22 Feb 2018
Posts: 2556

Posted: Thu Mar 23, 2023 7:28 am

RayDude wrote:
I wonder if this is related to the other problem I have.

If power shuts off the server, nine times out of ten the EFI partition becomes corrupted and unrecognized by the BIOS, forcing me to boot a USB flash drive, chroot in, and redo the boot sector and grub settings.

But I checked and the boot sector and EFI directories do not actually change.

I suspect it's the BIOS, but this makes me wonder.


Do you have backups? I guess you get what I mean. Another thing is you don't need the ESP mounted. Mark it noauto in fstab.

Best Regards,
Georgi

Goverp
Advocate
Joined: 07 Mar 2007
Posts: 2191

Posted: Thu Mar 23, 2023 8:51 am

FWIW (and it's not much), 6.2.7 and 6.2.8 work fine on my system booting via GRUB and NVMe.

How are you specifying the root device in grub.cfg?
_________________
Greybeard

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Sat Mar 25, 2023 3:14 pm

Sorry I haven't been interactive lately. I was laid off yesterday and found out on Wednesday. It's been an interesting week. I'll have plenty of time to debug this now, well, after I get my resume and LinkedIn up to date.

Update on the stability of this system. I was watching a video the day before yesterday and the system hard reset. Oddly though, it booted fine, with no problem finding the boot sector. I had reboot issues in the past when I was running sensord, so I shut sensord off again to see if that stops it from happening.

There was no indication in syslog of why it rebooted. The last thing logged is sensord output. Then a five minute gap. Then the reboot messages announcing syslogging being enabled.

This is the command I use to configure grub. I've been doing this for years:

Code:
grub-mkconfig -o /boot/grub/grub.cfg


It produces this:

Code:
        menuentry 'Gentoo GNU/Linux, with Linux 6.2.2-gentoo' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-6.2.2-gentoo-advanced-eb9ff80f-ba62-4e4a-bf86-c8c7b97320dc' {
                load_video
                set gfxpayload=keep
                insmod gzio
                insmod part_gpt
                insmod fat
                search --no-floppy --fs-uuid --set=root 2450-7E62
                echo    'Loading Linux 6.2.2-gentoo ...'
                linux   /vmlinuz-6.2.2-gentoo root=/dev/nvme0n1p4 ro  initcall_blacklist=acpi_cpufreq_init amd_pstate=passive amd_iommu=on sysrq_always_enabled=1 bdl_pos_adj=8,8
        }


I have a sneaking suspicion this is a BIOS issue with the MSI MPG B550 motherboard I'm using. I checked to see if they have a new version; it's a BETA and only changes AGESA versions. It doesn't mention any bug fixes.

I have serious issues with this BIOS. When I change anything in the BIOS and do a reset, the system reconfigures DRAM for twenty or more seconds and then hangs. I have to power it off by holding the power button for six seconds and then power it back on and it works fine after that.

I'm not sure that happens every time with the 5950X I'm running, but it did with the 3600 I was running a month back. I had the same problem last year with the 3700X and went to an old BIOS to fix it. But all this could be unrelated.
_________________
Some day there will only be free software.

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Sat Mar 25, 2023 3:52 pm

Update.

It rebooted overnight. Either power glitched or it reset itself... PG&E has been crap lately so it could be either.

It was stuck on the kernel boot screen. I took a picture and transcribed it:

Code:
md: ... autorun DONE.
/dev/root: Can't open blockdev
VFS: Cannot open root device "nvme0n1p4" or unknown-block(0,0): error -6
Please append a correct "root=" boot option: here are the available partitions:
103:00000  244198584 nvme0n1
  (driver?)
   103:00001          16384 nvme0n1p1 e338b...

   103:00002  244180992 nvme0n1p2 b996a...

102:00003 1953514584  nvme1n1
  (driver?)


Then it lists all the partitions on the secondary NVMe, which holds Windows because that's the only way to update my hard drive firmware.

It is not finding partition 3 (swap) or 4 (the Gentoo root filesystem) from the Gentoo NVMe.

This is actually what the nvme looks like:

Code:
Disk /dev/nvme0n1: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: CT2000P5SSD8                           
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 4EE79BFB-C34A-4E03-B2AD-64253B356F62

Device             Start        End    Sectors  Size Type
/dev/nvme0n1p1      2048     133119     131072   64M BIOS boot
/dev/nvme0n1p2    133120    2230271    2097152    1G EFI System
/dev/nvme0n1p3   2230272  136447999  134217728   64G Linux swap
/dev/nvme0n1p4 136448000 3907029134 3770581135  1.8T Linux filesystem


It looks like the partition information is being read incorrectly.

I can't imagine how.

Just so you guys get a sense of what this failure meant... This is what happened.

I found this thread and decided to update it.

I attempted to login to the server remotely to dig for information and it wasn't there, but it was powered.

I turned on its monitor and saw it was stuck as shown above. Linux just freezes there, not very good IMO.

I powered it off by holding the power button for six seconds.

I powered it back on and it immediately went into BIOS because the NVME was no longer bootable.

I booted a system rescue USB flash drive.

mounted /dev/nvme0n1p4 to /mnt/gentoo

ran my chroot.sh script

ran post_chroot.sh script

mounted /boot

reran grub-install and grub-mkconfig.

exit chroot

umount -l /mnt/gentoo

rebooted

Windows booted because BIOS decided that it should have priority over gentoo, even though the boot order had been previously: USB flash drive -> gentoo

held the power button for six seconds.

rebooted & entered BIOS, removed the windows boot option.

Booted gentoo and it came up fully functional.

This happens every time the power glitches, and now also after the random reboots.

I've never spent so much money on something so crappy.

I think it's probably the motherboard / BIOS. I have no idea of how to prove that without buying a new one, and I shouldn't do that right now just in case.
_________________
Some day there will only be free software.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54670
Location: 56N 3W

Posted: Sat Mar 25, 2023 4:20 pm

RayDude,

I suspect that the race is between nvme0n1 and nvme1n1 for the kernel device names.
Please share both partition tables.

Here's my theory.
The BIOS makes its own arrangements for reading the bootloader into RAM. EFI tends to use UUIDs of some sort, so kernel names don't matter. They don't exist yet.
If the names were swapped, UUIDs still work.

That being true, don't use root=/dev/nvme0n1p4 in grub.cfg. Try root=PARTUUID=<partuuid_of_nvme0n1p4>. blkid will tell you what it is.
Note that the PARTUUID is not the same as the filesystem UUID. Do pick the right one.

Rewrite /etc/fstab in terms of the filesystem UUIDs. Now the kernel device names don't matter at all.
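
A minimal sketch of looking up the PARTUUID and turning it into a kernel argument. The function name root_arg_for is hypothetical and not from this thread; it relies only on blkid's -s (select one tag) and -o value (print just the value) options.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print a root= argument built
# from the PARTUUID of the partition given as $1.
root_arg_for() {
    # -s PARTUUID limits output to that tag; -o value prints only its value.
    partuuid=$(blkid -s PARTUUID -o value "$1") || return 1
    [ -n "$partuuid" ] || return 1
    printf 'root=PARTUUID=%s\n' "$partuuid"
}

# Usage (as root):
#   root_arg_for /dev/nvme0n1p4
```

The function fails rather than printing an empty root= argument when blkid finds no PARTUUID, which seems the safer choice for something that ends up on a kernel command line.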

I suspect your Windows NVMe card has two partitions, which you see when booting fails. Do the e338b... and b996a... help identify them?
rootwait and rootdelay can't help here. Once the kernel has assigned the device names the wrong way round, it's game over until reboot.

None of this addresses random reboots.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Sat Mar 25, 2023 5:24 pm

NeddySeagoon wrote:
RayDude,

I suspect that the race is between nvme0n1 and nvme1n1 for the kernel device names.
Please share both partition tables.

Here's my theory.
The BIOS makes its own arrangements for reading the bootloader into RAM. EFI tends to use UUIDs of some sort, so kernel names don't matter. They don't exist yet.
If the names were swapped, UUIDs still work.

That being true, don't use root=/dev/nvme0n1p4 in grub.cfg. Try root=PARTUUID=<partuuid_of_nvme0n1p4>. blkid will tell you what it is.
Note that the PARTUUID is not the same as the filesystem UUID. Do pick the right one.

Rewrite /etc/fstab in terms of the filesystem UUIDs. Now the kernel device names don't matter at all.

I suspect your Windows NVMe card has two partitions, which you see when booting fails. Do the e338b... and b996a... help identify them?
rootwait and rootdelay can't help here. Once the kernel has assigned the device names the wrong way round, it's game over until reboot.

None of this addresses random reboots.


Thanks so much Neddy!

This makes sense.

I use UUID in /etc/fstab.

I'm not sure how to make grub use it...


Update for windows nvme:

Code:
server ~ # fdisk -l /dev/nvme1n1
Disk /dev/nvme1n1: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 960 EVO 250GB               
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F5F89664-F55C-EE43-88F8-E76003CF18FB

Device         Start       End   Sectors   Size Type
/dev/nvme1n1p1  2048     34815     32768    16M Microsoft reserved
/dev/nvme1n1p2 34816 488396799 488361984 232.9G Microsoft basic data

_________________
Some day there will only be free software.

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Sat Mar 25, 2023 5:42 pm

grub is not using UUIDs.

I found this thread: https://forums.gentoo.org/viewtopic-t-1055328-start-0.html

which states:

A little digging into /etc/grub.d/10_linux (which appears to be used by grub-mkconfig) led me to this:

Code:
  elif test -z "${initramfs}" ; then
    # "UUID=" and "ZFS=" magic is parsed by initrd or initramfs.  Since there's
    # no initrd or builtin initramfs, it can't work here.

Which I interpret to mean that if you boot your kernel directly from the disk, not from initrd, then you can't use UUIDs.

I've never used an initrd...

And more than a dozen years ago, when I tried, I utterly failed to get it to work.


I'm thinking about trying the beta bios...
_________________
Some day there will only be free software.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54670
Location: 56N 3W

Posted: Sat Mar 25, 2023 6:05 pm

RayDude,

The kernel understands PARTUUID, and you can force grub to use that. No initrd required.
I think it's in /etc/default/grub but I'm a syslinux user.

I do use grub on arm64 but everything is in LVM on RAID, so I must use an initrd anyway.
I tend to hand craft grub.cfg there too, which is very naughty of me. :)

--- edit ---

Code:
Device         Start       End   Sectors   Size Type
/dev/nvme1n1p1  2048     34815     32768    16M Microsoft reserved
/dev/nvme1n1p2 34816 488396799 488361984 232.9G Microsoft basic data


Oooohhh look. Two partitions :)
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Sat Mar 25, 2023 6:51 pm

NeddySeagoon wrote:
RayDude,

The kernel understands PARTUUID, and you can force grub to use that. No initrd required.
I think it's in /etc/default/grub but I'm a syslinux user.

I do use grub on arm64 but everything is in LVM on RAID, so I must use an initrd anyway.
I tend to hand craft grub.cfg there too, which is very naughty of me. :)

--- edit ---

Code:
Device         Start       End   Sectors   Size Type
/dev/nvme1n1p1  2048     34815     32768    16M Microsoft reserved
/dev/nvme1n1p2 34816 488396799 488361984 232.9G Microsoft basic data


Oooohhh look. Two partitions :)


I noticed that as well...

I'll try to figure out grub and PARTUUID tomorrow. We're going on an adventure today!
_________________
Some day there will only be free software.

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Sun Mar 26, 2023 5:49 pm

Grub can use PARTUUID:

From here: https://forums.gentoo.org/viewtopic-t-1049514-start-0.html

Add this to /etc/default/grub:

GRUB_DISABLE_LINUX_PARTUUID=false

Then rerun: grub-mkconfig -o /boot/grub/grub.cfg

Then PARTUUID will appear in: /boot/grub/grub.cfg, for example:

Code:
menuentry 'Gentoo GNU/Linux' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnuli>
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_gpt
        insmod fat
        search --no-floppy --fs-uuid --set=root 2450-7E62
        echo    'Loading Linux 6.2.7-gentoo ...'
        linux   /vmlinuz-6.2.7-gentoo root=PARTUUID=8e713e72-02a1-9b49-9b4c-c295620f36e0 ro  initcall_blacklist=acpi_cpufreq_init amd_pstate=passive amd_iommu=on sysrq_always_enabled=1 bdl_pos_adj=8,8
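
A minimal sketch of a check that the regenerated file no longer names the root device by /dev path. The function name check_grub_root is hypothetical and not from this thread; the default path matches the one passed to grub-mkconfig above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print kernel lines in a grub.cfg
# that still use root=/dev/... and return 1 if any are found.
# $1 is an optional path; it defaults to /boot/grub/grub.cfg.
check_grub_root() {
    cfg=${1:-/boot/grub/grub.cfg}
    [ -r "$cfg" ] || return 2
    # -n prefixes each offending line with its line number.
    if grep -En '^[[:space:]]*linux[[:space:]].*[[:space:]]root=/dev/' "$cfg"; then
        return 1
    fi
    return 0
}

# Usage:
#   check_grub_root && echo "all entries avoid /dev names for root"
```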


I'll let you know if I still have issues. If I don't, I may forget to post, which means it likely worked.

Thanks everyone for your help.
_________________
Some day there will only be free software.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54670
Location: 56N 3W

Posted: Sun Mar 26, 2023 6:01 pm

RayDude,

When you boot, run
Code:
fdisk -l
and note which way round your NVMe comes up.
It won't matter now but you may catch it in the act. :)
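
A minimal sketch of a quicker way to see which drive got which name, assuming the controller number (nvme0) matches the namespace device (nvme0n1), which is usual but not guaranteed. The function name list_nvme_models is hypothetical and not from this thread; it reads the model attribute that the kernel exposes under /sys/class/nvme.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print "controller: model" for
# each NVMe controller. $1 is an optional sysfs directory to scan,
# defaulting to /sys/class/nvme.
list_nvme_models() {
    sysdir=${1:-/sys/class/nvme}
    for ctrl in "$sysdir"/nvme*; do
        [ -r "$ctrl/model" ] || continue
        # The model string is padded with spaces; trim the trailing blanks.
        model=$(sed 's/[[:space:]]*$//' "$ctrl/model")
        printf '%s: %s\n' "${ctrl##*/}" "$model"
    done
}

# Usage:
#   list_nvme_models
```

Running lsblk -d -o NAME,MODEL,SERIAL should give much the same information without a helper.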
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Mon Mar 27, 2023 12:13 am

NeddySeagoon wrote:
RayDude,

When you boot, run
Code:
fdisk -l
and note which way round your NVMe comes up.
It won't matter now but you may catch it in the act. :)


I wonder if it's because the drive needs fsck run on it. I noticed that when it was booting the kernel indicated that one of the drives needed fsck and it couldn't be run. I didn't see which one before the screen cleared and the GUI started.

Yep: here's the proof:

Code:
server ~ # fdisk -l
Disk /dev/nvme0n1: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 960 EVO 250GB               
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F5F89664-F55C-EE43-88F8-E76003CF18FB

Device         Start       End   Sectors   Size Type
/dev/nvme0n1p1  2048     34815     32768    16M Microsoft reserved
/dev/nvme0n1p2 34816 488396799 488361984 232.9G Microsoft basic data


Disk /dev/nvme1n1: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: CT2000P5SSD8                           
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 4EE79BFB-C34A-4E03-B2AD-64253B356F62

Device             Start        End    Sectors  Size Type
/dev/nvme1n1p1      2048     133119     131072   64M BIOS boot
/dev/nvme1n1p2    133120    2230271    2097152    1G EFI System
/dev/nvme1n1p3   2230272  136447999  134217728   64G Linux swap
/dev/nvme1n1p4 136448000 3907029134 3770581135  1.8T Linux filesystem


This must have started happening in kernel 6.2.7.

I just had another random reboot.

These are the last lines in syslog before the reboot and the first line after it:

Mar 26 16:56:12 server root[18966]: ACPI event unhandled: button/right RIGHT 00000080 00000000 K
Mar 26 16:56:13 server root[18968]: ACPI event unhandled: button/right RIGHT 00000080 00000000 K
Mar 26 16:56:13 server root[18970]: ACPI event unhandled: button/right RIGHT 00000080 00000000 K
Mar 26 16:56:13 server root[18972]: ACPI event unhandled: button/right RIGHT 00000080 00000000 K
Mar 26 16:59:11 server syslog-ng[1979]: syslog-ng starting up; version='3.38.1'

I don't see how ACPI can cause a reboot. I figure this has to be a hardware problem. The system was not even warm, I was just watching a YouTube video.

I'm going to boot the old kernel and see if that makes the reboot go away.

If not, then I'm pretty sure the motherboard, CPU, or some other component is freaking out.

I mean Proton can't cause random instability, can it? I've been playing Hogwarts Legacy and it crashes so much...
_________________
Some day there will only be free software.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54670
Location: 56N 3W

Posted: Mon Mar 27, 2023 10:03 am

RayDude,

"I love it when a plan comes together" :)

Reboot random guess.

Do you have a watchdog timer that doesn't get 'patted', so it triggers a reboot?
Have a look in the BIOS setup and, for testing only, turn it off if it's there.
It should work. There is a whole kernel menu full of watchdog knobs.
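
A minimal sketch of checking from the running system whether a kernel watchdog driver has registered a device. The function name has_watchdog is hypothetical and not from this thread. It only looks for /dev/watchdog and /dev/watchdogN nodes, so a watchdog armed purely by the firmware with no kernel driver loaded would not show up here.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed if any watchdog device
# node exists. $1 is an optional directory to scan, defaulting to /dev.
has_watchdog() {
    devdir=${1:-/dev}
    for node in "$devdir"/watchdog*; do
        # An unmatched glob stays literal, so test that the node exists.
        [ -e "$node" ] && return 0
    done
    return 1
}

# Usage:
#   has_watchdog && echo "a watchdog device is present"
```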
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

RayDude
Advocate
Joined: 29 May 2004
Posts: 2088
Location: San Jose, CA

Posted: Tue Mar 28, 2023 6:17 am

NeddySeagoon wrote:
RayDude,

"I love it when a plan comes together" :)

Reboot random guess.

Do you have a watchdog timer that doesn't get 'patted', so it triggers a reboot?
Have a look in the BIOS setup and, for testing only, turn it off if it's there.
It should work. There is a whole kernel menu full of watchdog knobs.


Thanks Neddy. I don't think I have a watchdog. Do x86-64 boards come with them? I know the embedded ARM systems I work with have them...

I just had a reboot watching a recording. The only indication in syslog is this:

Code:


That shows up just before the syslog startup message after the reboot.

I was running the old kernel, so I don't think it's the kernel.

I think it's hardware.

I'll let you in on a little secret. I just put in a replacement CPU for the old one that I had to RMA because of bad cache, and I put the fans on the CPU heatsink on backwards so the fans were fighting case airflow. It got really hot inside and the system rebooted due to heat before I caught the error.

I'm worried I killed something. If I did, I hope it's the motherboard, not the CPU.
_________________
Some day there will only be free software.

All times are GMT
Page 1 of 2