Author |
Message |
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
Posted: Sat Aug 19, 2023 2:03 pm Post subject: lto and pgo force reboot |
|
|
Hi,
I just installed Gentoo on a Gigabyte GA-X58A-UD3R with a Xeon L5520 and 2×4 GB of RAM.
The installation went well, following this process:
https://paste.swordarmor.fr/Yo99
But afterwards, when I add lto/pgo to portage/make.conf, the computer simply reboots after a while.
USE="-help -selinux -systemd lto pgo minimal"
I tried with -j1 to rule out a memory usage issue, but nothing changed.
I tried with pgo only; same thing.
I ran memtest: 2 passes, 0 errors.
I installed the Intel microcode, but that doesn't help.
I use a binary kernel:
calculus ~ # uname -a
Linux calculus 6.1.41-gentoo-dist #1 SMP PREEMPT_DYNAMIC Tue Jul 25 09:26:34 -00 2023 x86_64 Intel(R) Xeon(R) CPU L5520 @ 2.27GHz GenuineIntel GNU/Linux
How can I debug this?
Thanks |
|
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22615
|
Posted: Sat Aug 19, 2023 3:15 pm Post subject: |
|
|
What is the idle CPU temperature of this system? How high does the temperature get during a build, before the reboot? What is the output of emerge --info? |
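Because the machine reboots without warning, the temperature just before the reboot is easiest to capture by logging it to disk while the build runs. A minimal sketch, assuming the sensors are exposed under the standard Linux hwmon sysfs tree (readings in millidegrees Celsius); the function names and the log path are hypothetical:

```shell
#!/bin/sh
# Convert a sysfs reading in millidegrees Celsius (e.g. 55000) to "55.0".
millideg_to_c() {
    awk -v m="$1" 'BEGIN { printf "%.1f", m / 1000 }'
}

# Print one timestamped line per temperature sensor found under the given
# hwmon root (defaults to the usual sysfs location).
snapshot_temps() {
    root="${1:-/sys/class/hwmon}"
    for f in "$root"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue
        printf '%s %s %s\n' "$(date +%FT%T)" "$f" "$(millideg_to_c "$(cat "$f")")"
    done
}

# Append a snapshot to a log file every few seconds and flush it to disk,
# so the last readings before a spontaneous reboot are still in the file.
log_temps() {
    logfile="${1:-/root/temps.log}"   # hypothetical path
    interval="${2:-5}"
    while :; do
        snapshot_temps >> "$logfile"
        sync
        sleep "$interval"
    done
}
```

Start `log_temps /root/temps.log 5 &` before the emerge; after the reboot, the tail of the log shows the temperatures reached shortly before the machine went down.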
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54577 Location: 56N 3W
|
Posted: Sat Aug 19, 2023 3:31 pm Post subject: |
|
|
gizmo15_,
Welcome to Gentoo.
Don't add Quote: | lto/pgo to portage/make.conf | ; some things will not build with lto yet.
Worse, some things build, but other things that depend on them either won't build or won't run.
In my opinion PGO is not worth the doubling of compile time.
You need a representative (of your use) set of tests to profile and subsequently optimise against.
Without that set of tests, PGO may optimise the wrong code paths, and even make things slower for you.
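If you still want to experiment, the flags can be scoped to individual packages instead of being set globally. A minimal sketch, assuming the packages of interest are gcc and python (both come up later in this thread); the file name is an arbitrary choice:

```shell
# /etc/portage/package.use/lto-pgo  (hypothetical file name)
# Enable the flags only for the packages listed here; leave lto and pgo
# out of the global USE line in make.conf.
sys-devel/gcc lto pgo
dev-lang/python lto pgo
```

With this layout, a package that misbehaves can be taken back to a plain build by removing its line and re-emerging it.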
To find out where it breaks, edit /etc/rc.conf and set
Code: | # Set rc_interactive to "YES" and you'll be able to press the I key during
# boot so you can choose to start specific services. Set to "NO" to disable
# this feature. This feature is automatically disabled if rc_parallel is
# set to YES.
rc_interactive="YES" | near the top of the file.
When you boot press the 'I' key and the startup sequence will ask before it starts a service.
Identify which service startup causes the reboot.
If it never gets this far, it's either your kernel or initrd. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
Posted: Sat Aug 19, 2023 5:21 pm Post subject: |
|
|
Thanks for your replies!
Hu: 55 °C at most; overheating was my first thought too
emerge --info :
Code: |
calculus ~ # emerge --info
Portage 3.0.49 (python 3.11.4-final-0, default/linux/amd64/17.1, gcc-12, glibc-2.37-r3, 6.1.41-gentoo-dist x86_64)
=================================================================
System uname: Linux-6.1.41-gentoo-dist-x86_64-Intel-R-_Xeon-R-_CPU_L5520_@_2.27GHz-with-glibc2.37
KiB Mem: 8124604 total, 7844084 free
KiB Swap: 2097148 total, 2097148 free
Timestamp of repository gentoo: Fri, 18 Aug 2023 06:00:01 +0000
Head commit of repository gentoo: 282069f8658c651cb1eca5bf270ead09dd876c44
sh bash 5.1_p16-r6
ld GNU ld (Gentoo 2.40 p5) 2.40.0
app-misc/pax-utils: 1.3.5::gentoo
app-shells/bash: 5.1_p16-r6::gentoo
dev-lang/perl: 5.36.1-r3::gentoo
dev-lang/python: 3.11.4::gentoo
dev-util/cmake: 3.26.5-r1::gentoo
dev-util/meson: 1.1.1::gentoo
sys-apps/baselayout: 2.13-r1::gentoo
sys-apps/openrc: 0.47.1::gentoo
sys-apps/sandbox: 2.37::gentoo
sys-devel/autoconf: 2.71-r6::gentoo
sys-devel/automake: 1.16.5-r1::gentoo
sys-devel/binutils: 2.40-r5::gentoo
sys-devel/binutils-config: 5.5::gentoo
sys-devel/gcc: 12.3.1_p20230526::gentoo
sys-devel/gcc-config: 2.11::gentoo
sys-devel/libtool: 2.4.7-r1::gentoo
sys-devel/make: 4.4.1-r1::gentoo
sys-kernel/linux-headers: 6.1::gentoo (virtual/os-headers)
sys-libs/glibc: 2.37-r3::gentoo
Repositories:
gentoo
location: /var/db/repos/gentoo
sync-type: rsync
sync-uri: rsync://rsync.gentoo.org/gentoo-portage
priority: -1000
volatile: False
sync-rsync-extra-opts:
sync-rsync-verify-jobs: 1
sync-rsync-verify-metamanifest: yes
sync-rsync-verify-max-age: 24
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE @BINARY-REDISTRIBUTABLE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -mtune=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -mtune=native -O2 -pipe"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-march=native -mtune=native -O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs binpkg-multi-instance buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=native -mtune=native -O2 -pipe"
GENTOO_MIRRORS="http://ftp.free.fr/mirrors/ftp.gentoo.org/"
LANG="fr_FR.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LEX="flex"
LINGUAS="fr fr_FR en en_GB"
MAKEOPTS="-j3"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
SHELL="/bin/bash"
USE="acl amd64 bzip2 cli crypt dri fortran gdbm iconv ipv6 libtirpc minimal multilib ncurses nls nptl openmp pam pcre readline seccomp split-usr ssl test-rust unicode xattr zlib" ABI_X86="64" ADA_TARGET="gnat_2021" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2 popcnt sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput" KERNEL="linux" L10N="fr fr-FR en en-GB" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LLVM_TARGETS="X86" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-1" POSTGRES_TARGETS="postgres15" PYTHON_SINGLE_TARGET="python3_11" PYTHON_TARGETS="python3_11" RUBY_TARGETS="ruby31" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LFLAGS, LIBTOOL, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS
|
NeddySeagoon: so for a basic server like this, I don't need those entries?
I will add this option, just out of curiosity ^^
I forgot to say: it happens when I run this command: emerge --ask --update --deep --changed-use --keep-going @world (after adding pgo to make.conf) |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54577 Location: 56N 3W
|
Posted: Sat Aug 19, 2023 6:07 pm Post subject: |
|
|
gizmo15_,
Is it repeatable in the same place?
Did you build something with lto that built but is broken?
If it is not repeatable, I suspect a hardware issue. _________________ Regards,
NeddySeagoon
|
|
|
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
Posted: Sun Aug 20, 2023 5:44 am Post subject: |
|
|
Yes, it's repeatable.
The build is OK for binutils and python but fails for gcc every time, with lto or pgo |
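One way to confirm which package was being merged when the machine went down is to look at the last package Portage started. A minimal sketch, assuming the usual `/var/log/emerge.log` layout in which each start is recorded as `<timestamp>:  >>> emerge (N of M) category/package-version to /`; the function name is hypothetical:

```shell
#!/bin/sh
# Print the category/package-version of the last merge that was started,
# according to an emerge.log-style file (defaults to /var/log/emerge.log).
last_started_pkg() {
    log="${1:-/var/log/emerge.log}"
    # Field 7 of a ">>> emerge (N of M) pkg to /" line is the package atom.
    grep '>>> emerge (' "$log" | tail -n 1 | awk '{ print $7 }'
}
```

Running `last_started_pkg` after each unexpected reboot and comparing the results shows whether the failure always lands in the same package.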
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54577 Location: 56N 3W
|
Posted: Sun Aug 20, 2023 9:17 am Post subject: |
|
|
gizmo15_,
Please provide a full build log on a pastebin site.
Just because things build with lto does not mean that they will be useful.
The gcc build may depend on something that is broken with lto.
My lto exclude list may be useful.
There are many others. _________________ Regards,
NeddySeagoon
|
|
|
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
Posted: Sun Aug 20, 2023 4:19 pm Post subject: |
|
|
The log from before the reboot:
https://paste.swordarmor.fr/bFuI
To get this I ran:
emerge --update --deep --changed-use --keep-going @world > bla.txt 2>&1
with this make.conf:
Code: |
# These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /usr/share/portage/config/make.conf.example for a more
# detailed example.
COMMON_FLAGS="-march=native -mtune=native -O2 -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
MAKEOPTS="-j2"
# NOTE: This stage was built with the bindist Use flag enabled
# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C
GENTOO_MIRRORS="http://ftp.free.fr/mirrors/ftp.gentoo.org/"
LINGUAS="fr fr_FR en en_GB"
L10N="fr fr-FR en en-GB"
LLVM_TARGETS="X86"
ACCEPT_LICENSE="-* @FREE @BINARY-REDISTRIBUTABLE"
USE="-help -selinux -systemd lto minimal"
|
|
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54577 Location: 56N 3W
|
Posted: Sun Aug 20, 2023 4:58 pm Post subject: |
|
|
gizmo15_,
The log is truncated but there are no errors. The last line is
Code: | make[10]: Entering directory '/var/tmp/portage/sys-devel/gcc-12.3.1_p20230526/work/build/x86_64-pc-linux-gnu/32/libstdc++-v3/src/c++11'
/bin/sh ../../libtool --tag CXX --tag disable-shared --mode=compile /var/tmp/portage/sys-devel/gcc-12.3.1_p20230526/work/build/./gcc/xgcc -shared-libgcc -B/var/tmp/portage/sys-devel/gcc-12.3.1_p20230526/work/build/./gcc -nostdinc++ -L/var/tmp/portage/sys-devel/gcc-12.3.1_p20230526/work/build/x86_64-pc-linux-gnu/32/libstdc++-v3/src -L/var/tmp/portage/sys-devel/gcc-12.3.1_p20230526/work/build/x86_64-pc-linux-gnu/32/libstdc+
|
That looks like hardware, e.g. a thermal or low-battery shutdown.
To avoid a thermal shutdown, try MAKEOPTS="-j1" if you are not on that already. _________________ Regards,
NeddySeagoon
|
|
|
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
Posted: Sun Aug 20, 2023 6:52 pm Post subject: |
|
|
I already tried with -j1, but it's the same thing.
The output is truncated because the computer rebooted.
I updated my installation with -j4 and without lto/pgo, and got no errors. Can those options push the hardware so much harder that it crashes? |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54577 Location: 56N 3W
|
Posted: Sun Aug 20, 2023 8:28 pm Post subject: |
|
|
gizmo15_,
Both lto and pgo require more RAM and more time.
They are also more CPU intensive. It's possible that could push a marginal thermal solution over the edge. _________________ Regards,
NeddySeagoon
|
|
|
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
Posted: Mon Aug 21, 2023 9:07 am Post subject: |
|
|
Ack, I will dig into that side.
Thanks for the help! |
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2398
|
Posted: Mon Aug 21, 2023 10:18 am Post subject: |
|
|
gizmo15_ wrote: | Can those options push the hardware so much harder that it crashes? |
That's unlikely. More likely the optimization itself caused a bug or two to reveal themselves. Problems + PGO/LTO => in the vast majority of cases the problem is in (the software not intended to support) PGO and/or LTO.
Best Regards,
Georgi |
|
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22615
|
Posted: Mon Aug 21, 2023 12:22 pm Post subject: |
|
|
logrusx: I disagree. While LTO has been known to miscompile software in a way that it crashes at runtime, it should not be able to cause the computer to spontaneously reboot. The only ways that happens are either a hardware problem or a serious kernel problem. The kernel problem could be a miscompiled kernel, but as OP describes it, the kernel is stable even when building "bad" packages, as long as LTO and PGO are not used. PGO should be even safer. It is known to require substantial extra time, typically at full CPU load, but I cannot recall any reports of software that was miscompiled as a result of following the profile generated by PGO. If OP's cooling is inadequate, it is possible that quick builds, done without LTO and PGO, finish before the system can overheat, but that the extra time spent on PGO could keep the CPU at full load long enough to overload the cooling system and trigger a thermal shutdown.
55C does seem too cold for a thermal shutdown.
OP: when using PGO, how long elapses between when you start the emerge and when the computer spontaneously reboots? How variable is this delay? |
|
|
|
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
Posted: Mon Aug 21, 2023 2:02 pm Post subject: |
|
|
3 or 4 hours before it spontaneously reboots |
|
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22615
|
Posted: Mon Aug 21, 2023 3:07 pm Post subject: |
|
|
What package are you building that needs 3-4 hours, then reboots at an indeterminate time during that build? Spontaneous non-deterministic reboots are characteristic of hardware failure. If your system were miscompiled by LTO or PGO, then the timing should be more predictable. |
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2398
|
Posted: Mon Aug 21, 2023 4:54 pm Post subject: |
|
|
Hu wrote: | logrusx: I disagree. While LTO has been known to miscompile software in a way that it crashes at runtime, it should not be able to cause the computer to spontaneously reboot. The only ways that happens are either a hardware problem or a serious kernel problem. The kernel problem could be a miscompiled kernel, but as OP describes it, the kernel is stable even when building "bad" packages, as long as LTO and PGO are not used. PGO should be even safer. It is known to require substantial extra time, typically at full CPU load, but I cannot recall any reports of software that was miscompiled as a result of following the profile generated by PGO. If OP's cooling is inadequate, it is possible that quick builds, done without LTO and PGO, finish before the system can overheat, but that the extra time spent on PGO could keep the CPU at full load long enough to overload the cooling system and trigger a thermal shutdown.
55C does seem too cold for a thermal shutdown.
OP: when using PGO, how long elapses between when you start the emerge and when the computer spontaneously reboots? How variable is this delay? |
You disagree with what? That optimizations are unlikely to push hardware to a crash? Anyway, you might have a point in that it's either a hardware or a kernel problem. However, I myself cannot think of a kernel problem that would only show up intermittently.
Regarding thermal shutdown, isn't there something called thermal throttling? Yes, it can't save a CPU with no cooling at all, but here we're considering inadequate cooling that is still capable of cooling the CPU for extended periods of time, so thermal shutdown seems unlikely to me too, and that hypothesis might not be so strong. It can be ruled out by monitoring the temperature.
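Whether the CPU throttled at all can also be read from the kernel's own counters. A minimal sketch, assuming an Intel CPU whose kernel exposes `thermal_throttle/core_throttle_count` under `/sys/devices/system/cpu/cpuN/` (not every kernel or CPU provides these files); the function name is hypothetical:

```shell
#!/bin/sh
# Sum the per-core thermal throttle event counters found under the given
# CPU sysfs root (defaults to the usual location). Prints 0 if none exist.
throttle_total() {
    root="${1:-/sys/devices/system/cpu}"
    total=0
    for f in "$root"/cpu[0-9]*/thermal_throttle/core_throttle_count; do
        [ -r "$f" ] || continue
        total=$((total + $(cat "$f")))
    done
    echo "$total"
}
```

The counters reset at boot, so they need to be read during the build, for example from a loop that appends the value to a log file. A total that stays at 0 up to the reboot would argue against overheating.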
Having had experience with faulty memory, I would start with memtest, the only easy and reliable hardware test I can think of.
I'd suggest running memtest overnight, for at least 8 passes. That could take a lot of time depending on the speed and size of the memory.
Best Regards,
Georgi |
|
|
|
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
Posted: Mon Aug 21, 2023 6:37 pm Post subject: |
|
|
Hu wrote: | What package are you building that needs 3-4 hours, then reboots at an indeterminate time during that build? Spontaneous non-deterministic reboots are characteristic of hardware failure. If your system were miscompiled by LTO or PGO, then the timing should be more predictable. |
I only have 8 GB of RAM on this computer, so when I want to compile with lto/pgo I reduce to -j2, which takes a lot of time.
logrusx: I already ran memtest: 2 passes, no errors. |
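For picking a job count, a commonly cited rule of thumb (the Gentoo Handbook suggests roughly 2 GiB of RAM per job) is the smaller of the thread count and RAM divided by 2 GiB. A minimal sketch of that arithmetic; the function name is hypothetical, and LTO builds may well need more headroom than this gives:

```shell
#!/bin/sh
# Suggest a MAKEOPTS job count: min(cpu threads, RAM / 2 GiB), at least 1.
# $1 = total memory in KiB, $2 = number of hardware threads.
suggest_jobs() {
    mem_kib="$1"
    cpus="$2"
    by_ram=$((mem_kib / 2097152))   # 2 GiB expressed in KiB
    if [ "$by_ram" -lt 1 ]; then
        by_ram=1
    fi
    if [ "$by_ram" -lt "$cpus" ]; then
        echo "$by_ram"
    else
        echo "$cpus"
    fi
}
```

On a live system: `suggest_jobs "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)" "$(nproc)"`. With the 8124604 KiB shown in the emerge --info output earlier and, say, 8 threads, the result is 3.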
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2398
|
Posted: Tue Aug 22, 2023 4:16 am Post subject: |
|
|
gizmo15_ wrote: | Hu wrote: | What package are you building that needs 3-4 hours, then reboots at an indeterminate time during that build? Spontaneous non-deterministic reboots are characteristic of hardware failure. If your system were miscompiled by LTO or PGO, then the timing should be more predictable. |
I only have 8 GB of RAM on this computer, so when I want to compile with lto/pgo I reduce to -j2, which takes a lot of time.
logrusx: I already ran memtest: 2 passes, no errors. |
Two passes are not enough. My problems started appearing on the 5th one; if you read the documentation, they recommend at least 8 passes. Run it and hope the problem is in the memory modules, because it's easy to blacklist the bad addresses.
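For reference, blacklisting usually means telling the bootloader or kernel to leave the faulty region unused. A minimal sketch, assuming GRUB 2 is the bootloader (the thread does not say which one is installed); the address,mask pair is a placeholder, not a value from this machine:

```shell
# /etc/default/grub (fragment; assumes GRUB 2)
# Each pair is address,mask in the form accepted by GRUB's badram command.
# The values below are placeholders for illustration only.
GRUB_BADRAM="0x01234567,0xfefefefe"
```

After editing the file, the configuration has to be regenerated, for example with `grub-mkconfig -o /boot/grub/grub.cfg`.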
Best Regards,
Georgi |
|
|
|
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
Posted: Tue Aug 22, 2023 7:30 am Post subject: |
|
|
I launched memtest again and I'll wait for 8 passes |
|
|
|
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2398
|
Posted: Fri Aug 25, 2023 4:11 am Post subject: |
|
|
Wow, 53 is like a lot!
Well, I don't know what else you could do. Maybe inspect the MB for swollen capacitors, check how well your CPU is mounted, try running some benchmark monitoring the temperature and see if it's consistently reproducible...
But this is old hardware; there are plenty of things that might have worn out.
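If no benchmark is installed, a sustained load for such a test can be produced with plain shell. A minimal sketch, assuming `timeout` from coreutils is available; the function name is hypothetical, and this only loads the integer units, so it is lighter than a real compile:

```shell
#!/bin/sh
# Keep N workers spinning in a busy loop for a fixed number of seconds.
# $1 = number of workers (default 1), $2 = duration in seconds (default 60).
burn_cpu() {
    workers="${1:-1}"
    seconds="${2:-60}"
    i=0
    while [ "$i" -lt "$workers" ]; do
        # Each worker is killed by timeout once the duration has elapsed.
        timeout "$seconds" sh -c 'while :; do :; done' &
        i=$((i + 1))
    done
    wait
}
```

For example, `burn_cpu 8 3600` keeps eight threads busy for an hour. Watching the temperature during that time, and noting whether the reboot recurs, shows whether load alone is enough to trigger it.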
Best Regards,
Georgi |
|
|
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20484
|
Posted: Fri Aug 25, 2023 5:08 am Post subject: |
|
|
Is that just under an hour?
I had a memory channel (physical slot) problem that didn't show up for about 24 hours or so. I don't recall the exact number, but it ran overnight.
If no other avenues turn up a solution, consider trying an overnight run. _________________ Quis separabit? Quo animo? |
|
|
|
gizmo15_ n00b
Joined: 19 Aug 2023 Posts: 11
|
Posted: Fri Aug 25, 2023 7:22 am Post subject: |
|
|
logrusx: I'll try with another CPU
pjp: 58 hours running |
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2398
|
Posted: Fri Aug 25, 2023 7:58 am Post subject: |
|
|
gizmo15_ wrote: | logrusx: I'll try with another CPU
pjp: 58 hours running |
That's a good idea. Do that prior to anything else. In general - avoid conducting more than one experiment at once. If you do so you won't know which result to attribute to which experiment.
Best Regards,
Georgi |
|
|
|
|