sam_ Developer
Joined: 14 Aug 2020 Posts: 2038
Posted: Tue Jun 25, 2024 7:17 am
|
|
These ICEs are weird: even when something is badly broken (i.e. a real bug), it's abnormal to get so many within a single package.
Death within GCC's hash tables or garbage collector almost always means hardware failure (usually bad RAM, but possibly overclocking-induced instability or overheating). How long did you run memtest for? Is your CPU overclocked?
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2526
Posted: Tue Jun 25, 2024 8:28 am
|
|
sam_ wrote: | These ICEs are weird: even when something is badly broken (i.e. a real bug), it's abnormal to get so many within a single package.
Death within GCC's hash tables or garbage collector almost always means hardware failure (usually bad RAM, but possibly overclocking-induced instability or overheating). How long did you run memtest for? Is your CPU overclocked? |
It's pretty consistent, though. I have some experience with bad RAM and it's not like that at all. Back in the day I also had a faulty CPU, and it was never the same error twice: the machine just hung in many very different situations, sometimes even while idling.
I'm leaning towards a combination of a bug and circumstances.
Best Regards,
Georgi
|
|
|
sam_ Developer
Joined: 14 Aug 2020 Posts: 2038
Posted: Tue Jun 25, 2024 9:30 am
|
|
I've seen bad RAM manifest exactly like this. As I said, GCC's hash tables (and the GC) are sensitive to any corruption and hence easily trip over it. (That's why I said it. I deal with real ICEs all the time, and I also get plenty of bogus reports caused by HW.)
Anyway, if it isn't (though I'm sceptical), I'll need the preprocessed source: run the failing command manually in the builddir with -save-temps appended, which will create a .ii file. See also https://wiki.gentoo.org/wiki/GCC_ICE_reporting_guide.
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2526
Posted: Tue Jun 25, 2024 5:30 pm
|
|
sam_ wrote: | I've seen bad RAM manifest exactly like this. As I said, GCC's hash tables (and the GC) are sensitive to any corruption and hence easily trip over it. (That's why I said it. I deal with real ICEs all the time, and I also get plenty of bogus reports caused by HW.)
Anyway, if it isn't (though I'm sceptical), I'll need the preprocessed source: run the failing command manually in the builddir with -save-temps appended, which will create a .ii file. See also https://wiki.gentoo.org/wiki/GCC_ICE_reporting_guide. |
Yes, that actually makes sense. Those structures consume the most memory, so the chance of breaking exactly there is significantly higher.
cfgauss wrote: | fedeliallalinea wrote: | Code: | internal compiler error: Segmentation fault |
Usually this error is related to a hardware issue. |
I used smartctl to test my NVMe drives (no hard disk), memtest86+ to test my memory, and s-tui to test my CPU, and the hardware passes all of these tests.
How would I go about checking for software problems? |
Smartctl isn't actually a test. I don't remember whether it can schedule a read/write test on an NVMe drive. I remember I could test my HDD from the BIOS of my old ThinkPad, but I don't know about NVMe drives.
Memtest86+ should be run for at least 8 passes (I usually mention this, and I don't know why I didn't ask about it in this thread). I remember somebody here on the forums had to run it even longer before memory errors showed up, around the 12th pass or so. I think we later discovered that user was running an overclocking profile for the memory, relying on "warranties" and claims made by the memory vendor. Could you run it overnight, making sure you're not using any memory overclocking profile, so it really tests the memory hard? A faulty memory module would be the easiest thing to replace, compared to hunting for the issue anywhere else.
What tests did you run on the NVMe drives, and how long did you run the memory test for? Do you happen to remember which pass it reached?
Best Regards,
Georgi
|
|
|
cfgauss l33t
Joined: 18 May 2005 Posts: 726 Location: USA
Posted: Wed Jun 26, 2024 5:15 pm
|
|
sam_ wrote: | Is your CPU overclocked? |
No.
I tested my 128 GB of RAM by running memtest86+ from grub overnight. This resulted in 4 passes and 0 errors. I'm away from my box for a week and only have ssh access. When I get back, for how many passes should I run memtest86+?
I'll then report the results here.
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2526
Posted: Wed Jun 26, 2024 5:49 pm
|
|
cfgauss wrote: | When I get back, for how many passes should I run memtest86+?
|
At least 8 is recommended in the documentation, if I remember correctly.
Best Regards,
Georgi
|
|
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20521
Posted: Fri Jun 28, 2024 12:42 am
|
|
cfgauss wrote: | sam_ wrote: | Is your CPU overclocked? |
No.
I tested my 128 GB of RAM by running memtest86+ from grub overnight. This resulted in 4 passes and 0 errors. I'm away from my box for a week and only have ssh access. When I get back, for how many passes should I run memtest86+?
I'll then report the results here. | I had a memory issue on the second channel of a motherboard. The only obvious symptom was that the system would reboot while compiling a hardened kernel, and only a hardened kernel; there were no other apparent problems. That was a long time ago, but I had to run memtest for quite a while. When it finally found something, I had to move the physical RAM around to determine that the errors were tied to a particular slot on the motherboard and not to the memory itself. I _think_ I recall that overnight wasn't long enough. I'd imagine it takes longer still to go through maybe 16 times the amount of RAM I had.
|
|
|
cfgauss l33t
Joined: 18 May 2005 Posts: 726 Location: USA
Posted: Sat Jul 06, 2024 9:05 pm
|
|
sam_ wrote: | Death within GCC's hash tables or garbage collector almost always means HW failure (usually bad RAM, but possibly overclocking-induced or overheating). How long did you run memtest for? |
I ran memtest86+ for about three days (12 complete passes) with no errors. This bug report describes a user who gets the same error message as I do when compiling dev-qt/qtdeclarative. He's using gcc 14.1.1 with an AMD Ryzen Threadripper 1950X; I'm using the same gcc version, but with an AMD Ryzen Threadripper 2950X. He was able to compile it with Clang, and so was I.
Does this suggest recompiling gcc or some other approach?
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2526
Posted: Sun Jul 07, 2024 6:40 pm
|
|
I finally emerged gcc-14.1.1_p20240622, and qtdeclarative-6.7.2 didn't break with it.
cfgauss wrote: | sam_ wrote: | Death within GCC's hash tables or garbage collector almost always means HW failure (usually bad RAM, but possibly overclocking-induced or overheating). How long did you run memtest for? |
I ran memtest86+ for about three days (12 complete passes) with no errors. This bug report describes a user who gets the same error message as I do when compiling dev-qt/qtdeclarative. He's using gcc 14.1.1 with an AMD Ryzen Threadripper 1950X; I'm using the same gcc version, but with an AMD Ryzen Threadripper 2950X. He was able to compile it with Clang, and so was I.
Does this suggest recompiling gcc or some other approach? |
I don't think so. Comment #4 redirects to a GCC bug that goes back as far as version 10 and involves znver1. I guess that's what you hit: both the 1950X (Zen) and the 2950X (Zen+) are targeted by GCC as znver1. I'm on znver3, which is probably why I didn't hit it.
Best Regards,
Georgi
|
|
|
cfgauss l33t
Joined: 18 May 2005 Posts: 726 Location: USA
Posted: Sun Jul 07, 2024 9:58 pm
|
|
cfgauss wrote: | Does this suggest recompiling gcc or some other approach? |
The Gentoo Wiki describes how to create /etc/portage/env/compiler-clang for use with /etc/portage/package.env, which lists the packages to be compiled by Clang.
Is it "safe" to have Clang compile, say, only the packages that fail with an ICE and keep gcc for all the others?
If it is "safe", can or should an entry in /etc/portage/package.env cover an entire category, e.g. dev-qt/* compiler-clang?
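A minimal sketch of the setup described above, loosely modelled on the Gentoo Wiki's Clang page (the wiki's exact variable list may differ). Portage sources env files as bash, so the file is just variable assignments; the AR/NM/RANLIB lines assume the LLVM tools are installed and can be dropped if you only want to swap the compilers.

```shell
# /etc/portage/env/compiler-clang
# Applied only to packages that reference it in /etc/portage/package.env.
CC="clang"
CXX="clang++"
# Optional: assumes the LLVM equivalents of the binutils tools are installed.
AR="llvm-ar"
NM="llvm-nm"
RANLIB="llvm-ranlib"
```

The matching entry in /etc/portage/package.env is an atom followed by the env file name, e.g. dev-qt/qtdeclarative compiler-clang (that atom is simply the package that failed in this thread).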
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2526
Posted: Mon Jul 08, 2024 4:29 am
|
|
Oh, last night I must have read only up to "Does this suggest recompiling gcc". I'm sorry.
Yes, it seems it should be some other approach, like the one you've taken.
Given that the Clang wiki page suggests setting up a GCC fallback, it should be safe to have Clang compile some packages. As for whether to list a whole category in your env file, I don't know; first you should test whether Portage supports it.
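For the reverse arrangement mentioned above (Clang as the default in make.conf, GCC as the fallback), a hypothetical /etc/portage/env/compiler-gcc would simply point the compiler variables back at GCC. This is a sketch, not the wiki's exact file; packages that fail under Clang would then get a compiler-gcc entry in /etc/portage/package.env.

```shell
# /etc/portage/env/compiler-gcc  (hypothetical fallback env file)
# Overrides a Clang default from make.conf for the packages listed
# against it in /etc/portage/package.env.
CC="gcc"
CXX="g++"
```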
Best Regards,
Georgi
|
|
|
|