Gentoo Forums
[SOLVED] compiling kernel from minimal cd freezes new system

Gentoo on AMD64
ali3nx
l33t


Joined: 21 Sep 2003
Posts: 732
Location: Winnipeg, Canada

Posted: Wed Jul 04, 2018 8:46 pm

jagdpanther wrote:
ali3nx: Thanks for all the UEFI build info.

Quote:
One other thing you really want to try with UEFI boot is using UUID-based disk mounts in fstab. With GPT disk labels, mounting by UUID has become the modern standard.


Instead of UUIDs, since I was and will be using GPT, what about PARTLABEL? That lets you use the partition names you assign in parted.


You can use partition label names in fstab, but UUIDs will always find the correct partition, even if you add a new disk to your system. I set partition names mostly for my own reference rather than for functional use.
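To make the UUID-versus-PARTLABEL point concrete, here is a minimal Python sketch (the helper names are hypothetical, not part of any tool) that classifies the first field of each fstab line. It assumes the usual fstab(5) source forms: a kernel device path such as /dev/sdb1, or the tags UUID=, PARTUUID=, PARTLABEL= and LABEL=, whose values `blkid` prints. Kernel device names can shift when a disk is added, so the sketch flags entries that still mount by raw /dev path.

```python
from __future__ import annotations

# fstab source tags that name a filesystem or partition by a stable ID
# instead of by kernel device name (assumed list, see fstab(5)).
TAG_PREFIXES = ("UUID=", "PARTUUID=", "PARTLABEL=", "LABEL=")


def mount_source_kind(fstab_line: str) -> str:
    """Classify how one fstab line names its device (the first field)."""
    line = fstab_line.strip()
    if not line or line.startswith("#"):
        return "none"  # blank line or comment
    spec = line.split()[0]
    for prefix in TAG_PREFIXES:
        if spec.startswith(prefix):
            return prefix[:-1]  # e.g. "UUID"
    if spec.startswith("/dev/disk/by-"):
        return "stable-symlink"  # udev symlinks such as /dev/disk/by-uuid/...
    if spec.startswith("/dev/"):
        return "device-path"  # e.g. /dev/sdb1, may change when disks are added
    return "other"  # tmpfs, proc, network filesystems, ...


def risky_entries(fstab_text: str) -> list[str]:
    """Return the fstab lines that mount by raw kernel device path."""
    return [line for line in fstab_text.splitlines()
            if mount_source_kind(line) == "device-path"]
```

Feed it the text of /etc/fstab; any line it returns is a candidate for switching to the UUID= value `blkid` reports for that partition.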
_________________
Compiling Gentoo since version 1.4
Thousands of Gentoo Installs Completed
Emerged on every continent but Antarctica
Compile long and Prosper!
ali3nx
l33t


Joined: 21 Sep 2003
Posts: 732
Location: Winnipeg, Canada

Posted: Thu Jul 05, 2018 4:49 pm

jagdpanther, something you could try if you haven't already: re-seat your RAM sticks and see if that changes the results.

New RAM slots can be deceptive about whether the sticks are installed correctly.
jagdpanther
l33t


Joined: 22 Nov 2003
Posts: 760

Posted: Fri Jul 06, 2018 1:31 am

Quote:
re-seat your ram sticks and see if that changes the results


That did not change anything.

I am now using a UEFI boot and have the CSM turned off in Asus's BIOS. (I reinstalled from scratch using the UEFI options mentioned in the Gentoo Handbook.) I also disabled Fast Boot.

First Boot:
I see "loading kernel 4.17.4 ...." on the monitor, then nothing else on the screen. The keyboard is not frozen (the Caps Lock LED works) and the HDD activity LED is flickering. (I assume the unused but ext4-formatted and mounted 6GB HDD is finishing the "quick" format.)

I can ping the system but can't ssh into it (I probably did not permit root login, "PermitRootLogin", in /etc/ssh/sshd_config).
I'll reboot using SystemRescueCD, change /etc/ssh/sshd_config and reboot again.
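Root logins over ssh are controlled by the `PermitRootLogin` keyword in sshd_config, and since OpenSSH 7.0 the default when it is unset is `prohibit-password`, which refuses password logins as root (key logins still work). A minimal Python sketch for checking what a given sshd_config resolves to; the function names are hypothetical. It assumes whitespace-separated "Keyword value" lines, applies the rule from sshd_config(5) that the first value found wins for most keywords, and simply stops at the first `Match` block instead of evaluating it.

```python
from __future__ import annotations


def effective_option(config_text: str, keyword: str, default: str) -> str:
    """Return the value sshd would use for `keyword` outside any Match block.

    Keywords match case-insensitively and the first value found wins.
    The "Keyword=value" form and Include directives are not handled here.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0].lower()
        if key == "match":
            break  # conditional blocks are out of scope for this sketch
        if key == keyword.lower() and len(parts) == 2:
            return parts[1].strip()
    return default


def root_login_setting(config_text: str) -> str:
    # "prohibit-password" is the OpenSSH >= 7.0 default when the keyword is unset.
    return effective_option(config_text, "PermitRootLogin", "prohibit-password")
```

If this reports `prohibit-password` while you try to log in as root with a password, that alone would explain the refused ssh login.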

1) Why, when I use SystemRescueCD with a UEFI boot (this also happened during the install), do options one and two ("System Rescue CD default" and "System Rescue CD Cache all files") not appear to work, while the third option, "System Rescue CD disable kernel-Mode settings", does? Is there something I need to do in my kernel or grub settings to mimic this "disable kernel-mode settings"?
EDIT: Never mind. It appears I need to add "nomodeset" to the "linux ..." line in grub.cfg. I guess I'll write a custom /boot/grub/grub.cfg like I usually do ... In the past they have been about 10 lines long per boot entry, MUCH shorter than the grub-mkconfig-generated grub.cfg. But first I'll try "insmod video_fb", because there is no vga.mod in /boot/grub/x86_64-efi/.

2) Are there extra UEFI settings I need in the kernel to allow the monitor to work?
(I hope to get more clues after sshing into the new system.)


Last edited by jagdpanther on Sat Jul 07, 2018 9:19 pm; edited 1 time in total
ali3nx
l33t


Joined: 21 Sep 2003
Posts: 732
Location: Winnipeg, Canada

Posted: Fri Jul 06, 2018 3:34 am

One of the other major lessons to learn when approaching a UEFI setup is that the framebuffer kernel config must be entirely different when using UEFI boot. Legacy framebuffer kernel setup is largely obsolete with UEFI, and only a specific kernel config will let a UEFI framebuffer console display correctly.

You should not need to add nomodeset to the kernel boot flags with UEFI. Keeping kernel mode setting enabled should be preferred IF your kernel config is set up correctly.
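As a rough first check, here is a Python sketch that looks for the options an EFI framebuffer console depends on in a kernel .config. The four options are my assumption of a minimal set for a 4.17-era kernel (EFI runtime services, the framebuffer core, the efifb driver and fbcon), not a copy of ali3nx's config. It also assumes they should be built in (=y), since the console has to exist before any module can be loaded from disk. The function names are hypothetical.

```python
from __future__ import annotations

# Assumed minimal set for an EFI framebuffer console (Kconfig prompt text).
EFIFB_OPTIONS = {
    "CONFIG_EFI": "EFI runtime service support",
    "CONFIG_FB": "Support for frame buffer devices",
    "CONFIG_FB_EFI": "EFI-based Framebuffer Support",
    "CONFIG_FRAMEBUFFER_CONSOLE": "Framebuffer Console support",
}


def parse_kconfig(text: str) -> dict[str, str]:
    """Collect CONFIG_NAME=value pairs; '# ... is not set' lines are skipped."""
    opts: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts


def missing_efifb_options(config_text: str) -> list[str]:
    """Return the assumed efifb options that are not built in (=y)."""
    opts = parse_kconfig(config_text)
    return [f"{name} ({why})" for name, why in EFIFB_OPTIONS.items()
            if opts.get(name) != "y"]
```

Run it against /usr/src/linux/.config. An empty result only means these four options are present, not that the console will work, so the full working config ali3nx links in this post remains the better reference.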

Describing every kernel option that differs between a legacy non-UEFI system and a UEFI framebuffer config would be challenging, but the kernel config from my own Intel 6700K with an NVIDIA GTX 1060 running the NVIDIA binary drivers may be a worthwhile reference for investigating the differences on your own.

https://bpaste.net/show/b40aee50d021

Quote:
I guess I'll write a custom /boot/grub/grub.cfg like I usually do ... In the past they have been about 10 lines long per boot entry, MUCH shorter than the grub-mkconfig-generated grub.cfg.


I would also advise against doing this with a UEFI boot, because grub-mkconfig sets up the grub config with UEFI-specific directives every time the config is generated. If you need custom things configured in grub, use the grub defaults file located at

Code:
/etc/default/grub
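/etc/default/grub is a shell-style KEY="value" file that grub-mkconfig reads. Kernel parameters go in GRUB_CMDLINE_LINUX (all entries) or GRUB_CMDLINE_LINUX_DEFAULT (default entries only, not recovery ones), and grub.cfg is then regenerated with `grub-mkconfig -o /boot/grub/grub.cfg`. If you want to script such edits, here is a minimal Python sketch (the function name is hypothetical) that sets one key in the file's text, leaves commented-out lines alone, and appends the key if it is absent. Values containing double quotes are not escaped.

```python
import re


def set_grub_default(text: str, key: str, value: str) -> str:
    """Set KEY="value" in /etc/default/grub-style text, appending if absent."""
    # Match only an uncommented line that starts with exactly KEY=
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    new_line = f'{key}="{value}"'
    if pattern.search(text):
        # A function replacement avoids backslash handling in re.sub templates
        return pattern.sub(lambda _match: new_line, text, count=1)
    return text.rstrip("\n") + "\n" + new_line + "\n"
```

Note that GRUB_CMDLINE_LINUX_DEFAULT is not touched when setting GRUB_CMDLINE_LINUX, because the pattern requires the `=` right after the key name.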



Last edited by ali3nx on Fri Jul 06, 2018 3:40 am; edited 1 time in total
jagdpanther
l33t


Joined: 22 Nov 2003
Posts: 760

Posted: Fri Jul 06, 2018 3:37 am

I'll try to figure out the monitor issue tomorrow.

So I tried "emerge gcc" and the system did not crash (yay!).
But it got stuck at:

Code:
checking for ANSI C header files... (cached) yes
checking whether time.h and sys/time.h may both be included... yes
checking whether string.h and strings.h may both be included... yes
checking pthread.h usability...


After waiting 30 minutes with one thread pegged at 100%, I looked in /var/log/messages and saw:


Code:
Jul  5 20:26:56 runner-public kernel: INFO: rcu_sched self-detected stall on CPU
Jul  5 20:26:56 runner-public kernel:     12-....: (1 GPs behind) idle=50a/1/4611686018427387906 softirq=129501/129512 fqs=96004
Jul  5 20:26:56 runner-public kernel:      (t=288015 jiffies g=33587 c=33586 q=256084)
Jul  5 20:26:56 runner-public kernel: NMI backtrace for cpu 12
Jul  5 20:26:56 runner-public kernel: CPU: 12 PID: 26191 Comm: cc1 Tainted: G      D           4.17.4-gentoo-01 #1
Jul  5 20:26:56 runner-public kernel: Hardware name: System manufacturer System Product Name/WS X299 SAGE, BIOS 0502 03/19/2018
Jul  5 20:26:56 runner-public kernel: Call Trace:
Jul  5 20:26:56 runner-public kernel:  <IRQ>
Jul  5 20:26:56 runner-public kernel:  dump_stack+0x46/0x5b
Jul  5 20:26:56 runner-public kernel:  nmi_cpu_backtrace+0xb3/0xc0
Jul  5 20:26:56 runner-public kernel:  ? irq_force_complete_move+0x70/0x70
Jul  5 20:26:56 runner-public kernel:  nmi_trigger_cpumask_backtrace+0x8f/0xc0
Jul  5 20:26:56 runner-public kernel:  rcu_dump_cpu_stacks+0x90/0xbe
Jul  5 20:26:56 runner-public kernel:  rcu_check_callbacks+0x5a6/0x7f0
Jul  5 20:26:56 runner-public kernel:  update_process_times+0x23/0x50
Jul  5 20:26:56 runner-public kernel:  tick_sched_timer+0x36/0x70
Jul  5 20:26:56 runner-public kernel:  ? tick_sched_handle.isra.6+0x40/0x40
Jul  5 20:26:56 runner-public kernel:  __hrtimer_run_queues+0xfa/0x1a0
Jul  5 20:26:56 runner-public kernel:  hrtimer_interrupt+0xe0/0x240
Jul  5 20:26:56 runner-public kernel:  smp_apic_timer_interrupt+0x54/0x90
Jul  5 20:26:56 runner-public kernel:  apic_timer_interrupt+0xf/0x20
Jul  5 20:26:56 runner-public kernel:  </IRQ>
Jul  5 20:26:56 runner-public kernel: RIP: 0010:queued_spin_lock_slowpath+0x10f/0x170
Jul  5 20:26:56 runner-public kernel: RSP: 0000:ffffadbd8e893d38 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Jul  5 20:26:56 runner-public kernel: RAX: 0000000000000101 RBX: ffff8bea407a7570 RCX: 0000000000000001
Jul  5 20:26:56 runner-public kernel: RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffda108101e9f0
Jul  5 20:26:56 runner-public kernel: RBP: ffffffffffffffff R08: 0000000000000101 R09: 00007f14a54ae000
Jul  5 20:26:56 runner-public kernel: R10: ffffda107cf81640 R11: 0000000000000001 R12: 0000000000000000
Jul  5 20:26:56 runner-public kernel: R13: ffffadbd8e893e40 R14: 00007f14a54ae000 R15: ffffadbd8e893e40
Jul  5 20:26:56 runner-public kernel:  ? _cond_resched+0x10/0x40
Jul  5 20:26:56 runner-public kernel:  unmap_page_range+0x376/0x8e0
Jul  5 20:26:56 runner-public kernel:  unmap_vmas+0x47/0xa0
Jul  5 20:26:56 runner-public kernel:  exit_mmap+0x87/0x170
Jul  5 20:26:56 runner-public kernel:  mmput+0x2b/0xd0
Jul  5 20:26:56 runner-public kernel:  do_exit+0x23c/0xa20
Jul  5 20:26:56 runner-public kernel:  rewind_stack_do_exit+0x17/0x20


???
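One detail in the trace above deserves decoding. The `Tainted:` field shows `G` and `D`; the kernel sets `D` once an oops or BUG has occurred, and the stack ends in `rewind_stack_do_exit`, which is how x86 kills a task after an oops. So an oops report (possibly from this same cc1 process) likely sits earlier in /var/log/messages and may say more than the stall does. A minimal Python sketch that decodes the taint letters; the table is my paraphrase of the kernel's "Tainted kernels" documentation and covers only the common flags.

```python
from __future__ import annotations

import re

# Paraphrased meanings of common taint letters; not exhaustive.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "G": "no proprietary module loaded",
    "F": "module was force-loaded",
    "S": "CPU out of specification (e.g. unsafe SMP)",
    "R": "module was force-unloaded",
    "M": "processor reported a machine check exception (MCE)",
    "B": "bad page referenced or unexpected page flags",
    "U": "taint requested by user space",
    "D": "kernel died recently (oops or BUG)",
    "A": "ACPI table overridden",
    "W": "kernel issued a warning",
    "C": "staging driver loaded",
    "I": "workaround for platform firmware bug applied",
    "O": "out-of-tree module loaded",
    "E": "unsigned module loaded",
    "L": "soft lockup occurred",
    "K": "kernel has been live-patched",
}

# The taint letters sit between "Tainted: " and the kernel version number.
TAINT_FIELD = re.compile(r"Tainted: ([A-Z ]+?)\s+\d")


def decode_taint(line: str) -> list[str]:
    """Explain each taint letter in a kernel 'Tainted:' log line."""
    match = TAINT_FIELD.search(line)
    if not match:
        return []  # e.g. "Not tainted"
    return [f"{c}: {TAINT_FLAGS.get(c, 'see kernel docs')}"
            for c in match.group(1) if c != " "]
```

An `M` in that field would mean the CPU itself raised a machine check, which would point toward hardware.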

Then I tried sending "kill <pid>" and then "kill -9 <pid>", but the stuck gcc process just won't die.
After the kills, the following appeared in the process output:

Code:
checking whether string.h and strings.h may both be included... yes
checking pthread.h usability...

Exiting on signal 15
sandbox:stop  caught signal 15 in pid 7907
make[1]: *** [Makefile:22306: stage2-bubble] Terminated
Terminated
make[2]: *** [Makefile:20148: configure-stage2-target-libgomp] Terminated
make: *** [Makefile:22521: bootstrap-lean] Terminated
Terminated
sandbox:stop  Send signal 4 more times to force SIGKILL
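On why kill -9 had no effect: signals, SIGKILL included, are acted on when a task heads back to user space (or from certain killable sleeps). The trace above shows cc1 spinning in `queued_spin_lock_slowpath` inside `do_exit`, i.e. it is already exiting and never leaves the kernel, so no signal can finish it off. A minimal Python sketch (function names hypothetical, Linux only) for checking a stuck process's scheduler state from /proc/<pid>/stat; a state of D, or R with no progress, suggests the task is wedged in the kernel rather than ignoring signals.

```python
from __future__ import annotations

STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually waiting inside the kernel)",
    "Z": "zombie (exited, not yet reaped)",
    "T": "stopped",
    "t": "stopped by a debugger/tracer",
    "X": "dead",
    "I": "idle kernel thread",
}


def proc_state(stat_text: str) -> str:
    """Return the one-letter state field from /proc/<pid>/stat contents.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split after the LAST ')' instead of on whitespace.
    """
    _, _, rest = stat_text.rpartition(")")
    return rest.split()[0]


def describe_pid(pid: int) -> str:
    """Read /proc/<pid>/stat and describe the process state."""
    with open(f"/proc/{pid}/stat") as stat_file:
        state = proc_state(stat_file.read())
    return f"{pid}: {state} ({STATE_NAMES.get(state, 'other')})"
```

If your kernel provides it, /proc/<pid>/stack (readable as root) shows where in the kernel the task is stuck.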


Any ideas?
ali3nx
l33t


Joined: 21 Sep 2003
Posts: 732
Location: Winnipeg, Canada

Posted: Fri Jul 06, 2018 3:48 am

Which kernel config did you use? Try using the one I linked above, with whatever minor changes you need. If you created your own config from scratch, like I imagine many people do and I used to do, you may be missing some chipset drivers.

Configuring a kernel for modern UEFI-based PC and server platforms is certainly not as simple as it used to be. One chipset driver my 6700K requires is pinctrl-sunrisepoint, and I could not locate that driver in the kernel Kconfig; it was buried under the seventh level of Kconfig hell.

Only by using the SystemRescueCD kernel config as a template was I able to configure a kernel that provided that driver module.

While that's only one example, you may need to pay closer attention to differences between the driver modules lsmod shows while booted from SystemRescueCD and what your own kernel config provides when booted without it.
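The lsmod comparison described above can be scripted. A minimal Python sketch (function names hypothetical), assuming you saved `lsmod` output while booted from SystemRescueCD and, on your own kernel, also have `lsmod` output plus /lib/modules/<version>/modules.builtin, since drivers built into the kernel never appear in lsmod. Module names in lsmod use underscores where the .ko file names use dashes, hence the normalization. The kernel's own `make localmodconfig` target takes a similar lsmod-driven approach. A module name does not always match its Kconfig symbol, so each hit still has to be looked up with the `/` search in menuconfig.

```python
from __future__ import annotations

from pathlib import Path


def lsmod_names(lsmod_output: str) -> set[str]:
    """Module names from `lsmod` output (first column; header line skipped)."""
    lines = lsmod_output.strip().splitlines()[1:]
    return {line.split()[0] for line in lines if line.strip()}


def builtin_names(modules_builtin: str) -> set[str]:
    """Turn paths from modules.builtin into lsmod-style names.

    lsmod reports names with '-' replaced by '_', so the file
    pinctrl-sunrisepoint.ko shows up as pinctrl_sunrisepoint.
    """
    return {
        Path(line.strip()).stem.replace("-", "_")
        for line in modules_builtin.splitlines()
        if line.strip()
    }


def missing_drivers(rescue_lsmod: str, own_lsmod: str, own_builtin: str) -> set[str]:
    """Modules the rescue CD loaded that your kernel neither loaded nor built in."""
    return lsmod_names(rescue_lsmod) - lsmod_names(own_lsmod) - builtin_names(own_builtin)
```

Expect some noise: the rescue CD may load modules your hardware does not actually need.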
Keruskerfuerst
Advocate


Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Fri Jul 06, 2018 3:25 pm

You should consider buying a workstation mainboard, such as one from Supermicro or the Asus WS C422 PRO/SE (Intel C422).

Maybe different RAM, too.
krinn
Watchman


Joined: 02 May 2003
Posts: 7470

Posted: Fri Jul 06, 2018 4:57 pm

Faulty RAM may cause errors, and an error may cause a reboot, but generally a RAM error should just produce a GPF (general protection fault) and maybe a lockup.

Really, if your system reboots, it's most likely because of motherboard protection (thermal shutdown) or voltage weirdness.
The problem is that compiling pushes both (temperature and power draw).

You really should consider monitoring heat, and not only CPU heat: people always think about CPU heat, but motherboards also watch north/southbridge (chipset) heat. If heat seems OK, suspect the PSU. Also keep in mind that running 8 DIMMs creates more heat for your memory, and such a configuration may need DIMM cooling too.
Your PSU is not on your motherboard's QVL (although some PSUs from that vendor are). That doesn't mean the PSU is bad, but you cannot be certain that model is OK, which puts a doubt on it, and doubt is the worst thing to have when chasing this kind of weirdness. If you can, sadly, pick a PSU from the QVL: http://dlcdnet.asus.com/pub/ASUS/mb/Socket2066/WS_X299_SAGE/QVL/WS_X299_SAGE_add-on_card_AVL_20180122.pdf
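For the heat monitoring suggested above, `sensors` (lm_sensors) reads the kernel's hwmon interface, which can also be polled directly. A minimal Python sketch (helper names hypothetical), assuming the usual sysfs layout: /sys/class/hwmon/hwmonN/name plus temp*_input files in millidegrees Celsius with optional temp*_label files. Some older drivers keep these files one level deeper under device/, which the sketch does not check, and whether a chipset sensor shows up at all depends on the drivers in your kernel.

```python
from __future__ import annotations

import time
from pathlib import Path


def _read(path: Path) -> str:
    return path.read_text().strip()


def read_temperatures(hwmon_root: str = "/sys/class/hwmon") -> dict[str, float]:
    """Return {"chip/label": degrees C} for every temp*_input under hwmon_root."""
    temps: dict[str, float] = {}
    for chip in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = _read(name_file) if name_file.exists() else chip.name
        for inp in sorted(chip.glob("temp*_input")):
            label_file = chip / inp.name.replace("_input", "_label")
            label = _read(label_file) if label_file.exists() else inp.name
            try:
                temps[f"{chip_name}/{label}"] = int(_read(inp)) / 1000  # millidegrees
            except (OSError, ValueError):
                continue  # some sensors refuse reads; skip them
    return temps


def track_peaks(seconds: float, interval: float = 2.0,
                hwmon_root: str = "/sys/class/hwmon") -> dict[str, float]:
    """Poll the sensors for `seconds` and return the highest value seen per sensor."""
    peaks: dict[str, float] = {}
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        for key, value in read_temperatures(hwmon_root).items():
            peaks[key] = max(value, peaks.get(key, value))
        time.sleep(interval)
    return peaks
```

Because the machine may freeze before `track_peaks` returns, for crash hunting it is better to print each reading as it arrives, ideally over ssh from a second machine so the last values survive the lockup.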
jagdpanther
l33t


Joined: 22 Nov 2003
Posts: 760

Posted: Fri Jul 06, 2018 6:07 pm

Quote:
You should consider buying a workstation mainboard.
Like from Supermicro.
Or Asus WS C422 PRO/SE Intel C422.


I am using an Asus WS X299 Sage motherboard. (I wanted the WS X299 PRO, but it is not sold in North America. The WS X299 PRO/SE does not have QFan control, which I really like: I can set a gradual or quick fan-speed-vs-CPU-temperature curve in the BIOS, so the fan settings work even in LiveCDs.) I looked at Supermicro and would have purchased a C9X299-PG300, but wasn't sure whether its fan "CUSTOM" mode would work like Asus's QFan control, or whether I could use a PCI-E slot other than the first one for a single PCI-E x16 video card.


Last edited by jagdpanther on Fri Jul 06, 2018 6:21 pm; edited 1 time in total
jagdpanther
l33t


Joined: 22 Nov 2003
Posts: 760

Posted: Fri Jul 06, 2018 6:21 pm

Quote:
Really, if your system reboots, it's most likely because of motherboard protection (thermal shutdown) or voltage weirdness.
The problem is that compiling pushes both (temperature and power draw).


Thanks for the suggestion. I tried a second power supply and had the same lockup when compiling gcc. The power supply I bought for this single-graphics-card system is a Seasonic Prime 850W.
(Note: there was no lockup when using the Ubuntu Live CD; I compiled the gcc source with Ubuntu's GCC without issue.) I think ali3nx is probably right that my custom kernel may not be configured properly. I am planning to update his sample configuration to 4.17.4-gentoo and try that.

I put an extra 40mm fan over the X299 chipset heatsink. No difference. I also used "watch sensors" while running a CPU burn test from the "Ultimate Boot CD" and could not get the CPU temperature over 80C ... and no crash. The CPU idles at 35C and usually doesn't go over 65C. This system has a lot of airflow in a Fractal Design Define R6 case.

Last night I ran MemTest86 v7.5 (not Memtest86+) from a UEFI boot in parallel mode (i.e. using all threads) and got 5 complete passes without error.


Last edited by jagdpanther on Fri Jul 06, 2018 6:32 pm; edited 1 time in total
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Fri Jul 06, 2018 6:26 pm

On that list I see the Antec EA-500D 500W and EDG750 750W.
The D is for Delta, a good power supply. I have one but they might not be for sale anymore. I bought it a long time ago.

Another of my machines has an EDG550 550W and it is fully modular and high efficiency. I like it very much. I will definitely buy another Antec EDG for my next PS update.
I've never had an Antec fail.
ali3nx
l33t


Joined: 21 Sep 2003
Posts: 732
Location: Winnipeg, Canada

Posted: Fri Jul 06, 2018 6:42 pm


Something else you could try is using Ubuntu 18.04's kernel config and seeing if that works. There are always suggestions available; when shooting in the dark based on experience, sometimes you hit close enough to the target to gain some direction, and sometimes you make no progress.

It's either software or hardware. Since the hardware is new, eliminating the software portion may be the direction with the most potential for progress, but that also requires making sure your software is configured well enough not to be an additional source of conflicts.

If, once you've covered all the potential major software complications, they still don't reveal a direction, you very likely have a hardware problem.

Gentoo is amazing, but it may not be the most productive avenue for revealing whether you really have a hardware problem, given the potential for misconfiguring something. Fortunately, Gentoo users are collectively some of the brightest and most talented Linux users around, so the assistance and advice you get here is some of the best available.

I learned a lot from these dudes in fifteen years.
jagdpanther
l33t


Joined: 22 Nov 2003
Posts: 760

Posted: Fri Jul 06, 2018 10:24 pm

Quote:
It's either software or hardware. Since the hardware is new, eliminating the software portion may be the direction with the most potential for progress, but that also requires making sure your software is configured well enough not to be an additional source of conflicts.


I currently don't think my new system's issues are a hardware fault. If anyone disagrees with my reasons, stated below, please let me know; I have almost a week left to return parts to the vendor if needed.
1. The full system freeze followed by a self-reboot occurred when booting in BIOS mode (not UEFI) AND compiling something. This happened in the Gentoo Minimal Install CD chroot environment (both when compiling the kernel and when emerging gcc), in the SystemRescueCD chroot environment (not when compiling the kernel, only when emerging gcc) and in my install with my own kernel configuration.
2. Turning off the CSM compatibility setting and Fast Boot in the Asus BIOS, and only using UEFI boots, allowed a second build from the SystemRescueCD chroot environment. After the build, using my kernel configuration, my system worked, but when emerging gcc, portage never finished and just ran one thread at 100%. kill -9 did not stop it. (This was all done via ssh because I haven't spent any time trying to get video working yet ... no system console either.)
3. The Ubuntu LiveCD worked, and the Ubuntu-provided gcc compiled the gcc source without issue.
4. On a separate SATA drive (not the NVMe M.2 drive I have Gentoo on), today I installed Win10 in UEFI mode, the NVIDIA driver and Steam, played a little "Rise of the Tomb Raider" and saw no issues.

I suspect my kernel configuration, which I borrowed from my 4-year-old working Gentoo system; I just added the NVMe setting and the proper network modules.

Any suggestions for a starting point for my kernel configuration?
I am thinking of one of these:
1. Ali3nx's 4.16.18-gentoo configuration
2. Ubuntu's 18.04 configuration
3. The Arch Linux kernel configuration used in this benchmark result:
https://openbenchmarking.org/s/ASUS%20WS%20X299%20SAGE

I appreciate all of the input I have been receiving on this issue.
ali3nx
l33t


Joined: 21 Sep 2003
Posts: 732
Location: Winnipeg, Canada

Posted: Fri Jul 06, 2018 11:33 pm

Given the progress and the results from testing a Windows install (which, while it is Windows, is still a valid software stability test): if you only added NVMe driver support to a 4-year-old kernel config, several necessary kernel features are almost certainly missing.

With that in mind, try all three kernel configs: mine, Ubuntu's and Arch's. Between the three you should get some result that offers direction. It's a little extra effort, but it will be worth it diagnostically.
krinn
Watchman


Joined: 02 May 2003
Posts: 7470

Posted: Fri Jul 06, 2018 11:45 pm

jagdpanther wrote:
I currently don't think my new system's issues are a hardware fault. If anyone disagrees with my reasons, stated below, please let me know; I have almost a week left to return parts to the vendor if needed.

Sorry, I do.
Any software error will end up logged somewhere; only a hardware error can prevent software from logging it (the error is so severe that the system cannot even log it). That's why MCE (machine check exceptions) exists: the hardware can record the error internally and report it to software later.
The logic is simple: if someone cuts the power, your hardware could implement "if I run short of power, I'll record that fact and report it later", but how could software ask a CPU that has no power to do any work, such as writing something to a log?
The fact that no software logged any error does not point toward a software-only issue. Since nothing is impossible, it might still be a software issue, but I tend to stay within "acceptable" logic, and an error that is extremely hard for software to catch seems far less likely than brand-new faulty hardware. (That's quite common: hardware tends to become more reliable over time, and to break early if a defect was introduced during manufacturing. We have a French word for that, "rodage", which translates (badly?) as "running-in period".)
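The point above is that hardware errors are recorded by the CPU's machine check architecture rather than by ordinary software logging. Errors latched before a warm reset may be reported by the kernel at the next boot, so after one of the self-reboots it is worth searching the new boot's log. A minimal Python sketch (function name hypothetical); the patterns are a heuristic assumption based on the kernel's `[Hardware Error]` prefix and its "Machine check" wording, and dedicated tools such as mcelog or rasdaemon decode these events properly.

```python
from __future__ import annotations

import re

# Heuristic patterns, not an exhaustive list of hardware error messages.
HW_ERROR = re.compile(r"\[Hardware Error\]|machine check", re.IGNORECASE)


def hardware_error_lines(log_text: str) -> list[str]:
    """Return log lines that look like hardware / machine check reports."""
    return [line for line in log_text.splitlines() if HW_ERROR.search(line)]
```

For example, `hardware_error_lines(open("/var/log/messages").read())` (as root). Boot-time lines such as "mce: CPU supports 22 MCE banks" are deliberately not matched, since they only describe capabilities.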

Quote:
2. Turning off the CSM co...

Your UEFI vs. non-UEFI comparison is also a broken test.
You seem to accept "oh, if it works in UEFI, it's because it's a new machine that loves UEFI".
Nah: if a motherboard could really work only in UEFI, it would only offer UEFI; if it allows non-UEFI booting, that is because non-UEFI is supposed to work.
So your UEFI test may help track down the issue, but you still cannot accept UEFI being stable and non-UEFI being broken as perfectly OK.
I do have a UEFI motherboard myself. I cannot even tell you whether UEFI booting works, because I never use it, but I assume it does; in any case, my system has no problem running in non-UEFI mode (and that shouldn't be a surprise, just normal).

Quote:
4. On a separate SATA drive (not the NVMe M.2 drive I have Gentoo on), today I installed Win10 in UEFI mode, the NVIDIA driver and Steam, played a little "Rise of the Tomb Raider" and saw no issues.
Many, many (early) Ryzen users could tell you about this: AMD even recommended getting a "fixed!" CPU only to Linux users, while Windows users (not seeing the problem) were quietly left with a broken CPU (the second-hand market for Ryzen will be a nightmare!).
So, as Ryzen proved, failing in both Windows and Linux proves it's broken, but working in Windows doesn't prove it works :D
Why? Because Ryzen users on Linux were running gcc (which doesn't go easy on a CPU), while running Windows is a piece of cake for such a CPU.
I'm not really impressed by your Tomb Raider test either; it may push your video card to its limit, but certainly not a beast with 28 cores.

Your Ubuntu case (#3) seems more interesting. However, Ubuntu is a binary distro, and choices made in their kernel could be the difference between gcc working fully and gcc being slowed down by some feature (the kernel scheduler or whatever) that keeps the problem from showing up (the problem still exists, it just doesn't manifest in that environment). I even think some Ryzen users weren't affected when running the gcc test on some Linux distros (but I'm not certain).

While I agree a kernel may expose strange issues, when you test different kernels (ones you did not build yourself) and keep getting the issue, I'm inclined to think the issue may not be the kernel.

This is what I would do if I were you (I'm not saying it's the best course, just what I would do):
1/ Consider the vendor's return policy: will they accept a return without proof? (That's important: don't return something whose fault the vendor won't see for himself. "My CPU doesn't work" will get your ass kicked if the answer is "your CPU works for us", most likely because they test with Windows.)
2/ If the vendor needs proof, negotiate a price for the vendor to find the cause himself instead of you providing the proof. So instead of proving yourself that the hardware is faulty, tell the vendor it is 100% faulty, that you need him to pinpoint the fault, and that you will pay for the service.
Vendors always like to have their time paid, and when they find the cause, they will be the first to offer an exchange of the faulty part to keep their customer happy.
3/ If the vendor trusts you and doesn't ask for proof, send back whatever on your list is not QVL-approved, and ask for an exchange or upgrade to parts on your motherboard's QVL. As I said earlier, not being on the QVL doesn't prove a part won't work, but being on it is proof that it should, removing any doubt about incompatibility.
4/ On that motherboard, 8 DIMMs are only supported with 6+ core CPUs (4-core CPUs support 4 DIMMs). That kind of oddity always makes me want to avoid the "corner case", so if I were you, I would reach my 64GB of RAM with 4 DIMMs instead of the 8x8GB combo you went for, and do my best to pick ones from the motherboard's QVL (i.e. look hard for QVL-listed 4x16GB DIMMs).
jagdpanther
l33t


Joined: 22 Nov 2003
Posts: 760

Posted: Sat Jul 07, 2018 1:56 am

krinn:

Thanks for the detailed reply.

Quote:
I do have a UEFI motherboard myself. I cannot even tell you whether UEFI booting works, because I never use it, but I assume it does; in any case, my system has no problem running in non-UEFI mode (and that shouldn't be a surprise, just normal).


Until this build I had never used UEFI either. (Yes, I have partially bought into the idea that new hardware is tested more with UEFI than with legacy BIOS, which is why I am trying UEFI now.)

Quote:
Tomb Raider test either; it may push your video card to its limit, but certainly not a beast with 28 cores.

A few days ago (not yet using UEFI) I booted up "The Ultimate Boot CD" and found one of the CPU Burn tests that pushed my CPU up to 80C with all cores in use (looking at "watch sensors"). This did not cause any issue.

Quote:
While I agree a kernel may expose strange issues, when you test different kernels (ones you did not build yourself) and keep getting the issue, I'm inclined to think the issue may not be the kernel.

The kernels I am having trouble with are all Gentoo-related:
1. Gentoo Minimal Install CD chroot, non-UEFI: the system freezes on the kernel build and on emerge gcc.
2. SystemRescueCD (based on Gentoo, I think) chroot, BIOS mode: the kernel builds, but emerge gcc freezes the system. (I did not try to emerge gcc in a UEFI SystemRescueCD chroot.)
3. My kernels, BIOS mode: emerge gcc freezes the system.
4. My kernels, UEFI mode: emerge gcc gets stuck, and the stuck process ignores 'kill -9 <pid>'.
I guess I should try some other kernels.

I am using 4x16GB DIMMs and they are on the QVL.

I have the option of returning the components for any reason, including "other". I am thinking of sending back the motherboard, CPU and memory and trying again, perhaps with a Supermicro C9X299-PG300 instead of the Asus WS X299 Sage. Is there any reason to return the other electronics: one Seagate HDD, one Crucial SATA SSD and one Samsung PCIe SSD?
jagdpanther
l33t


Joined: 22 Nov 2003
Posts: 760

Posted: Sat Jul 07, 2018 2:24 am

Any ideas on getting the console up and running? (Right now I ssh into the new system.)
This would help with testing.

During a boot, the grub menu shows up, I hit return, then I see
"Loading Linux 4.17.4-gentoo ..."
And that stays on the monitor. (The Caps Lock LED works, but nothing I type on the keyboard shows up on the screen.)
That is using the default /etc/default/grub and 'grub-mkconfig > /boot/grub/grub.cfg'.
I have tried a few hand-written grub.cfg files (like I use on my old BIOS-booted Gentoo system) and usually just get a blank screen.
I also tried following the directions at the bottom of https://wiki.gentoo.org/wiki/GRUB2 to use the framebuffer, to no avail.

In grub I can hit 'c', then type:

Code:
insmod all_video
videoinfo

and see about 12 different modes.
Keruskerfuerst
Advocate


Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Sat Jul 07, 2018 4:16 am

Quote:
4. On a separate SATA drive (not the NVMe M.2 drive I have Gentoo on), today I installed Win10 in UEFI mode, the NVIDIA driver and Steam, played a little "Rise of the Tomb Raider" and saw no issues.


Can you test with 3DMark on Win 10?
jagdpanther
l33t


Joined: 22 Nov 2003
Posts: 760

Posted: Sat Jul 07, 2018 9:50 pm

Quote:
4. On a separate SATA drive (not the NVMe M.2 drive I have Gentoo on), today I installed Win10 in UEFI mode, the NVIDIA driver and Steam, played a little "Rise of the Tomb Raider" and saw no issues.

Can you test with 3DMark on Win 10?


I downloaded the free version, and Time Spy reported a score of about 7010. There were no issues with the run.
ali3nx
l33t


Joined: 21 Sep 2003
Posts: 732
Location: Winnipeg, Canada

Posted: Sat Jul 07, 2018 10:33 pm

The efifb framebuffer needs to be configured correctly in the kernel, and yours is not if you're not seeing the framebuffer console login prompt and boot messages while still using your own four-year-old kernel config. Check my kernel config for reference on what to enable to get it working properly.

The solution to the problem should not require force-setting resolution modes in GRUB; that's no longer necessary or relevant with UEFI.
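For anyone wanting to check their own kernel for this, here is a minimal Python sketch that reports which options from an assumed minimal efifb console set (CONFIG_EFI, CONFIG_FB, CONFIG_FB_EFI, CONFIG_FRAMEBUFFER_CONSOLE) are not built in. The option list is an assumption based on the usual efifb setup, not taken from ali3nx's config, and /usr/src/linux/.config is only the common Gentoo default path.

```python
"""Hypothetical helper: which assumed efifb console options are not built in?"""
import sys
from pathlib import Path

# Assumed minimal option set for an EFI framebuffer console. This is an
# illustration, not a complete or authoritative list for every setup.
REQUIRED = ("CONFIG_EFI", "CONFIG_FB", "CONFIG_FB_EFI", "CONFIG_FRAMEBUFFER_CONSOLE")
NOT_SET = " is not set"


def parse_config(text):
    """Map option name -> value; '# CONFIG_X is not set' lines map to 'n'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            opts[line[2:-len(NOT_SET)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            opts[name] = value
    return opts


def missing_efifb_options(text):
    """Return the REQUIRED options that are not built in (=y), in list order."""
    opts = parse_config(text)
    return [name for name in REQUIRED if opts.get(name) != "y"]


if __name__ == "__main__":
    # /usr/src/linux/.config is only the usual Gentoo location; pass a path to override.
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/usr/src/linux/.config")
    if not path.is_file():
        print(f"no kernel config found at {path}")
    else:
        missing = missing_efifb_options(path.read_text())
        if missing:
            print("not built in: " + ", ".join(missing))
        else:
            print("all assumed efifb console options are =y")
```

Options set to =m count as missing here on purpose, since the console has to be available before any modules load.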
jagdpanther
l33t


Joined: 22 Nov 2003
Posts: 760

Posted: Sat Jul 07, 2018 11:36 pm

Good news and bad ....

Using ali3nx's configuration https://bpaste.net/show/b40aee50d021, updated to 4.17.4-gentoo with "make oldconfig", the console is working. (Yay!) The difference between my configuration and ali3nx's is huge. I really wish I knew which kernel configuration settings I need for a working console when booting via UEFI. The bad news is that with that same kernel, the system freezes within 15 seconds of starting "emerge gcc" and reboots itself about 20 seconds later. Because the console is working, I see a message similar to:
https://www.dropbox.com/sh/go58lo8z5s48d57/AACJ3lrHNmGEQrYlVxAjVY0ia
This is the first time since I switched over to a UEFI boot from a BIOS boot that I froze my system.
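One way to narrow down a "huge" difference between two configs is to compare only the console-related options. The sketch below is a hypothetical helper; its name filter is a rough heuristic, and for a full comparison the kernel source tree's own scripts/diffconfig is the usual tool.

```python
"""Sketch: list display/console-related differences between two kernel .config files."""
import re

# Hypothetical filter: option names containing fragments that usually matter
# for a boot console (framebuffer, DRM, VT, EFI). Adjust as needed.
INTERESTING = re.compile(r"^CONFIG_\w*(FB|FRAMEBUFFER|DRM|VT|EFI|CONSOLE|LOGO)\w*$")


def parse_config(text):
    """Map option name -> value, treating '# CONFIG_X is not set' as 'n'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            opts[m.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            opts[name] = value
    return opts


def console_diff(old_text, new_text):
    """Return sorted (name, old, new) tuples for interesting options that differ.

    Options absent from a file are treated as 'n'.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    names = {n for n in old.keys() | new.keys() if INTERESTING.match(n)}
    return sorted(
        (n, old.get(n, "n"), new.get(n, "n"))
        for n in names
        if old.get(n, "n") != new.get(n, "n")
    )
```

Feeding it the old config and ali3nx's config would leave only a short list of framebuffer/DRM/EFI options to look at, instead of the whole file.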

Using a configuration I generated years ago (plus lots of make oldconfig) on a two-month-old, BIOS-booted Dell workstation at work (Xeon instead of i9, running Gentoo), the console on the new system still fails. (I used 'grub-mkconfig > /boot/grub/grub.cfg'; on the console I see "Loading Linux 4.17.4-gentoo ..." and that is all.) However, the good news is that I can emerge gcc without any error; I did this twice. On that Dell system the console works, so this is probably a UEFI-boot vs. BIOS-boot kernel configuration issue.
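To confirm which boot path a given kernel actually took, a common check is whether /sys/firmware/efi exists: it is present when the kernel was booted via UEFI with EFI support compiled in. A minimal sketch (the sysfs_root parameter is only there so the check can be pointed at a test directory):

```python
"""Sketch: did the running kernel come up via UEFI or legacy BIOS?"""
from pathlib import Path


def boot_mode(sysfs_root="/sys"):
    """Return "UEFI" if <sysfs_root>/firmware/efi exists, otherwise "BIOS".

    Caveat: the directory is also missing if the kernel was built without
    CONFIG_EFI, so "BIOS" really means "no EFI visible to this kernel".
    """
    return "UEFI" if (Path(sysfs_root) / "firmware" / "efi").is_dir() else "BIOS"


if __name__ == "__main__":
    print(boot_mode())
```

Running it on both machines would show whether the old config is even seeing the firmware as EFI on the new board.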

Considering the failures and system freezes I have had using kernels configured by me, by ali3nx, by the Gentoo minimal install CD and by SystemRescueCD, I am probably going to take krinn's advice and send back my CPU, motherboard and memory. Then I'll start again (same CPU model, i9-7940X, and same memory, Corsair 4 x 16GB, but probably a Supermicro motherboard).
ali3nx
l33t


Joined: 21 Sep 2003
Posts: 732
Location: Winnipeg, Canada

Posted: Sun Jul 08, 2018 12:21 am

jagdpanther wrote:
Good news and bad ....

Using ali3nx's configuration https://bpaste.net/show/b40aee50d021, updated to 4.17.4-gentoo with "make oldconfig", the console is working. (Yay!) The difference between my configuration and ali3nx's is huge. I really wish I knew which kernel configuration settings I need for a working console when booting via UEFI. The bad news is that with that same kernel, the system freezes within 15 seconds of starting "emerge gcc" and reboots itself about 20 seconds later. Because the console is working, I see a message similar to:
https://www.dropbox.com/sh/go58lo8z5s48d57/AACJ3lrHNmGEQrYlVxAjVY0ia
This is the first time since I switched over to a UEFI boot from a BIOS boot that I froze my system.

Using a configuration I generated years ago (plus lots of make oldconfig) on a two-month-old, BIOS-booted Dell workstation at work (Xeon instead of i9, running Gentoo), the console on the new system still fails. (I used 'grub-mkconfig > /boot/grub/grub.cfg'; on the console I see "Loading Linux 4.17.4-gentoo ..." and that is all.) However, the good news is that I can emerge gcc without any error; I did this twice. On that Dell system the console works, so this is probably a UEFI-boot vs. BIOS-boot kernel configuration issue.

Considering the failures and system freezes I have had using kernels configured by me, by ali3nx, by the Gentoo minimal install CD and by SystemRescueCD, I am probably going to take krinn's advice and send back my CPU, motherboard and memory. Then I'll start again (same CPU model, i9-7940X, and same memory, Corsair 4 x 16GB, but probably a Supermicro motherboard).


You had to try it to be convinced, and we tried.

I'm not yet 100% convinced that something is actually faulty when running Linux. I'd be 100% convinced if I did your entire UEFI-boot install myself and it still failed to compile gcc. Should you be interested, the offer stands. It's been a few years since someone volunteered to let me flex their hardware with a Gentoo install :wink:

I succeeded in getting amdgpu OpenCL compute working on Gentoo today, so I may be having a good weekend.
jagdpanther
l33t


Joined: 22 Nov 2003
Posts: 760

Posted: Sun Jul 15, 2018 1:22 am

Thanks again for all the suggestions and comments.

The original parts were purchased less than 30 days ago, so I returned the memory, CPU and motherboard for a full refund and bought an identical CPU and memory kit plus a different motherboard. (It has been years since I used a Supermicro board ...)

The new system is working well so far. (First emerge after reboot: gcc, no issues.) I am still in the post-install phase.
(Currently watching 367 packages emerge from an 'emerge --emptytree @world' using my CFLAGS and USE flags.)
johngalt
Apprentice


Joined: 09 Sep 2004
Posts: 259
Location: 3rd Rock

Posted: Sun Jul 15, 2018 5:07 am

@jagdpanther - glad you got it all worked out. If I had seen this earlier I would have posted my own two cents.

I have a much older system: a first-generation Core i7 965 EE on an eVGA X58 (Intel) based mobo with 12 GB DDR3 (triple channel, 3 x 4 GB DIMMs).

I have had 0 hardware issues running Windows on this machine (Windows 7, then Windows 10) for the last 6 years. It's also a homebrew build, as all my desktops are, and I build with future-proofing in mind, which is why I'm still successfully using a decade-old architecture.

When I decided to make the move back to Gentoo as my daily OS, I learned something really, really quickly.

The eVGA BIOS (no (u)EFI at all) has extended overclocking settings for this chipset. Not only am I able to dig deep into the settings for both the CPU and RAM (adjusting QPI, voltages, CAS and other timings, using Vdroop or not, etc.), but there are also more general settings, like 1) XMP versus standard profiles, 2) Turbo mode (for a single core) and 3) Dummy OC (which gives a very safe overclock of ~12.5%, from 3.2 GHz to 3.6 GHz across all cores, and works well with both standard and XMP RAM profiles).

I had all three enabled, and for years I had 0 issues in Windows. Everything just worked.

Then I decided to move to Gentoo. Since the mobo is BIOS-based, I tried booting the Gentoo minimal disc. No go. I then tried the LiveDVD. No go. I then tried Clonezilla, Ubuntu, GParted, even SystemRescueCD; none of them would boot.

I played with all my BIOS settings over the course of a few days. I probably reset my BIOS over 50 times, even replaced the CMOS battery, and reflashed the latest BIOS 3 different times. If I had any of those three settings enabled in the BIOS, I could not get a Linux kernel to boot. I even tried USB versus CD-R/DVD-R for these; as long as I had all three of them enabled, or any combination of them, including just the XMP profile, the Linux kernel would not boot.

As soon as I disabled all three of those and left everything else exactly as I had it set when I was running Windows (and I mean everything, VTx, AHCI (no RAID), FireWire enabled, PXE disabled, my SSD set as default boot device, the whole nine yards) - every one of those distros started booting perfectly fine.

Now, I realize this is completely different from your issue, in that your problems showed up while actually compiling and/or emerging gcc.

But, as I was finalizing the installs across 3 different machines, including that BIOS desktop and 2 (u)EFI-based laptops, I was looking everywhere for answers. And I mean everywhere.

And I found a great resource in the 15+ year old FAQ, except the link was dead. Fortunately, I PM'd pjp about it a couple of days ago, and he was kind enough to link me to the web archive copy of the post.

https://web.archive.org/web/20050629085215/http://www-106.ibm.com/developerworks/library/l-hw1/

The article was written by one Daniel Robbins, and he points out that a really, really good CPU test for determining whether your hardware is bad is to run repeated (and parallel) kernel compiles.
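The idea of repeated, parallel compiles as a hardware test can be sketched roughly as follows. This is not the script from Robbins' article: the /tmp/stress tree layout and the kernel_build helper are hypothetical, and each worker is assumed to have its own already-configured copy of the kernel source so the parallel builds don't collide.

```python
"""Sketch: repeated, parallel builds as a rough hardware stability test."""
import subprocess
from concurrent.futures import ThreadPoolExecutor


def kernel_build(worker):
    """Hypothetical command: rebuild a private, pre-configured kernel tree copy."""
    tree = f"/tmp/stress/linux-{worker}"  # assumed layout, one copy per worker
    return ["sh", "-c", f"cd {tree} && make clean && make -j2"]


def run_round(make_cmd, workers):
    """Run make_cmd(worker) for every worker in parallel; return exit codes in worker order."""
    def build(worker):
        result = subprocess.run(make_cmd(worker),
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(build, range(workers)))


def stress(make_cmd, workers=4, rounds=10):
    """Repeat run_round and collect (round, worker, exit_code) for every failed build.

    Identical builds on stable hardware should all succeed, so failures that
    come and go between rounds are the suspicious sign.
    """
    failures = []
    for rnd in range(rounds):
        for worker, code in enumerate(run_round(make_cmd, workers)):
            if code != 0:
                failures.append((rnd, worker, code))
    return failures


# Example (long-running): print(stress(kernel_build, workers=4, rounds=20))
```

A single failure that never repeats is the interesting case here: a real bug in the source or config would fail the same way every round.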

Since you've now returned the hardware, it's pretty moot, but it's interesting that this article aligns almost exactly with what krinn (and others, I think) were saying in this thread. And what a coincidence that I saw your last reply (and thus the thread) one day after asking about the broken link to Daniel's article, which covers exactly what you found out yourself....
_________________
desultory wrote:
If you want to retain credibility as a functional adult; when you are told that you are acting boorishly, the correct response is to consider that possibility and act accordingly to correct that behavior.


Amen.
Keruskerfuerst
Advocate


Joined: 01 Feb 2006
Posts: 2289
Location: near Augsburg, Germany

Posted: Sun Jul 15, 2018 5:14 pm

What mainboard and memory kit did you buy?
All times are GMT