Zucca Moderator
Joined: 14 Jun 2007 Posts: 3857 Location: Rasi, Finland
|
Posted: Sat Jan 11, 2025 2:24 pm Post subject: Workstation: To dual socket or not on desktop Gentoo? |
|
|
So.
I've been thinking of doing a major upgrade to my actual desktop PC, which has been sitting unused for a while now (AMD FX-<something>, 32GB RAM).
My target is 256GB of RAM, but do I go with single socket or dual? With dual I could get two CPUs with lower core counts but higher frequencies. On the other hand, I could just go with some single 20-core CPU and avoid every disadvantage which NUMA brings.
I would say that I do not game, but I do... a little. My gaming mostly consists of emulating games from the beginning up to the very early 2000s. Then I occasionally buy some games from gog.com.
I will surely run some VMs with the capacity the workstation will bring, and I will certainly build packages for my other boxes. I'd guess portage has no problems scaling up to say 60-80 threads.
Questions:
- Any possible pitfalls, Gentoo-wise, of using a dual socket system (with 128GB RAM per CPU) as a desktop PC?
- If some program (game) gets slow because of this, can I easily force the game to run on the CPU which has the GPU connected to it (and use the RAM closest to it)?
_________________ ..: Zucca :..
My gentoo installs: | init=/sbin/openrc-init
-systemd -logind -elogind seatd |
Quote: | I am NaN! I am a man! |
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1373 Location: Richmond Hill, Canada
|
Posted: Sat Jan 11, 2025 2:41 pm Post subject: |
|
|
What is the "disadvantage which NUMA brings"? I am curious.
I would imagine that if one of the target use cases is VMs, then having the ability to partition would be nice.
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3857 Location: Rasi, Finland
|
Posted: Sat Jan 11, 2025 2:57 pm Post subject: |
|
|
pingtoo wrote: | What is "disadvantage which NUMA brings"? I am curious. |
- Memory access latency
- CPU 0 needs to read memory that's located in CPU 1's RAM
- Same latency issues with PCIe devices
... but are those significant enough to consider single socket at all?
pingtoo wrote: | then having the ability to partition would be nice |
Not sure what you meant by that. By 'partitioning', did you mean running a VM only on one of the CPUs?
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1373 Location: Richmond Hill, Canada
|
Posted: Sat Jan 11, 2025 3:22 pm Post subject: |
|
|
Zucca wrote: | pingtoo wrote: | What is "disadvantage which NUMA brings"? I am curious. |
- Memory access latency
- CPU 0 needs to read memory that's located in CPU 1's RAM
- Same latency issues with PCIe devices
... but are those significant enough to consider single socket at all? |
Isn't addressing this the job of the kernel's NUMA-related configuration? I mean the process affinity concept.
Zucca wrote: | pingtoo wrote: | then having the ability to partition would be nice | Not sure what you meant by that. By 'partitioning', did you mean running a VM only on one of the CPUs? |
I think QEMU also has a concept of NUMA (I don't use it, so I am not 100% sure, but I remember seeing configuration options that mention NUMA). I imagine that a VM (a partition of a portion of the hardware) would benefit from allocating resources within a known boundary.
Forgive me if PC hardware does not work this way. I was an admin for Oracle's T4/5/7 series machines in the past, and on those there is a way to set up PCI/memory/CPU affinity so that when you create a Zone it is bound to a specific hardware boundary.
I think more cores (CPUs) mean more processing bandwidth (as in more can be processed concurrently), and a higher single-core (CPU) clock means processing one thing faster. So what do you envision as your target workload? Do you intend to do more things concurrently, or do you want to do one thing at a time but finish each sooner?
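On Linux, the affinity idea discussed here can be tried from userspace with numactl. A minimal sketch, assuming the GPU's PCI address is known: the kernel reports a device's node in the sysfs file `numa_node` (`-1` means it has no locality information), and `numactl --cpunodebind` / `--membind` then keep a program's threads and memory on that node. The function names, the address `0000:41:00.0` and the `./game` binary are placeholders for illustration, not values from this thread.

```python
import shlex
from pathlib import Path


def device_numa_node(pci_address: str,
                     sysfs_root: Path = Path("/sys/bus/pci/devices")) -> int:
    """Return the NUMA node the kernel reports for a PCI device (-1 if none)."""
    return int((sysfs_root / pci_address / "numa_node").read_text().strip())


def numactl_command(node: int, command: list[str]) -> list[str]:
    """Build a numactl invocation that keeps CPU time and memory on one node."""
    if node < 0:
        # No locality information for the device: run the command unpinned.
        return list(command)
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *command]


# On a real machine the node would come from something like
#   device_numa_node("0000:41:00.0")   # hypothetical GPU address
# Here node 1 is simply assumed.
print(shlex.join(numactl_command(1, ["./game"])))
```

`--membind` is strict: allocations fail once the node's memory is exhausted. `numactl --preferred=<node>` is the softer choice if falling back to the other socket's RAM is acceptable.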
zen_desu Tux's lil' helper
Joined: 25 Oct 2024 Posts: 99
|
Posted: Sat Jan 11, 2025 5:42 pm Post subject: Re: Workstation: To dual socket or not on desktop Gentoo? |
|
|
Zucca wrote: | I'd guess portage has no problems scaling up to say 60-80 threads. |
In my experience, portage doesn't scale well to >32 threads.
I have a 7950X with 64GB of RAM and a 7R13 with 512GB of RAM.
Code: | 2024-12-27T21:11:34 >>> sys-devel/gcc: 30′55″
2024-12-27T22:14:08 >>> sys-kernel/gentoo-kernel: 9′22″ |
Code: | 2024-12-28T12:15:47 >>> sys-devel/gcc: 28′05″
2024-12-28T12:43:52 >>> sys-kernel/gentoo-kernel: 9′21″ |
I'll let you guess which is which. Both have more or less the same build settings and were built with PGO and LTO.
Both use similar amounts of power, but the 7950X kills the 7R13 in single-thread performance. If you're able to find builds which can use >32 threads for substantial periods of time, the build times just about average out. I tell portage to use 32 jobs, but it rarely uses more than 4 and gets stuck waiting to merge stuff a lot of the time.
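The two logs can be compared with a bit of arithmetic. A small sketch that converts the `minutes′seconds″` durations shown in the logs to seconds and reports how far apart the two machines are; the helper names are made up for illustration.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as 30′55″ (minutes′seconds″) to seconds."""
    match = re.fullmatch(r"(\d+)′(\d+)″", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def compare(first: str, second: str) -> tuple[int, float]:
    """Return the gap in seconds and the gap as a percentage of the slower run."""
    a, b = to_seconds(first), to_seconds(second)
    gap = abs(a - b)
    return gap, 100.0 * gap / max(a, b)


# Timings copied from the two logs (first log, second log).
timings = {
    "sys-devel/gcc": ("30′55″", "28′05″"),
    "sys-kernel/gentoo-kernel": ("9′22″", "9′21″"),
}

for package, (first, second) in timings.items():
    gap, percent = compare(first, second)
    print(f"{package}: {gap} s apart ({percent:.1f}% of the slower run)")
```

With these timings the gcc builds are 170 s apart (about 9% of the slower run) and the kernel builds are 1 s apart.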
Rebooting my server also takes much, much longer, and my desktop has much more usable I/O, etc. I seriously considered using my server as a workstation or something, but right now it mostly functions as a router/VM/container host which I mostly use over SSH.
_________________ µgRD dev
Wiki writer |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54685 Location: 56N 3W
|
Posted: Sat Jan 11, 2025 6:15 pm Post subject: |
|
|
Zucca,
A few years ago I got to play with a 96 core Cavium Thunder X2, with 128G RAM.
make rarely gets 30 threads in flight concurrently.
The only way to get more is to build more packages concurrently.
That's OK until you get three big packages building at the same time and you run out of real RAM.
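A rough way to budget for that, sketched under the assumption of about 2 GiB of RAM per compiler job (a common rule of thumb, not a measured figure; large C++ packages can need more). The function names are hypothetical. The results would feed `-j` in `MAKEOPTS` and emerge's `--jobs`.

```python
def max_parallel_jobs(cpu_threads: int, ram_gib: float,
                      gib_per_job: float = 2.0) -> int:
    """Total compiler jobs that fit within both the CPU threads and the RAM."""
    by_ram = int(ram_gib // gib_per_job)  # jobs the RAM budget allows
    return max(1, min(cpu_threads, by_ram))


def jobs_per_package(total_jobs: int, concurrent_packages: int) -> int:
    """make -j value for each package when several packages build at once."""
    return max(1, total_jobs // concurrent_packages)


# The 96-thread, 128G machine mentioned above, with three packages at once.
total = max_parallel_jobs(cpu_threads=96, ram_gib=128)
print(total)                        # 64: RAM, not the core count, is the limit
print(jobs_per_package(total, 3))   # 21
```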
With dual CPUs, you have the NUMA disadvantages you mention but you also double the memory bandwidth when NUMA is not holding you back.
I'm not sure about the NUMA disadvantages any longer either.
The Raspberry Pi 5 has just become an 8-node NUMA system to improve performance.
_________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3857 Location: Rasi, Finland
|
Posted: Sat Jan 11, 2025 6:21 pm Post subject: |
|
|
@zen_desu, interesting. I've been able to easily saturate 8 threads with those two packages.
How many NUMA nodes does that EPYC have?
zen_desu Tux's lil' helper
Joined: 25 Oct 2024 Posts: 99
|
Posted: Sat Jan 11, 2025 6:25 pm Post subject: |
|
|
Code: | desu@amazon ~ $ lscpu | grep -i numa
NUMA node(s): 1
NUMA node0 CPU(s): 0-95 |
No mention of NUMA in dmesg.
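The same information that `lscpu` prints is exposed under `/sys/devices/system/node`. A short sketch, assuming the usual Linux sysfs layout with one `nodeN` directory per NUMA node, each holding a `cpulist` file; the function names are illustrative.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel CPU list such as '0-3,8,10-11' into CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root: Path = Path("/sys/devices/system/node")) -> dict[int, list[int]]:
    """Map each NUMA node number to its CPUs; {} if no nodes are exposed."""
    nodes: dict[int, list[int]] = {}
    if not sysfs_root.is_dir():
        return nodes
    for entry in sysfs_root.glob("node[0-9]*"):
        cpulist = (entry / "cpulist").read_text()
        nodes[int(entry.name[len("node"):])] = parse_cpulist(cpulist)
    return nodes


for node, cpus in sorted(numa_nodes().items()):
    print(f"node{node}: {len(cpus)} CPUs")
```

On a machine like the one above, this would list a single `node0` covering CPUs 0-95.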
When I build those packages, both systems will use all threads in bursts, with gcc having several pauses between them. The kernel seems to mostly build in one run but takes a bit of time before/after compilation to do other actions. I mostly find it interesting that across a variety of packages, the build times end up being very, very similar. I think it's a good example of why more threads or memory isn't always the answer. I also do most builds on my server in a tmpfs and sometimes even run entire VMs/containers on tmpfs; this doesn't really improve speed all that much, but you'd think with all of these advantages it would be running laps around a "desktop CPU".
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3857 Location: Rasi, Finland
|
Posted: Sat Jan 11, 2025 6:56 pm Post subject: |
|
|
NeddySeagoon wrote: | make rarely gets 30 threads in flight concurrently. |
OK. It seems that there's some limit... at least for now.
With that in mind, I could probably run whatever compilation without seeing any noticeable degradation in performance while doing other normal daily tasks.
@zen_desu: It seems you have an EPYC with only one NUMA node, then. If I recall correctly, it's not uncommon for an EPYC to have at least two nodes internally.
Anyway. Thanks guys for your input. I think I'll go with a dual socket system, with fewer cores but higher frequencies.
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1373 Location: Richmond Hill, Canada
|
Posted: Sat Jan 11, 2025 8:41 pm Post subject: |
|
|
I am curious how the Linux kernel learns the hardware configuration on a modern PC. In this case, NUMA (Non-Uniform Memory Access): how does the kernel know the latency of each memory bank? Does the kernel perform testing during boot, or does it depend on something external telling it, such as ACPI/DT (device tree)?
As for building a Gentoo system (emerge): on a NUMA machine, would running multiple emerges at the same time speed up the entire process, assuming you can partition your build set and use NUMA controls to keep each emerge session separate? (Thinking VM/container.)
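As far as I know, the kernel does not benchmark memory at boot. On ACPI systems the firmware describes which CPUs and memory ranges belong together (the SRAT table) and the relative distances between nodes (the SLIT table), while device-tree platforms describe the same thing in the DT. The kernel exposes the resulting table in `/sys/devices/system/node/nodeN/distance`. A minimal sketch for reading it: the values are relative weights reported by firmware (10 means local by convention), not measured latencies, and the example row and function names are made up for illustration.

```python
from pathlib import Path


def parse_distances(text: str) -> list[int]:
    """Parse one node's distance row, e.g. '10 21' -> [10, 21]."""
    return [int(value) for value in text.split()]


def remote_penalty(row: list[int], node: int) -> dict[int, float]:
    """Relative cost of reaching every other node, normalised to local access."""
    local = row[node]
    return {other: dist / local for other, dist in enumerate(row) if other != node}


def read_distance_row(node: int,
                      sysfs_root: Path = Path("/sys/devices/system/node")) -> list[int]:
    """Read the distance row the kernel exposes for one node."""
    return parse_distances((sysfs_root / f"node{node}" / "distance").read_text())


# Illustrative row for node 0 of a two-socket machine (not a measured value).
print(remote_penalty(parse_distances("10 21"), node=0))
```

On real hardware, `read_distance_row(0)` would replace the hard-coded row.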