Gentoo Forums
Workstation: To dual socket or not on desktop Gentoo?
Zucca
Moderator


Joined: 14 Jun 2007
Posts: 3857
Location: Rasi, Finland

Posted: Sat Jan 11, 2025 2:24 pm    Post subject: Workstation: To dual socket or not on desktop Gentoo?

So.
I've been thinking about a major upgrade to my actual desktop PC, which has been sitting unused for a while now (AMD FX-<something>, 32GB RAM).

My target is 256GB of RAM, but do I go with single socket or dual? With dual I could get two lower core count CPUs but with higher frequency. On the other hand I could just go with some 20 core single CPU and avoid every disadvantage which NUMA brings.

I would say that I do not game, but I do... a little. My gaming mostly consists of emulation of games from the beginning to very early 2000's. Then I occasionally buy some games from gog.com.

I will surely run some VMs with the capacity the workstation will bring. And certainly will build packages for my other boxes. I'd guess portage has no problems scaling up to say 60-80 threads.

Questions:
  • Any possible pitfalls of using a dual socket system (with 128GB RAM per CPU) as a desktop PC, Gentoo-wise?
  • If some program (game) gets slow because of this, can I easily force the game to run on the CPU that the GPU is connected to (and use the RAM closest to it)?
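
On the second question, a minimal Python sketch of the pinning idea. It assumes a Linux system where sysfs exposes `local_cpulist` for the GPU's PCI device. The helper names are made up for illustration, and a PCI address such as `0000:01:00.0` is a placeholder to be replaced with the one `lspci -D` reports.

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Expand a kernel cpulist string such as "0-3,8-11" into CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def local_cpus(pci_address: str) -> set[int]:
    """CPUs the kernel reports as local to a PCI device (e.g. the GPU)."""
    path = Path("/sys/bus/pci/devices") / pci_address / "local_cpulist"
    return parse_cpulist(path.read_text())


def pin_to_device(pci_address: str, pid: int = 0) -> set[int]:
    """Restrict a process (0 = the caller) to the device's local CPUs."""
    cpus = local_cpus(pci_address)
    os.sched_setaffinity(pid, cpus)
    return cpus


# The parser on its own, with a made-up two-range list:
print(sorted(parse_cpulist("0-3,8-11")))
```

Child processes inherit the affinity mask, so a small wrapper could call `pin_to_device()` and then exec the game. This only restricts CPUs. Linux's default policy allocates memory on the node where the thread is running, so memory tends to stay local, but a strict binding would need something like `numactl --cpunodebind=N --membind=N`.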

_________________
..: Zucca :..

My gentoo installs:
init=/sbin/openrc-init
-systemd -logind -elogind seatd

Quote:
I am NaN! I am a man!
pingtoo
Veteran


Joined: 10 Sep 2021
Posts: 1373
Location: Richmond Hill, Canada

Posted: Sat Jan 11, 2025 2:41 pm

What is "disadvantage which NUMA brings"? I am curious.

I would imagine that if one of the target use cases is VMs, then having the ability to partition would be nice.
Zucca
Moderator


Joined: 14 Jun 2007
Posts: 3857
Location: Rasi, Finland

Posted: Sat Jan 11, 2025 2:57 pm

pingtoo wrote:
What is "disadvantage which NUMA brings"? I am curious.
  • Memory access latency
    • CPU 0 needs to read memory that's located in CPU 1's RAM
  • Same latency issues with PCIe devices
... but are those significant enough to consider single socket instead?

pingtoo wrote:
then having the ability to partition would be nice
Not sure what you meant by that. By 'partitioning', did you mean running a VM only on one of the CPUs?
pingtoo
Veteran


Joined: 10 Sep 2021
Posts: 1373
Location: Richmond Hill, Canada

Posted: Sat Jan 11, 2025 3:22 pm

Zucca wrote:
pingtoo wrote:
What is "disadvantage which NUMA brings"? I am curious.
  • Memory access latency
    • CPU 0 needs to read memory that's located in CPU 1's RAM
  • Same latency issues with PCIe devices
... but are those significant enough to consider single socket instead?

Isn't addressing this exactly what the kernel's NUMA-related configuration is for? I mean the process affinity concept.

pingtoo wrote:
then having the ability to partition would be nice
Not sure what you meant by that. By 'partitioning', did you mean running a VM only on one of the CPUs?
I think QEMU also has a concept of NUMA (I don't use it, so I am not 100% sure, but I remember seeing configuration options that mention NUMA). I imagine that a VM (a partition of a portion of the hardware) would benefit from allocating its resources within a known boundary.

Forgive me if PC hardware does not work this way. I was an admin for Oracle's T4/5/7 series machines in the past, and on those there is a way to set up PCI/memory/CPU affinity so that when you create a Zone it is bound to a specific hardware boundary.

I think more cores (CPUs) mean more processing bandwidth (as in, more can be processed concurrently), and a higher single-core clock means processing one thing faster. So what do you envision as your target workload? Do you intend to do more things concurrently, or do you want to do one thing at a time but finish each sooner?
zen_desu
Tux's lil' helper


Joined: 25 Oct 2024
Posts: 99

Posted: Sat Jan 11, 2025 5:42 pm    Post subject: Re: Workstation: To dual socket or not on desktop Gentoo?

Zucca wrote:
So.
I've been thinking about a major upgrade to my actual desktop PC, which has been sitting unused for a while now (AMD FX-<something>, 32GB RAM).

My target is 256GB of RAM, but do I go with single socket or dual? With dual I could get two lower core count CPUs but with higher frequency. On the other hand I could just go with some 20 core single CPU and avoid every disadvantage which NUMA brings.

I would say that I do not game, but I do... a little. My gaming mostly consists of emulation of games from the beginning to very early 2000's. Then I occasionally buy some games from gog.com.

I will surely run some VMs with the capacity the workstation will bring. And certainly will build packages for my other boxes. I'd guess portage has no problems scaling up to say 60-80 threads.

Questions:
  • Any possible pitfalls of using a dual socket system (with 128GB RAM per CPU) as a desktop PC, Gentoo-wise?
  • If some program (game) gets slow because of this, can I easily force the game to run on the CPU that the GPU is connected to (and use the RAM closest to it)?


In my experience, portage doesn't scale well to >32 threads.

I have a 7950X with 64GB of RAM and a 7R13 with 512GB of RAM.

Code:
2024-12-27T21:11:34 >>> sys-devel/gcc: 30′55″
2024-12-27T22:14:08 >>> sys-kernel/gentoo-kernel: 9′22″


Code:
2024-12-28T12:15:47 >>> sys-devel/gcc: 28′05″
2024-12-28T12:43:52 >>> sys-kernel/gentoo-kernel: 9′21″


I'll let you guess which is which. Both have more or less the same build settings, and both were built with PGO and LTO.
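
To put numbers on how close the two listings are, a minimal sketch that converts the durations quoted above to seconds and compares them. The function names are illustrative only, and the parser assumes the minutes′seconds″ form shown in the log excerpts.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration like 30′55″ (minutes, seconds) to seconds."""
    match = re.fullmatch(r"(\d+)′(\d+)″", duration.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def percent_apart(slower: str, faster: str) -> float:
    """How much shorter the faster run is, as a percentage of the slower one."""
    a, b = to_seconds(slower), to_seconds(faster)
    return 100 * (a - b) / a


print(f"gcc:    {percent_apart('30′55″', '28′05″'):.1f}%")
print(f"kernel: {percent_apart('9′22″', '9′21″'):.1f}%")
```

The gcc builds differ by roughly 9% and the kernel builds by well under 1%.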

Both use similar amounts of power but the 7950x kills the 7r13 in single thread perf. If you're able to find builds which can use >32 threads for substantial periods of time, the build times just about average out. I tell portage to use 32 jobs but it rarely uses more than 4, and gets stuck waiting to merge stuff a lot of the time.

Rebooting my server also takes much, much longer, and my desktop has much more usable I/O, etc. I seriously considered using my server as a workstation, but right now it mostly functions as a router/VM/container host which I mostly use over ssh.
_________________
µgRD dev
Wiki writer
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54685
Location: 56N 3W

Posted: Sat Jan 11, 2025 6:15 pm

Zucca,

A few years ago I got to play with a 96 core Cavium Thunder X2, with 128G RAM.
make rarely gets 30 threads in flight concurrently.
The only way to get more is to build more packages concurrently.
That's OK until you get three big packages building at the same time and you run out of real RAM.
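
As a rough illustration of that RAM trade-off, here is a minimal sketch that splits a job budget across packages built in parallel. The 2 GiB-per-compile-job figure is an assumed rule of thumb, not a measurement (large C++ packages can need considerably more), and `job_budget` is a made-up helper name.

```python
def job_budget(threads: int, ram_gib: int,
               emerge_jobs: int = 1, gib_per_job: int = 2) -> int:
    """Suggest a make -j value per package.

    The total number of compile jobs is capped by both the thread count
    and the assumed RAM cost per job, then shared between the packages
    that emerge builds at the same time.
    """
    total = min(threads, ram_gib // gib_per_job)
    return max(1, total // emerge_jobs)


# 96 threads and 128 GiB RAM, one package at a time vs. three at once:
print(job_budget(96, 128))
print(job_budget(96, 128, emerge_jobs=3))
```

The result would go into `MAKEOPTS="-jN"`, with the package count passed to `emerge --jobs`.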

With dual CPUs, you have the NUMA disadvantages you mention but you also double the memory bandwidth when NUMA is not holding you back.
I'm not sure about the NUMA disadvantages any longer either.
The Raspberry Pi5 has just become an 8 node NUMA system to improve performance.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Zucca
Moderator


Joined: 14 Jun 2007
Posts: 3857
Location: Rasi, Finland

Posted: Sat Jan 11, 2025 6:21 pm

@zen_desu, interesting. I've been able to easily saturate 8 threads with those two packages.
How many NUMA nodes does that EPYC have?
zen_desu
Tux's lil' helper


Joined: 25 Oct 2024
Posts: 99

Posted: Sat Jan 11, 2025 6:25 pm

Code:
desu@amazon ~ $ lscpu | grep -i numa
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-95


No mention of NUMA in dmesg.

When I build those packages, both systems use all threads in bursts, with gcc having several pauses between them. The kernel seems to mostly build in one run but takes a bit of time before and after compilation to do other things. I mostly find it interesting that, across a variety of packages, the build times end up being very, very similar. I think it's a good example of why more threads or memory isn't always the answer. I also do most builds on my server in a tmpfs, and sometimes even run entire VMs/containers on tmpfs. This doesn't really improve speed all that much, but you'd think that with all of these advantages it would be running laps around a "desktop CPU".
Zucca
Moderator


Joined: 14 Jun 2007
Posts: 3857
Location: Rasi, Finland

Posted: Sat Jan 11, 2025 6:56 pm

NeddySeagoon wrote:
make rarely gets 30 threads in flight concurrently.


Ok. It seems that there's some limit... at least for now.
With that in mind, I could probably run any compilation without seeing a noticeable degradation in performance while doing other normal daily tasks.

@zen_desu: It seems you have an EPYC with only one NUMA node, then. If I recall correctly, it's not uncommon for an EPYC to have at least two nodes internally.

Anyway. Thanks guys for your input. I think I'll go with dual socket system, with fewer cores, but with higher frequencies.
pingtoo
Veteran


Joined: 10 Sep 2021
Posts: 1373
Location: Richmond Hill, Canada

Posted: Sat Jan 11, 2025 8:41 pm

I am curious how the Linux kernel learns the hardware configuration on a modern PC. In this case, with NUMA (Non-Uniform Memory Access), how does the kernel know the latency of each memory bank? Does the kernel perform tests during boot, or does it depend on something external telling it, such as ACPI or DT (device tree)?
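
For context: on ACPI systems the kernel does not measure latency at boot. The firmware describes the topology in the SRAT (affinity) and SLIT (relative distance) tables, and device-tree platforms carry equivalent NUMA properties. The kernel exposes what it ended up with under sysfs. A minimal sketch of reading it, assuming a Linux system with NUMA support built in (the function names are illustrative only):

```python
from pathlib import Path


def parse_distances(text: str) -> list[int]:
    """Parse one sysfs distance row, e.g. "10 21", into integers."""
    return [int(value) for value in text.split()]


def read_distance_table(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node id to its row of relative distances.

    Returns an empty dict when the directory is missing, for example on
    a kernel built without NUMA support.
    """
    table: dict[int, list[int]] = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        distance_file = node_dir / "distance"
        if distance_file.is_file():
            node_id = int(node_dir.name[len("node"):])
            table[node_id] = parse_distances(distance_file.read_text())
    return table


for node, row in sorted(read_distance_table().items()):
    print(f"node{node}: {row}")
```

By convention a node's distance to itself is 10, and larger values mean relatively slower access. The numbers are firmware-supplied ratios, not measured nanoseconds.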

On the topic of building a Gentoo system (emerge): on a NUMA machine, would running multiple emerges at the same time speed up the entire process, assuming you can partition your build set and use NUMA controls to keep each emerge session separate? (Thinking VMs/containers.)