curmudgeon Veteran
Joined: 08 Aug 2003 Posts: 1744
Posted: Sun Oct 21, 2018 10:19 pm Post subject: Can someone explain the new "multiarch" flag in glibc?
|
|
In particular, does it apply only to "similar" architectures (such as amd64 and x86) or just about anything (say amd64 and arm64)?
Thank you in advance.
jagdpanther l33t
Joined: 22 Nov 2003 Posts: 757
Posted: Mon Oct 22, 2018 2:47 pm
|
|
Yes, please explain. The only comment in /usr/portage/profiles is:
Quote: | use.local.desc:sys-libs/glibc:multiarch - enable single DSO with optimizations for multiple architectures |
Is this needed if you run 32-bit apps on your x86_64 system, or am I thinking of multilib?
Perfect Gentleman Veteran
Joined: 18 May 2014 Posts: 1255
Posted: Mon Oct 22, 2018 3:16 pm
|
|
AFAIK, it means that glibc is optimized for different CPU archs, e.g. AMD's CPUs and Intel's CPUs.
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
Posted: Mon Oct 22, 2018 10:00 pm
|
|
Perfect Gentleman wrote: | AFAIK, it means that glibc is optimized for different CPU archs, e.g. AMD's CPUs and Intel's CPUs. | Then why isn't the default OFF, since the whole point of compiling everything is to optimize for a particular CPU?
Not challenging, I'd really like a technical explanation. I can see this for a binary distro, where it would be pretty much mandatory.
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
Posted: Mon Oct 22, 2018 10:15 pm
|
|
Tony0945 wrote: | Perfect Gentleman wrote: | AFAIK, it means that glibc is optimized for different CPU archs, e.g. AMD's CPUs and Intel's CPUs. |
Then why isn't the default OFF, since the whole point of compiling everything is to optimize for a particular CPU? Not challenging, I'd really like a technical explanation. I can see this for a binary distro, where it would be pretty much mandatory. |
Tony0945 ... it's off because you already get optimisation for your target arch (so, singularly optimised), and would only need 'multiarch' if you needed a "single DSO optimi[sed] for multiple architectures". It's only, or mostly, a feature for bindists who distribute a "single DSO" that runs on multiple architectures.
best ... khay
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
Posted: Mon Oct 22, 2018 11:09 pm
|
|
khayyam wrote: | Tony0945 ... it's off because you already get optimisation for your target arch (so, singularly optimised), and would only need 'multiarch' if you needed a "single DSO optimi[sed] for multiple architectures". It's only, or mostly, a feature for bindists who distribute a "single DSO" that runs on multiple architectures. |
Code: | IUSE="audit caps compile-locales doc gd hardened headers-only +multiarch multilib nscd profile selinux suid systemtap vanilla" |
It's default ON because of the '+'.
Is it OK to turn it off with package.use?
And BTW, what does DSO mean? I don't think it means "Defense Secretary Office".
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
Posted: Mon Oct 22, 2018 11:39 pm
|
|
khayyam wrote: | Tony0945 ... it's off because you already get optimisation for your target arch (so, singularly optimised), and would only need 'multiarch' if you needed a "single DSO optimi[sed] for multiple architectures". It's only, or mostly, a feature for bindists who distribute a "single DSO" that runs on multiple architectures. |
Tony0945 wrote: | Code: | IUSE="audit caps compile-locales doc gd hardened headers-only +multiarch multilib nscd profile selinux suid systemtap vanilla" |
It's default ON because of the '+'. Is it OK to turn it off with package.use? |
Tony0945 ... I totally misread your post, most probably because I honestly can't see a reason why it would be enabled by default. As for turning it off, I would say yes, but then I should probably offer caution, this is the era of the new and all that comes with it after all :)
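For reference, turning it off would be a one-line entry in package.use (a sketch; the file name under /etc/portage/package.use is arbitrary):
Code: | # /etc/portage/package.use/glibc
sys-libs/glibc -multiarch
|
and then an emerge -1 sys-libs/glibc picks it up.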
Tony0945 wrote: | And BTW, what does DSO mean? I don't think it means "Defense Secretary Office". |
dynamic shared object.
best ... khay
Anon-E-moose Watchman
Joined: 23 May 2008 Posts: 6148 Location: Dallas area
Posted: Mon Oct 22, 2018 11:45 pm
|
|
Not sure what it does, but it seems to be in the fpu area of the source code. Not sure if it's necessary or not.
ETA:
Quote: | What is this Multiarch?
Multiarch lets you install library packages from multiple architectures on the same machine. This is useful in various ways, but the most common is installing both 64 and 32-bit software on the same machine and having dependencies correctly resolved automatically. In general you can have libraries of more than one architecture installed together and applications from one architecture or another installed as alternatives. Note that it does not enable multiple architecture versions of applications to be installed simultaneously. |
https://wiki.debian.org/Multiarch/HOWTO
Maybe they're trying to fix the 64/32 bit jungle that they created.
Hu Administrator
Joined: 06 Mar 2007 Posts: 22657
Posted: Tue Oct 23, 2018 1:39 am
|
|
I think the Debian multiarch is a different project. That one is intended to store architecture-specific libraries at paths that tell you the architecture. Historically, we had /usr/lib64/libNAME.so, and the architecture was simply understood to be "whatever 64-bit code the native system can run." However, there are multiple mutually incompatible 64-bit CPUs in existence (amd64, ppc64, arm64, etc.). Non-native 64-bit was consigned to a longer path, so the path to a library depended on whether it was native or a cross-compiled library. Multiarch proposes that you install the library as /usr/lib/x86_64-pc-linux-gnu/libNAME.so for amd64, so that all the 64-bit architecture's libraries can be co-installed as peers and not cause file collisions.
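As an illustration (example paths only; Debian's actual tuples omit the vendor field), co-installed libraries then sit side by side like this:
Code: | /usr/lib/x86_64-linux-gnu/libNAME.so       (amd64)
/usr/lib/aarch64-linux-gnu/libNAME.so      (arm64)
/usr/lib/powerpc64le-linux-gnu/libNAME.so  (ppc64le)
|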
From looking at the glibc ebuild, it looks like its use of multiarch is intended to compile multiple copies of selected important functions, then resolve at runtime to the best implementation for the current CPU. Assuming the build system is otherwise able to respect the local administrator's build flags, this seems unnecessary for Gentoo systems, except in the case that you build a package and install it on several different mostly-compatible CPUs (e.g. an Atom, an Intel Core i3, and an AMD Ryzen -- all in the x86 family, but each with their own quirks and possibly different "best" ways of doing a job).
krinn Watchman
Joined: 02 May 2003 Posts: 7470
Posted: Tue Oct 23, 2018 9:49 am
|
|
Hu wrote: | Assuming the build system is otherwise able to respect the local administrator's build flags, this seems unnecessary for Gentoo systems, except in the case that you build a package and install it on several different mostly-compatible CPUs (e.g. an Atom, an Intel Core i3, and an AMD Ryzen -- all in the x86 family, but each with their own quirks and possibly different "best" ways of doing a job). |
I disagree with you there Hu: the same cpu could offer very different implementations of a function because the cpu handles multiple instruction sets (here I'm speaking of mmx, sse...) that may optimize the function in some way.
And because of the cpu internals (architecture, cache size...), you "may" not be able to tell without checking which implementation would work best (i.e. the function made with sse could work worse or better than the same function using just mmx).
This can already be seen with mdraid, which does this kind of test to pick the best implementation to use, considering the instruction sets the cpu can run.
(not from my dmesg, I don't use mdraid myself)
Code: | raid6: int32x1 869 MB/s
raid6: int32x2 927 MB/s
raid6: int32x4 676 MB/s
raid6: int32x8 643 MB/s
raid6: mmxx1 3071 MB/s
raid6: mmxx2 3413 MB/s
raid6: sse1x1 2033 MB/s
raid6: sse1x2 2573 MB/s
raid6: sse2x1 3710 MB/s
raid6: sse2x2 3909 MB/s
raid6: using algorithm sse2x2 (3909 MB/s)
xor: automatically using best checksumming function: pIII_sse
pIII_sse : 8767.200 MB/sec
xor: using function: pIII_sse (8767.200 MB/sec)
|
This still raises more questions (about how they implement this) but that is a different subject.
Anon-E-moose Watchman
Joined: 23 May 2008 Posts: 6148 Location: Dallas area
Posted: Tue Oct 23, 2018 10:08 am
|
|
There sure is a dearth of info on exactly what it is, and the fact that the term is used in seemingly different ways by different (linux) companies is discouraging.
But this seems to be a little clearer (not much though)
Quote: | The option `--enable-multi-arch` in glibc is about
enabling multiple architectures in the sense of IFUNC
support, but not `multiarch` (one word) in the sense
of gcc or the runtime.
Not to be confused with multilib, either in the sense
of multiple runtimes for gcc, or in the rpm sense of
multiple runtimes e.g. i386 and x86-64. |
https://sourceware.org/ml/libc-help/2014-12/msg00003.html
Makes me wonder what IFUNC support is though.
Note: In looking at the ebuild they use multiarch as a flag, but what's sent to configure is enable/disable-multi-arch (with the hyphen)
Edit to add: a quick search yields
Quote: | What is an indirect function (IFUNC)?
The GNU indirect function support (IFUNC) is a feature of the GNU toolchain that allows a developer to create multiple implementations of a given function and to select amongst them at runtime using a resolver function which is also written by the developer. The resolver function is called by the dynamic loader during early startup to resolve which of the implementations will be used by the application. Once an implementation choice is made it is fixed and may not be changed for the lifetime of the process. |
https://sourceware.org/glibc/wiki/GNU_IFUNC
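To make that concrete, here is a minimal, self-contained sketch of an IFUNC in C (hypothetical function names, nothing to do with glibc's actual internals; assumes gcc on an x86 glibc system):
Code: | /* ifunc_demo.c -- build with: gcc -O2 ifunc_demo.c */
#include <stdio.h>

/* Two implementations of the same operation. */
static int add_generic(int a, int b) { return a + b; }
static int add_sse2(int a, int b)    { return a + b; /* pretend this one uses SSE2 */ }

/* The resolver runs once, during early dynamic linking, and returns the
   address of the implementation to use for the lifetime of the process. */
static int (*resolve_add(void))(int, int)
{
    __builtin_cpu_init();  /* must be called before __builtin_cpu_supports */
    return __builtin_cpu_supports("sse2") ? add_sse2 : add_generic;
}

/* 'add' is the indirect function: the dynamic loader patches its entry
   with whatever resolve_add returns. */
int add(int a, int b) __attribute__((ifunc("resolve_add")));

int main(void)
{
    printf("2 + 3 = %d\n", add(2, 3));
    return 0;
}
|
The point that matters for the speed discussion: the resolver runs once, at load time, so there is no per-call testing cost.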
jagdpanther l33t
Joined: 22 Nov 2003 Posts: 757
Posted: Tue Oct 23, 2018 4:20 pm
|
|
I am still confused. For those of us who only build glibc for use on the system it is compiled (emerged) on, and who are interested in execution speed (and to a lesser degree file size), should we turn off the multiarch USE flag for glibc? (i.e. -multiarch)
Anon-E-moose Watchman
Joined: 23 May 2008 Posts: 6148 Location: Dallas area
Posted: Tue Oct 23, 2018 4:46 pm
|
|
jagdpanther wrote: | I am still confused. For those of us who only build glibc for use on the system it is compiled (emerged) on, and who are interested in execution speed (and to a lesser degree file size), should we turn off the multiarch USE flag for glibc? (i.e. -multiarch) |
++
Hu Administrator
Joined: 06 Mar 2007 Posts: 22657
Posted: Wed Oct 24, 2018 2:06 am
|
|
krinn wrote: | Hu wrote: | Assuming the build system is otherwise able to respect the local administrator's build flags, this seems unnecessary for Gentoo systems, except in the case that you build a package and install it on several different mostly-compatible CPUs (e.g. an Atom, an Intel Core i3, and an AMD Ryzen -- all in the x86 family, but each with their own quirks and possibly different "best" ways of doing a job). |
I disagree with you there Hu: the same cpu could offer very different implementations of a function because the cpu handles multiple instruction sets (here I'm speaking of mmx, sse...) that may optimize the function in some way.
And because of the cpu internals (architecture, cache size...), you "may" not be able to tell without checking which implementation would work best (i.e. the function made with sse could work worse or better than the same function using just mmx). | I am a bit confused about your disagreement. Are you saying that a given physical CPU will vary the best implementation over time? My point above was that there are many x86 compatible CPUs, and they do not all excel at the same instructions, so different implementations work better for different groups. Your elaboration agrees with that: just because two CPUs both support mmx and sse, it does not follow that both of them will perform better using mmx than sse (or vice versa). It depends on what tradeoffs the CPU manufacturers made. Ideally, the compiler should know the best implementation for each CPU family and emit code accordingly. Indirect functions are a way to handle that the user's use case does not allow the compiler to pick a single "ideal" version, when the user wants to use a variety of CPUs that do not all agree on what is ideal.
Ant P. Watchman
Joined: 18 Apr 2009 Posts: 6920
Posted: Wed Oct 24, 2018 4:04 am
|
|
Hu wrote: | I am a bit confused about your disagreement. Are you saying that a given physical CPU will vary the best implementation over time? |
With Intel's crazy microcode patching as of late, it's not impossible...
geki Advocate
Joined: 13 May 2004 Posts: 2387 Location: Germania
Posted: Wed Oct 24, 2018 5:29 am
|
|
krinn wrote: | Code: | raid6: int32x1 869 MB/s
raid6: int32x2 927 MB/s
raid6: int32x4 676 MB/s
raid6: int32x8 643 MB/s
raid6: mmxx1 3071 MB/s
raid6: mmxx2 3413 MB/s
raid6: sse1x1 2033 MB/s
raid6: sse1x2 2573 MB/s
raid6: sse2x1 3710 MB/s
raid6: sse2x2 3909 MB/s
raid6: using algorithm sse2x2 (3909 MB/s)
xor: automatically using best checksumming function: pIII_sse
pIII_sse : 8767.200 MB/sec
xor: using function: pIII_sse (8767.200 MB/sec)
|
This still raises more questions (about how they implement this) but that is a different subject. | Since you ask, something easy like this (select and/or switch an implementation at runtime):
https://forums.gentoo.org/viewtopic-p-8206816.html#8206816
krinn Watchman
Joined: 02 May 2003 Posts: 7470
Posted: Wed Oct 24, 2018 9:30 am
|
|
Hu: my disagreement was with Quote: | Assuming the build system is otherwise able to respect the local administrator's build flags, this seems unnecessary for Gentoo systems |
As (I think) you were assuming it would be useful in Gentoo only for the case where the user will run that code on another cpu.
While I was pointing out that it would still be useful because, even though portage and the user (through CFLAGS & CPU_FLAGS_X86) will hint the program about cpu capabilities, the user himself cannot know which implementation would do best.
geki:
I think the projects' aims share a common ground, but not a common goal.
The first should only build the function versions the cpu can handle, in order to pick the best at runtime.
The second should build every version of a function that was implemented, in order to run on every cpu; as such it will add unwanted bloat on a gentoo system (you don't need the function version optimized for sse2 on a cpu that cannot handle sse2); but someone doing that FMV could also be smart and add some speed-based detection of which function to use on a cpu capable of running several instruction sets (like a cpu able to run mmx and sse).
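As an aside, gcc (6 and later) exposes this FMV idea per-function via the target_clones attribute; a minimal sketch (hypothetical function, not from glibc or any real project):
Code: | /* fmv_demo.c -- build with: gcc -O2 fmv_demo.c */
#include <stddef.h>
#include <stdio.h>

/* gcc emits one clone per listed target plus the mandatory "default"
   fallback, and wires them up with an automatically generated IFUNC
   resolver. */
__attribute__((target_clones("avx2", "sse4.2", "default")))
long dot(const int *a, const int *b, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)  /* the vectoriser specialises this loop per clone */
        s += (long)a[i] * b[i];
    return s;
}

int main(void)
{
    int x[4] = {1, 2, 3, 4}, y[4] = {5, 6, 7, 8};
    printf("%ld\n", dot(x, y, 4));
    return 0;
}
|
Since only the targets you list get built, this avoids compiling clones for instruction sets your cpu will never have.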
What I was thinking behind my "it raises more questions" was more about the implementation of that, as a function may run better depending on the arguments passed to it; and while it could seem easy, as everyone could safely assume sse should run better than mmx, assuming sse wins would be bad in practice (that's the same bad assumption as assuming gcc -O3 will always give better results than -O2; in theory it does, in practice it doesn't).
And the second question I had in mind was already answered by Anon-E-moose's IFUNC quote, as I was worried about how the testing is implemented, given that "Once an implementation choice is made it is fixed and may not be changed for the lifetime of the process".
Because if you run the test each time the function is called, the testing will drain any benefit from the results (you test whether mmx or sse is better, but the testing itself has a speed cost, voiding the speed gain from branching to the best function).
The result of this (for me) is that it looks good on paper, but I'm less sure it does good in the end. Testing whether a function runs faster or slower with mmx or sse on a busy cpu may not give the best answer for the given cpu: while people try to run benchmarks on an idle cpu so as not to disturb the test, glibc would run its tests when the function is called, with random results in practice as the cpu may be working hard on something else; with the end result that glibc will say "the mmx version runs faster", while in reality the sse one would always do better, just not at the time the tests were made.
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
Posted: Wed Oct 24, 2018 2:38 pm
|
|
Wait wait! So the loader can determine the best implementation of a function, but the compiler can't?
I find that hard to believe. Why does the compiler attempt optimization at all?
As for Intel microcode, all my machines are AMD, and "the crazy Intel microcode" encourages me to NOT buy an Intel in the future.
geki Advocate
Joined: 13 May 2004 Posts: 2387 Location: Germania
Posted: Wed Oct 24, 2018 5:01 pm
|
|
krinn wrote: | I think the projects' aims share a common ground, but not a common goal.
The first should only build the function versions the cpu can handle, in order to pick the best at runtime.
The second should build every version of a function that was implemented, in order to run on every cpu; as such it will add unwanted bloat on a gentoo system (you don't need the function version optimized for sse2 on a cpu that cannot handle sse2); but someone doing that FMV could also be smart and add some speed-based detection of which function to use on a cpu capable of running several instruction sets (like a cpu able to run mmx and sse). |
Yes, vectorclass supports function selection by arch, i.e. for x86 or x86_64, so it is possible to build one binary supporting the complete Intel and AMD cpu instruction sets for the respective arch. I use a binhost with clients, so I like that feature enabling more cpu features on those clients than on the binhost. FMV seems to be more about library selection for different archs in one binary or DSO, dynamically loading an arch-dependent DSO. Is it possible to build a binary which can be executed on x86, x86_64, arm, arm64, ppc*? If not, FMV seems superfluous to me. If you use FMV for an mmx-versus-sse switcheroo - that's simply wrong.
Well, the answer is meant generically - not specific to krinn. I'm still trying to classify this feature.
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
Posted: Wed Oct 24, 2018 10:42 pm
|
|
Code: | # equery d glibc
* These packages depend on glibc:
dev-java/oracle-jre-bin-1.8.0.162-r1 (!prefix ? sys-libs/glibc)
dev-libs/libev-4.24 (elibc_glibc ? >=sys-libs/glibc-2.9_p20081201)
net-fs/autofs-5.1.4 (elibc_glibc ? sys-libs/glibc[rpc(-)])
sys-apps/iproute2-4.18.0 (elibc_glibc ? >=sys-libs/glibc-2.7)
sys-devel/gcc-6.4.0-r4 (elibc_glibc ? >=sys-libs/glibc-2.13)
sys-devel/gcc-7.3.0-r5 (elibc_glibc ? >=sys-libs/glibc-2.13)
sys-devel/gcc-8.2.0-r3 (elibc_glibc ? >=sys-libs/glibc-2.13)
sys-libs/tevent-0.9.37 (elibc_glibc ? <sys-libs/glibc-2.26[rpc(+)])
virtual/libc-1 (elibc_glibc ? sys-libs/glibc:2.2)
www-plugins/adobe-flash-31.0.0.122 (nsplugin ? >=sys-libs/glibc-2.4)
| Seems like this would help only java and adobe-flash.
Re-emerging glibc with -multiarch
Hu Administrator
Joined: 06 Mar 2007 Posts: 22657
Posted: Thu Oct 25, 2018 1:53 am
|
|
Tony0945 wrote: | Wait wait! So the loader can determine the best implementation of a function, but the compiler can't?
I find that hard to believe. Why does the compiler attempt optimization at all? | The loader, by definition, runs on the system where the program is loaded for execution. The compiler may or may not. The loader can inspect the model identifying data of the host CPU, cross-check that against hardcoded rules for what is best on that family, and pick an implementation accordingly. The compiler can know what is best for each family, but it can't know on which family you will ultimately run the code. (Using -march gives it strong hints, but even that is only establishing a partial bound. You are permitted to run on a CPU that is substantially better than the one specified via -march.) The compiler optimizes as best it can with the information available. In the general case, it cannot even be sure that you will run the code only on one family of CPU. You might upgrade the CPU or migrate the hard drive, and not recompile the code. On the other hand, if you migrate to a new CPU, you will necessarily reboot, and when you do, the loader gets a chance to pick the best variant of the code for your current CPU, ignoring what CPU family you had when you compiled the code.
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
Posted: Thu Oct 25, 2018 2:04 am
|
|
Thank you, Hu. I understand how multiarch can be useful in many cases, especially when running generic binary code.
In my use case, and I'm sure many others', I compile with -march=native, letting gcc figure out the best code, and when I upgrade to a new CPU, I run "emerge -e @world", still with -march=native.
Edit: I notice, after recompiling glibc (with -multiarch) and recompiling Thunderbird, that Thunderbird loads much faster, without the artifacts it previously displayed that I had attributed to T-bird itself.
The Main Man Veteran
Joined: 27 Nov 2014 Posts: 1171 Location: /run/user/1000
Posted: Thu Oct 25, 2018 9:09 am
|
|
Running steam, for example, would require glibc multiarch.
Actually any x86 code running on amd64.
I guess, at least.
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
Posted: Thu Oct 25, 2018 2:28 pm
|
|
kajzer wrote: | Running steam, for example, would require glibc multiarch.
Actually any x86 code running on amd64.
I guess, at least. |
Wouldn't that be the multilib flag, not multiarch?
Anon-E-moose Watchman
Joined: 23 May 2008 Posts: 6148 Location: Dallas area
Posted: Thu Oct 25, 2018 3:00 pm
|
|
Tony0945 wrote: | Thank you, Hu. I understand how multiarch can be useful in many cases, especially when running generic binary code.
In my use case, and I'm sure many others', I compile with -march=native, letting gcc figure out the best code, and when I upgrade to a new CPU, I run "emerge -e @world", still with -march=native.
Edit: I notice, after recompiling glibc (with -multiarch) and recompiling Thunderbird, that Thunderbird loads much faster, without the artifacts it previously displayed that I had attributed to T-bird itself. |
I didn't notice any artifacts (but I'm running an older tbird anyway) but I did go ahead and recompile glibc without multiarch.
I don't see a need for it, at least on my system.