mv Watchman
Joined: 20 Apr 2005 Posts: 6780
Posted: Fri Dec 08, 2017 1:59 pm
NeddySeagoon wrote: | What about adding -fpie [...]
Is that the right flag though, I think there is also a -fPIE? |
According to the man page, the two flags differ only on certain architectures; on x86 and amd64 they should do the same thing.
Quote: | Where its an internal gcc setting, that's not possible. |
It is possible if the package explicitly adds -fno-pie. |
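A quick way to check what the toolchain and a given binary actually do - a sketch only, and the ssh path is just an example:
Code: | # does the installed gcc default to PIE?
gcc -v 2>&1 | grep -o -- '--enable-default-pie'
# a PIE binary is reported as "DYN (Shared object file)" instead of EXEC
readelf -h /usr/bin/ssh | grep 'Type:' |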
Ralphred l33t
Joined: 31 Dec 2013 Posts: 653
Posted: Sat Dec 09, 2017 6:29 am
I've read this entire thread with interest, and would like some clarification on things I think I don't have to do.
The two slowest machines I have both run with a "hardened" profile. As I understand it they already have pie set as a USE flag, so changing the profile to a 17.0/hardened version shouldn't require any rebuilds beyond the usual emerge -uDNav @world - is this assumption correct? |
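For reference, the switch itself is the usual procedure - a sketch only, and the profile name here is an example (pick the matching entry from the list for your arch):
Code: | eselect profile list
eselect profile set default/linux/amd64/17.0/hardened
emerge -uDNav @world |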
mv Watchman
Joined: 20 Apr 2005 Posts: 6780
Posted: Sat Dec 09, 2017 6:40 am
Ralphred wrote: | "hardened" profile [...] changing the profile to a 17/hardened version shouldn't require any rebuilds [...] is this assumption correct? |
Concerning pie it is correct. Concerning ssp, I am not absolutely sure: There were times when hardened used a different ssp mechanism than gcc upstream. To be on the safe side, I would rebuild static libs anyway. |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Sat Dec 09, 2017 9:47 am
In case people didn't read the article where someone tested building the SPEC2006 benchmarks with and without PIE on 32-bit x86, these results are probably the most notable, as we likely use these programs directly (instead of having to infer from the benchmarks):
gcc: around 8.5% hit (benchmark compiles C to x86_64 assembly)
h264: around 9% hit (benchmark encodes frames to a video stream, does not decompress?)
perlbench: around 22% hit (perl interpreter, biggest part is running spamassassin and converting email to html)
bzip2: around 16-20% hit (runs both compress and decompress within the benchmark)
Note that some are better with -O2 and some are better with -O3, which makes it hard to choose which to use to minimize the performance hit. There may also be other relevant benchmarks, but you might as well just read the whole article...
I wonder how much perlbench relates to python and thus portage. One thing is that SPECcpu does not test disk I/O, so it may not tank portage speeds by the full 20% even if perl and python are comparable. I'd imagine on a HDD we probably won't see the whole 20% hit, but if everything is cached, 20% on something that takes a minute would add another 12 seconds. |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5109 Location: Bavaria
Posted: Sat Dec 09, 2017 10:14 am
I switched to the new 17.0 profile. Beforehand I ran a query which showed me only three packages using the pie USE flag:
- gcc
- pam
- openssh
These packages will be updated/recompiled anyway with a regular world update
instead of "emerge -e @world". Until now I have had no troubles ...
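A guess at the kind of query meant here - gentoolkit can list installed packages that have a given USE flag enabled:
Code: | # hypothetical example, not necessarily the command that was used
equery hasuse pie |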
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Sat Dec 09, 2017 11:07 am
If we do end up forcing USE=-pie as our workaround for x86 boxes (which I am very strongly leaning towards now since we're not just talking about 1-4% penalties) -- how much Gentoo-related breakage can we expect down the road?
The other concern I have is that I may go ahead and let all my x86_64 boxes build with USE=pie, but distcc will once again break until all machines build PIE. And the 32-bit crossdev gccs will need to be built with USE=-pie ... oh misery, even if it is directly supported...
I just set up my first x86_64 machine with straight 17.0 and am about to hit the button to go do it (emerge --update --pretend is clean, so I hope emerge --emptytree will also be clean), but I'm still concerned about what I'll do with my 32-bit machines... |
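For anyone weighing the -pie route, a rough sketch of what it would involve (if the profile forces the flag, an override in /etc/portage/profile/package.use.force is needed on top of this):
Code: | # /etc/portage/package.use/gcc  (sketch)
sys-devel/gcc -pie
# rebuild the compiler, then keep +pie/-pie consistent across the system
emerge -1av sys-devel/gcc
emerge -eav @world |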
mv Watchman
Joined: 20 Apr 2005 Posts: 6780
Posted: Sat Dec 09, 2017 11:57 am
eccerr0r wrote: | In case people didn't read the article of someone testing |
Your quest on spreading FUD continues.
Now with misinformation.
I just examined the most astonishing one:
Quote: | bzip2: around 16-20% hit |
Whoever claimed this was either intentionally lying or measured something other than what he was claiming:
bzip2 is nothing but a wrapper which calls a (PIC-compiled!) library, so the code for compression/decompression is not influenced by pie at all.
OK, let us assume that the guy knew this and compiled with static libs for testing purposes.
Let's see whether in this case his number might be correct:
I benchmarked on x86 by compressing and decompressing a huge file 9 times on a ramdisk, sorting the times for simpler comparison:
Code: | dynamic pie: 6.070 6.154 6.180 6.180 6.189 6.192 6.193 6.215 6.233
static +pie: 6.189 6.199 6.238 6.204 6.199 6.201 6.182 6.196 6.182
static -pie: 6.179 6.185 6.192 6.209 6.216 6.221 6.235 6.236 6.247 |
Summary: The difference lies below what can be measured; in about half of the cases +pie even appears to be faster (but of course that is just measurement noise). |
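A minimal sketch of such a timing run, assuming a ramdisk at /dev/shm (the file name and repetition count are examples):
Code: | cp bigfile /dev/shm/test
for i in $(seq 1 9); do
    /usr/bin/time -f '%e' sh -c 'bzip2 -kf /dev/shm/test && bzip2 -dkf /dev/shm/test.bz2'
done 2>&1 | sort -n |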
*zensiert* n00b
Joined: 10 Oct 2004 Posts: 55 Location: Germany
Posted: Sat Dec 09, 2017 2:58 pm
wjb wrote: | dmpogo wrote: |
...
Code: |
emerge -1 /lib*/*.a /usr/lib*/*.a
|
does it miss something, besides kernel and third party kernel modules ? |
It misses a lot - the libraries in /usr/lib* can be nested quite deep, e.g. dev-python/numpy needs something like /usr/lib*/*/*/*/*/*/*.a.
On my system it finds about a third of the packages with static libs. |
How about this:
Code: |
emerge -p `find /lib* /usr/lib* -name \*\.a`
|
This should find all .a files, however deeply they are nested. You could even search in / to catch other possible locations, e.g. under /opt. |
mv Watchman
Joined: 20 Apr 2005 Posts: 6780
Posted: Sat Dec 09, 2017 3:45 pm
*zensiert* wrote: | How about this: Code: | emerge -p `find /lib* /usr/lib* -name \*\.a` |
This should find all .a-files, however deeply they are nested. You could even search in / to include other possible files, e.g. under /opt |
If you use a sane shell like zsh, you can also do it more quickly:
Code: | emerge -1a /lib*/**/*.a /usr/lib*/**/*.a |
However, the reason why I did not recommend this is that the deeper nested libraries are usually part of some toolchain (gcc, glibc, clang, ...) which one usually does not want to reemerge.
But it is correct that one should also check subdirectories to find exceptions. In my case, numpy and wine are the only exceptions (and I am wondering why a python package installs a static library under the python tree and whether it will actually ever be used by other packages; the wine static libs are probably used by nothing in the gentoo tree.) |
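A sketch of how one might skip those toolchain-internal directories when listing candidates (the excluded path patterns are only examples):
Code: | find /lib* /usr/lib* -name '*.a' ! -path '*/gcc/*' ! -path '*glibc*' ! -path '*/clang/*' |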
dmpogo Advocate
Joined: 02 Sep 2004 Posts: 3425 Location: Canada
Posted: Sat Dec 09, 2017 4:39 pm
mv wrote: | eccerr0r wrote: | In case people didn't read the article of someone testing |
Your quest on spreading FUD continues.
Now with misinformation.
I just examined the most astonishing one:
Quote: | bzip2: around 16-20% hit |
Whoever claimed this was either intentionally lying or measured something else than he was claiming:
|
Well, eccerr0r refers to a specific paper quoted several posts above, let me give the link again:
https://nebelwelt.net/publications/files/12TRpie.pdf
It is not a very recent analysis (2012) and maybe things have changed, but there is no need to guess what was and was not done. If you can point out the flaws in their analysis, it will be interesting. |
tld Veteran
Joined: 09 Dec 2003 Posts: 1845
Posted: Sat Dec 09, 2017 5:44 pm
eccerr0r wrote: | If we do end up forcing USE=-pie as our workaround for x86 boxes (which I am very strongly leaning towards now since we're not just talking about 1-4% penalties) -- how much Gentoo-related breakage can we expect down the road? | This is what I'd like to know. All I've really read for sure is that a mix of pie and -pie is bound to cause problems, but nothing's been clarified as to having -pie globally. Even if there was no performance hit at all, it certainly appears that I'll gain nothing I need or want otherwise by enabling pie on x86. Gentoo is supposed to be about choices, and this one's getting old in a big hurry frankly. |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Sat Dec 09, 2017 6:31 pm
The only thing that may have changed is the version of gcc since then. Maybe gcc can hide the PIC penalty.
In case people don't know about SPEC, these are canned code sequences used to benchmark across platforms - it does NOT call the system binaries, they're forked from what was available at the time. They are also statically linked - there are no library shenanigans going on.
Unless their analysis methodology was wrong, or you want to compare real-world differences (i.e. using dynamic libraries or including disk I/O penalties), their analysis is indeed credible - unless gcc was made better since then, which I kind of doubt. |
dmpogo Advocate
Joined: 02 Sep 2004 Posts: 3425 Location: Canada
Posted: Sat Dec 09, 2017 6:49 pm
eccerr0r wrote: | The only thing that may have changed is the version of gcc since then. Maybe gcc can hide the PIC penalty.
In case people don't know about SPEC, these are canned code sequences used to benchmark across platforms - it does NOT call the system binaries, they're forked from what was available at the time. They are also statically linked - there is no library shenanigans going on.
Unless their analysis methodology was wrong, or you want to compare real world differences (i.e. using dynamic libraries or include disk io penalties), their analysis is indeed credible unless gcc was made better since then. I kind of doubt it. |
As for gcc, they state that they used gcc-4.5.2-8ubuntu. One actually interesting remark they make (referring to another paper, their reference [11]) is:
Quote: |
According to [11] the overhead for PIE in I/O based benchmarks is between 0% and 10%. Unfortunately we were unable to reproduce the results of [11] for bzip2 (0% overhead on their system) on our system.
|
They also refer to the Ubuntu analysis, which currently states the following at https://wiki.ubuntu.com/Security/Features#pie:
Quote: |
... PIE has a large (5-10%) performance penalty on architectures with small numbers of general registers (e.g. x86), so it initially was only used for a select number of security-critical packages (some upstreams natively support building with PIE, other require the use of "hardening-wrapper" to force on the correct compiler and linker flags). PIE on 64-bit architectures do not have the same penalties, and it was made the default (as of 16.10, it is the default on amd64, ppc64el and s390x). As of 17.10, it was decided that the security benefits are significant enough that PIE is now enabled across all architectures in the Ubuntu archive by default.
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22657
Posted: Sat Dec 09, 2017 7:02 pm
NeddySeagoon wrote: | What about adding -fpie to CFLAGS on the client?
That would pass -fpie no the helpers and pie code would arrive back.
Is that the right flag though, I think there is also a -fPIE?
I can see problems with those packages that set their own CFLAGS. The -fpie might get dropped.
Where its an internal gcc setting, that's not possible. | That might work, but between the valid concerns you raise here and a problem in another thread where an explicit -no-pie confused ld into breaking all shared libraries, I'd be rather wary of what side effects might come from a global -fpie. There's also the concern of packages that respect CFLAGS over their own built-in options, have a known incompatibility with PIE, and include an explicit -fno-pie in their Makefile equivalent. If they pass user CFLAGS to the right of package CFLAGS, a gcc-specs driven pie (as profile 17.0 uses) would be overridden by the package's -fno-pie, but a -fpie in user CFLAGS would not.
Incidentally, if you went this route, you would also need to do it in CXXFLAGS. |
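The ordering point is easy to demonstrate: for the codegen flags the rightmost one wins, so a user-supplied -fpie at the end of the command line survives a package's -fno-pie, while a specs-level default does not. A small sketch:
Code: | # __PIE__ is only predefined when -fpie is in effect
echo | gcc -fno-pie -fpie -dM -E - | grep __PIE__    # defined: the later -fpie won
echo | gcc -fpie -fno-pie -dM -E - | grep __PIE__    # no output: -fno-pie won |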
mv Watchman
Joined: 20 Apr 2005 Posts: 6780
Posted: Sat Dec 09, 2017 7:42 pm
eccerr0r wrote: | their analysis is indeed credible |
Sounds fair. I didn't know about it and had just seen your numbers which I could not reproduce at all.
Quote: | unless gcc was made better since then |
There is a huge difference between gcc-4 and gcc-7 (which I used), especially if one just uses generic options like -O2 since a lot of what was previously -O3 is now contained in -O2.
Mainly, I believe that it is the improvement in hardware (or in-processor microcode) which makes the difference to the cited benchmarks: I could well imagine that an amd64 chip internally uses the remaining registers in x86 mode as well (as some sort of 0-level cache), and why shouldn't something similar happen on modern x86 systems?
This is the whole thing with pie: It is practically impossible to measure, because slight changes in the code, code optimization, or hardware might have almost unpredictable effects. All that pie costs is a single register, so you only lose something if the code happened to be such that register reloads were almost never necessary, and with one register less you now actually have to reload registers more often.
And even then, the 1st-level cache buffers most such cases, and a wrong branch prediction is still much more costly than any of these effects.
It might also happen by accident that using pie gains you time instead of losing it (if the variables kept in the fewer registers turn out to be a better selection) - it is just less likely on average.
So I am not so surprised to see that they refer to another benchmark on another (x86) system where pie did not increase the execution time of bzip2 at all (which corresponds to the results I had with my tiny benchmark). |
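The "one register" cost is visible in the generated code on 32-bit x86: position-independent code first has to materialise the program counter in a register before it can address globals. A rough illustration, assuming a multilib gcc (the snippet itself is only an example):
Code: | echo 'int g; int f(int x){return x+g;}' > /tmp/f.c
gcc -m32 -O2 -fno-pie -S -o - /tmp/f.c | grep -c get_pc_thunk   # 0
gcc -m32 -O2 -fpie    -S -o - /tmp/f.c | grep -c get_pc_thunk   # at least 1 |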
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Sat Dec 09, 2017 7:47 pm
dmpogo wrote: | One actually interesting remark they have is (referring to some other paper 11)
Quote: |
According to [11] the overhead for PIE in I/O based benchmarks is between 0%
and 10%. Unfortunately we were unable to reproduce the re-
sults of [11] for bzip2 (0% overhead on their system) on our
system.
|
|
Reference [11] can be seen at https://air.unimi.it/retrieve/handle/2434/139336/113600/acsac09.pdf ... According to this, they are using system binaries with recompilation, and shared libraries; so these side effects will affect their numbers. However they were more focused on the security aspect than on quantifying exact performance degradation. On a side note, I'm especially worried about their benchmark 'bc', if this is the same bc as the arbitrary precision calculator, as this too is an interpreted language, though we don't know what actual code they used (it could just be running a long-running builtin function many times). Unlike SPEC, it's not clear what each benchmark is, nor what versions they used.
In any case, I would imagine that I/O should drown out a large part of the losses from PIE, as I/O depends on kernel code and the process will be data-starved during that time. |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Sat Dec 09, 2017 7:57 pm
mv wrote: | There is a huge difference between gcc-4 and gcc-7 (which I used), especially if one just uses generic options like -O2 since a lot of what was previously -O3 is now contained in -O2.
|
Now you're spreading something that isn't quantified. How much better is "huge", really? From my dealings with avr-gcc, the only thing I can guarantee has changed between gcc versions is that the resulting binary sizes always get larger, and it's always a struggle to fit within 4K of flash.
As witnessed by the first paper, sometimes -O3 is worse than -O2, so there's no guarantee these optimizations actually improve runtime. |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54578 Location: 56N 3W
Posted: Sat Dec 09, 2017 8:06 pm
tld,
Predictions, especially about the future, are very difficult.
Right now, as long as pie is consistent, nothing breaks.
We have gcc-6.4 with and without pie.
If you want -pie, go for it, but on the /17.x/ profile. 17.1 is in the wings too.
Going forward, nothing will be tested or supported with -pie. |
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
Posted: Sat Dec 09, 2017 8:12 pm
NeddySeagoon wrote: |
Going forward, nothing will be tested or supported with -pie. | So 32-bit is on its own? |
asturm Developer
Joined: 05 Apr 2007 Posts: 9280
Posted: Sat Dec 09, 2017 8:19 pm
No. |
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3701 Location: Rasi, Finland
Posted: Sat Dec 09, 2017 10:15 pm
x86 32-bit with -pie will be on its own? |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54578 Location: 56N 3W
Posted: Sat Dec 09, 2017 10:22 pm
Tony0945,
Testing will be with +pie as it's done on Gentoo stable systems.
They will be using the /17.0/ profile, not the /17.0/ profile with -pie. |
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
Posted: Sat Dec 09, 2017 11:08 pm
Zucca wrote: | x86 32-bit with -pie will be on its own? |
Seems so. Well, all that box needs is the ethernet driver and iptables anyhow. I'll just stop updating it. I left it for a year and a half anyway before bringing it up about two months ago.
TIP: If updating an old box, copy the portage ebuild and update it first. Then do whatever it needs. Then sync and do the big update. Tar up the old /usr/portage first in case you need something from it - you probably will. |
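A simplified sketch of that tip for the old /usr/portage layout (it skips the manual ebuild copy; paths and the archive name are only examples):
Code: | tar -cJf /root/portage-tree-backup.tar.xz /usr/portage   # keep the old tree around
emerge -1av sys-apps/portage                              # update the package manager first
emerge --sync
emerge -uDNav @world |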
asturm Developer
Joined: 05 Apr 2007 Posts: 9280
Posted: Sun Dec 10, 2017 12:20 am
Completely unnecessary since all the ebuilds are recorded in git. |
Tony0945 Watchman
Joined: 25 Jul 2006 Posts: 5127 Location: Illinois, USA
Posted: Sun Dec 10, 2017 2:15 am
asturm wrote: | Completely unnecessary since all the ebuilds are recorded in git. | You can retrieve the ebuilds but the supplementary files are another story. |