antonellocaroli Guru

Joined: 11 Aug 2016 Posts: 513
|
Posted: Sun Aug 25, 2019 6:45 am Post subject: CFLAGS Rpi3B Rpi3B+ Rpi4 |
|
|
Hi,
I always use the following for all three boards:
CFLAGS="-march=native -O2 -pipe"
Is that okay, or would something else be better? If so, which flags? |
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 54975 Location: 56N 3W
|
Posted: Sun Aug 25, 2019 8:56 am Post subject: |
|
|
antonellocaroli,
The answer depends on whether you use 32 bit or 64 bit.
In 64 bit mode on the Pi3B and 3B+ I use
Code: | CFLAGS="-mcpu=cortex-a53+crc -mtune=cortex-a53 -ftree-vectorize -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}" |
and gcc says that -march=native means
Code: | $ gcc -### -E - -march=native 2>&1 | sed -r '/cc1/!d;s/(")|(^.* - )|( -mno-[^\ ]+)//g'
-mlittle-endian -mabi=lp64 -march=armv8-a+crc |
-mtune=cortex-a53 tells gcc to order the instruction stream to best suit the Cortex-A53 cores used on the Pi3B/3B+ in 64 bit mode.
On the Pi4, -mtune=cortex-a53 becomes -mtune=cortex-a72, to suit its CPU.
The same instruction set is used and the code runs on both CPUs. The A53 is an in-order machine, but the A72 can execute instructions out of order, so it can gain a small speed improvement.
In 32 bit mode, the Pi3/4 present 32 bit CPUs, so you get different results. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
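A minimal sketch of the -mtune choice described above. On a running Pi, /proc/cpuinfo reports a "CPU part" ID per core: 0xd03 for the Cortex-A53 and 0xd08 for the Cortex-A72. The function names tune_for_part and cflags_for_part are made up for illustration, and automating the choice this way is an assumption, not something from the post.

```shell
#!/bin/sh
# Hypothetical helper: map an Arm "CPU part" ID to a gcc -mtune value.
# 0xd03 = Cortex-A53 (Pi3B/3B+), 0xd08 = Cortex-A72 (Pi4).
tune_for_part() {
    case "$1" in
        0xd03) echo "cortex-a53" ;;
        0xd08) echo "cortex-a72" ;;
        *)     echo "unknown CPU part: $1" >&2; return 1 ;;
    esac
}

# Build the 64 bit CFLAGS from the post, varying only -mtune.
cflags_for_part() {
    tune=$(tune_for_part "$1") || return 1
    echo "-mcpu=cortex-a53+crc -mtune=${tune} -ftree-vectorize -O2 -pipe -fomit-frame-pointer"
}

# Usage on a running Pi. Prints nothing where the field is absent
# (for example on x86) or the part ID is not one of the two above.
part=$(awk -F': ' '/^CPU part/ {print $2; exit}' /proc/cpuinfo 2>/dev/null || true)
if [ -n "$part" ] && flags=$(cflags_for_part "$part" 2>/dev/null); then
    echo "CFLAGS=\"${flags}\""
fi
```

Only -mtune differs between the two boards, so the output for either part ID is code that runs on both.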
 |
antonellocaroli Guru

Joined: 11 Aug 2016 Posts: 513
|
Posted: Sun Aug 25, 2019 9:05 am Post subject: |
|
|
Thank you, Neddy,
If I change the CFLAGS now, should I recompile the entire system, including gcc?
And yes, I use a 64 bit system. |
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 54975 Location: 56N 3W
|
Posted: Sun Aug 25, 2019 9:14 am Post subject: |
|
|
antonellocaroli,
On the Pi3, no. I suspect nothing changes, since in-order execution is probably the default.
On the Pi4, probably not. The speed gains will be marginal.
The instructions will not change, only the order in which they are issued. Keep an eye on 64 Bit Raspberry Pi 4B Benchmarks _________________ Regards,
NeddySeagoon
 |
antonellocaroli Guru

Joined: 11 Aug 2016 Posts: 513
|
Posted: Fri Oct 25, 2019 4:48 am Post subject: |
|
|
Hi NeddySeagoon,
What CFLAGS do you use on 32 bit systems, for both the rpi3 and the rpi4? |
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
 |
antonellocaroli Guru

Joined: 11 Aug 2016 Posts: 513
|
Posted: Fri Oct 25, 2019 10:12 am Post subject: |
|
|
If I compile on the Rpi4 with these CFLAGS
CFLAGS="-mcpu=cortex-a53+crc -mtune=cortex-a53 -ftree-vectorize -O2 -pipe -fomit-frame-pointer"
or
CFLAGS="-mcpu=cortex-a53+crc -mtune=cortex-a53 -ftree-vectorize -O3 -pipe -fomit-frame-pointer"
can I use the same stage on the rpi3?
I am still referring to 32 bit... |
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Fri Oct 25, 2019 12:13 pm Post subject: |
|
|
To run at maximum speed with multithreading, for 32 bit floating point programs with gcc up to version 8, I use:
Code: |
gcc stressmpfpu.c -lm -lrt -O3 -lpthread -mcpu=cortex-a7 -mfloat-abi=hard -mfpu=neon-vfpv4 -funsafe-math-optimizations -o MP-FPUStress
|
and for integers:
Code: |
gcc stressmpint.c -lrt -lc -lm -O3 -lpthread -o MP-IntStress
|
The floating point program obtains 21.2 GFLOPS. Leaving out -funsafe-math-optimizations (which seems to enable the use of NEON SIMD instructions), the maximum speed was 7.7 GFLOPS. With -funsafe-math-optimizations and -O2, the result was 11.0 GFLOPS.
I note that clang on Gentoo also has a -funsafe-math-optimizations option that makes some difference. _________________ Regards
Roy |
 |
antonellocaroli Guru

Joined: 11 Aug 2016 Posts: 513
|
Posted: Fri Oct 25, 2019 5:53 pm Post subject: |
|
|
My question comes from the fact that the rpi4 compiles faster than the rpi3.
I generally produce
a system for:
rpi3 32 bit
rpi3 64 bit
rpi4 32 bit
rpi4 64 bit
I would like to build the stage4 for all four systems on the rpi4 (if that is possible),
changing the CFLAGS in make.conf each time.
For rpi4 64 bit I use
CFLAGS="-mcpu=cortex-a53+crc -mtune=cortex-a72 -ftree-vectorize -O2 -pipe -fomit-frame-pointer"
Could you tell me exactly what the CFLAGS in make.conf should be for each of the four systems?
I am not an expert in this field. |
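A sketch of how the four make.conf settings could be kept side by side and selected per build. The two 64 bit lines come straight from this thread. The two 32 bit lines are an assumption that nobody in the thread has confirmed: they use gcc's standard 32 bit Arm options -mfpu=neon-fp-armv8 and -mfloat-abi=hard, and should be checked against your own toolchain before use. The function name pi_cflags and the target labels are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print CFLAGS for one of the four build targets.
COMMON="-ftree-vectorize -O2 -pipe -fomit-frame-pointer"

pi_cflags() {
    case "$1" in
        # 64 bit: as given in this thread.
        rpi3-64) echo "-mcpu=cortex-a53+crc -mtune=cortex-a53 ${COMMON}" ;;
        rpi4-64) echo "-mcpu=cortex-a53+crc -mtune=cortex-a72 ${COMMON}" ;;
        # 32 bit: ASSUMPTION, not taken from the thread. Verify before use.
        rpi3-32) echo "-mcpu=cortex-a53 -mfpu=neon-fp-armv8 -mfloat-abi=hard ${COMMON}" ;;
        rpi4-32) echo "-mcpu=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard ${COMMON}" ;;
        *) echo "usage: pi_cflags rpi3-32|rpi3-64|rpi4-32|rpi4-64" >&2; return 1 ;;
    esac
}

# Example: emit the two lines to paste into make.conf for a 64 bit Pi4 build.
echo "CFLAGS=\"$(pi_cflags rpi4-64)\""
echo 'CXXFLAGS="${CFLAGS}"'
```

None of these use -march=native, so the result does not depend on which board does the compiling. Per NeddySeagoon's post, the 64 bit code runs on both CPUs.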
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Fri Oct 25, 2019 8:06 pm Post subject: |
|
|
As you can see, I just use simple gcc commands from a terminal. They are easy to change when trying different options, but they are for short, self-contained programs.
I should have quoted 64 bit gcc 9 results. These are from the same 64 bit compilation, running on the 4B, 3B+ and 3B:
Code: |
gcc stressmpfpu64.c -lm -lrt -O3 -lpthread -march=armv8-a -no-pie -o MP-FPUSPiStress64a
Max GFLOPS Pi 4B 23.2, Pi3B+ 13.0, Pi3B 11.1
gcc stressmpfpu64.c -lm -lrt -O3 -lpthread -no-pie -o MP-FPUStress64b
Max GFLOPS Pi 4B 22.9, Pi3B+ 13.1, Pi3B 11.1
gcc stressmpfpu64.c -lm -lrt -O2 -lpthread -march=armv8-a -no-pie -o MP-FPUStress64c
Max GFLOPS Pi 4B 10.3, Pi3B+ 3.3, Pi3B 2.8
-no-pie needed to display coloured execution icon
|
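To make the effect of -O3 over -O2 in the figures above easier to see, the ratios can be worked out directly. This is a minimal sketch using awk, with the GFLOPS values copied from the 64a (-O3) and 64c (-O2) runs. The function name speedup is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: ratio of two GFLOPS figures, to two decimal places.
speedup() {
    LC_ALL=C awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", fast / slow }'
}

# -O3 versus -O2, both with -march=armv8-a (values from the results above).
echo "Pi 4B : $(speedup 23.2 10.3)x"
echo "Pi3B+ : $(speedup 13.0 3.3)x"
echo "Pi3B  : $(speedup 11.1 2.8)x"
```

For this benchmark, -O3 comes out about 2.25 times faster than -O2 on the Pi 4B and close to 4 times faster on the Pi3B and 3B+.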
32 bit programs use different instructions, with alternative compile options. I don’t know how these can be combined for the Pi systems.
I produced my Android Apps via Eclipse. This permits compiling for different systems, selected at run time. This includes the following to provide separate code for four different systems:
Code: |
Application.mk
# Build for ARM and Intel 32 bit and 64 bit systems
APP_ABI := armeabi-v7a arm64-v8a x86 x86_64
|
Separate CFLAGS can be included in Android.mk. This one also includes two more options for MIPS CPU technology.
Code: |
LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)
LOCAL_MODULE := mpbus2ilib
# LOCAL_CFLAGS := -save-temps
ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)
LOCAL_CFLAGS += -DHAVE_NEON=1
LOCAL_SRC_FILES := busspdmp2i.c arm732.c
endif
ifeq ($(TARGET_ARCH_ABI),arm64-v8a)
LOCAL_CFLAGS += -DHAVE_NEON64=1
LOCAL_SRC_FILES = busspdmp2i.c arm864.c
endif
ifeq ($(TARGET_ARCH_ABI),x86)
LOCAL_CFLAGS += -ffast-math -mtune=atom -mssse3 -mfpmath=sse
LOCAL_SRC_FILES = busspdmp2i.c intel32.c
endif
ifeq ($(TARGET_ARCH_ABI),x86_64)
LOCAL_CFLAGS += -ffast-math -mtune=slm -msse4.2
LOCAL_SRC_FILES = busspdmp2i.c intel64.c
endif
ifeq ($(TARGET_ARCH_ABI),mips)
LOCAL_SRC_FILES = busspdmp2i.c mips32.c
endif
ifeq ($(TARGET_ARCH_ABI),mips64)
LOCAL_SRC_FILES = busspdmp2i.c mips64.c
endif
include $(BUILD_SHARED_LIBRARY)
|
Someone might know if such facilities are available for Raspberry Pi (running Android?). _________________ Regards
Roy |
 |
|