Author |
Message |
Sakaki Guru
Joined: 21 May 2014 Posts: 409
|
Posted: Thu Feb 16, 2017 9:22 pm Post subject: |
|
|
R0b0t1,
following on from what NeddySeagoon just said, what do you get on your SD-card image if you run lsmod? Check that the vc4 module is loaded; here's the output from my gentoo-on-rpi3-64bit image, for example: Code: | pi64 ~ # lsmod
Module Size Used by
configs 49152 0
cmac 16384 1
rfcomm 49152 12
hci_uart 32768 1
btbcm 16384 1 hci_uart
bnep 24576 2
bluetooth 397312 37 hci_uart,bnep,btbcm,rfcomm
ipv6 466944 26
brcmfmac 262144 0
vc4 139264 3
brcmutil 20480 1 brcmfmac
cfg80211 667648 1 brcmfmac
drm_kms_helper 204800 2 vc4
drm 454656 6 vc4,drm_kms_helper
rfkill 32768 5 bluetooth,cfg80211
snd_bcm2835 36864 1
joydev 20480 0
evdev 24576 3
snd_pcm 135168 1 snd_bcm2835
syscopyarea 16384 1 drm_kms_helper
sysfillrect 16384 1 drm_kms_helper
sysimgblt 16384 1 drm_kms_helper
fb_sys_fops 16384 1 drm_kms_helper
snd_timer 36864 1 snd_pcm
snd 102400 5 snd_timer,snd_bcm2835,snd_pcm
uio_pdrv_genirq 16384 0
uio 24576 1 uio_pdrv_genirq
pi64 ~ # uname -r
4.10.0-rc5-v8 |
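If you would rather script the check than read the lsmod listing by eye, something like the following could work. It is only a sketch: the function name module_loaded is made up for illustration, and the optional second argument exists purely so the check can be tried against a saved copy of the module list.

```shell
# Return 0 if the named kernel module is currently loaded.
# $1 = module name (as lsmod prints it, e.g. vc4)
# $2 = module list to read; defaults to /proc/modules, which is where
#      lsmod gets its data (the override is a hypothetical convenience)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example:
#   module_loaded vc4 && echo "vc4 is loaded" || echo "vc4 is missing"
```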
You could also try running "vblank_mode=0 glxgears -info" to see that your Mesa etc. is correctly plumbed in...
To get the accelerated desktop, you need an appropriately configured kernel (& the necessary kernel modules loaded), an appropriately configured Mesa, and an appropriately configured X11. See this wiki page for example.
You can look at the /etc/portage/make.conf, /etc/portage/package.use/... on the above image for some pointers too. _________________ Regards,
sakaki |
|
|
|
R0b0t1 Apprentice
Joined: 05 Jun 2008 Posts: 264
|
Posted: Thu Feb 16, 2017 10:12 pm Post subject: |
|
|
Thanks NeddySeagoon,
When I switched kernels I left Sakaki's config.txt intact. I checked to make sure that the dtoverlay line for the VC4 firmware was there before posting. Please see the other thread I started if you still feel like helping, I don't want to clutter this one.
Sakaki, I will attempt that later and edit this post. Thanks. |
|
|
|
roylongbottom n00b
Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Thu Feb 23, 2017 4:54 pm Post subject: 64 Bit Benchmarks |
|
|
The latest programs converted were my Fast Fourier Transform benchmarks, which showed some 64 bit performance improvements. Details, results, source code and execution files can be obtained by clicking on the links given earlier or via the www button below.
These execute FFTs sized 1K to 1024K, the larger ones depending on RAM speeds. Using Raspbian (32 bit) and Linux/RPi (64 bit), the short FFTs, with execution times of less than 0.5 milliseconds, produced inconsistent running times. This was only with "on demand" MHz settings and not when running another CPU benchmark at the same time, using a different core, or with a “performance” MHz setting. I haven’t found how to set “performance” with Gentoo. Is it possible?
To investigate this, I produced another test that executes 30 1K sized FFTs 500 times, with 32 bit and 64 bit compilations (These are included in the tar.gz file). Example results are below.
Code: |
RPi 3 500 x 30 1K Single Precision FFT milliseconds
32 Bit Raspbian On Demand
12.9 12.2 7.4 6.0 6.0 6.4 6.0 6.0 6.0 6.0
6.1 6.0 6.0 6.0 6.0 6.0 6.1 6.1 6.0 6.2
6.2 6.0 6.0 6.1 6.0 6.0 6.0 6.0 6.1 6.0
6.2 6.0 6.0 7.0 6.1 6.0 6.0 6.0 6.1 6.0
6.2 6.1 6.0 6.0 6.2 6.0 6.0 6.0 6.0 7.2
To
6.5 6.3 6.1 6.2 6.1 6.1 6.1 6.1 6.1 6.1
6.5 6.3 6.1 6.1 6.1 6.1 6.1 6.1 6.1 6.1
6.4 6.2 6.1 6.1 6.2 6.1 6.1 6.1 6.1 6.1
Raspbian With Stress Test
6.7 6.2 6.0 6.0 6.0 6.0 6.1 6.0 6.1 6.0
6.5 6.2 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
6.4 6.2 6.0 6.0 6.0 6.0 6.0 6.1 6.0 6.0
To
6.3 6.2 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
6.3 6.2 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
6.3 6.2 6.0 6.0 6.1 6.0 6.0 6.0 6.0 6.0
64 Bit Gentoo On Demand
17.5 15.4 11.8 8.6 5.4 5.4 5.4 5.4 5.4 5.4
5.5 5.8 6.0 5.4 5.5 5.4 5.5 5.4 5.4 5.4
5.5 5.6 6.1 5.4 5.5 5.4 5.5 5.5 5.4 5.4
To
5.7 6.9 5.7 5.4 5.4 5.4 5.5 5.4 5.4 5.4
5.8 6.8 5.8 5.6 5.4 5.4 5.4 5.5 5.4 5.4
5.7 6.4 5.7 5.5 5.4 5.4 5.5 5.4 5.4 5.4
Gentoo With Stress Test
5.9 7.2 5.9 5.5 5.4 5.4 5.4 5.4 5.4 5.5
5.6 6.9 5.7 5.4 5.4 5.4 5.4 5.4 5.4 5.4
5.6 6.5 5.7 5.4 5.4 5.4 5.4 5.4 5.4 5.4
5.8 7.1 5.9 5.4 5.4 5.4 5.4 5.4 5.4 5.4
To
5.7 6.8 5.7 5.4 5.4 5.4 5.4 5.4 5.4 5.4
5.7 6.7 6.1 5.4 5.4 5.4 5.4 5.4 5.4 5.4
5.8 6.6 5.6 5.4 5.4 5.4 5.4 5.4 5.4 5.4
|
_________________ Regards
Roy |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54799 Location: 56N 3W
|
Posted: Thu Feb 23, 2017 6:25 pm Post subject: |
|
|
roylongbottom,
All things are possible in Gentoo, it's just missing the GUI, so you need to poke about a bit from the console.
Code: | cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors | will tell the available governors.
Code: | cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor | will tell the governor in use.
Code: | echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor | will set the performance governor, provided it's available. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
|
Posted: Thu Feb 23, 2017 7:12 pm Post subject: Re: 64 Bit Benchmarks |
|
|
roylongbottom wrote: | I haven’t found how to set “performance” with Gentoo. Is it possible? |
roylongbottom ... to follow on from NeddySeagoon, there are a number of ways this can be set. sys-power/cpupower is used for this purpose, and is configured, and started, like any other service. However, assuming 'local' is in a runlevel (which it is by default) you could do the following:
/etc/local.d/cpufreq-performance.start: | #!/bin/sh
for i in /sys/devices/system/cpu/cpu[0-9]/cpufreq/scaling_governor ; do
echo performance > "$i"
done |
You then 'chmod u+x /etc/local.d/cpufreq-performance.start' and this will be set on boot.
For other tuneables look under '/sys/devices/system/cpu/cpufreq/<governor>', and/or see /usr/src/linux/Documentation/cpu-freq/user-guide.txt.
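A slightly more defensive variant of the loop above, as a sketch only: it checks scaling_available_governors before writing, so a governor that isn't built into the kernel gives a message rather than a bare write error. The function name set_governor and its optional second argument (an alternative directory, only there so it can be tried on a scratch tree) are my own inventions, not anything shipped on the image.

```shell
# Set a cpufreq governor on every CPU that offers it.
# $1 = governor name (e.g. performance)
# $2 = CPU sysfs directory; defaults to /sys/devices/system/cpu
#      (hypothetical override, useful for a dry run on a scratch tree)
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    seen=0
    status=0
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        seen=1
        if grep -qw -- "$gov" "$dir/scaling_available_governors" 2>/dev/null; then
            echo "$gov" > "$dir/scaling_governor" || status=1
        else
            echo "governor '$gov' not available in $dir" >&2
            status=1
        fi
    done
    # Fail if no cpufreq directory was found at all
    [ "$seen" -eq 1 ] || return 1
    return "$status"
}

# Example (as root):
#   set_governor performance
```

As with the plain loop, this has to run in a root shell, since the redirection into scaling_governor is done by the shell itself.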
HTH & best ... khay |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54799 Location: 56N 3W
|
Posted: Thu Feb 23, 2017 8:09 pm Post subject: |
|
|
Heh, just like everything else in Gentoo, there are lots of ways to do everything and they are all equally right. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
mDup Apprentice
Joined: 14 Apr 2006 Posts: 212
|
Posted: Fri Feb 24, 2017 2:14 am Post subject: |
|
|
Does anyone have a prebuilt Firefox v51.0.1 arm64 gentoo package tarball?
I run gentoo on amlogic s905 and cannot build firefox, but then that's my own fault because I use gcc 6.3.0 for entire portage.
Nevertheless I can run prebuilt rpi3-64 Firefox v50.1.0 package, and so now I wonder if I can get an upgrade. |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54799 Location: 56N 3W
|
Posted: Fri Feb 24, 2017 8:50 am Post subject: |
|
|
mDup,
It won't build here. It looks like the build system is broken.
Code: | USE="${ARCH} egl gles1 icu minizip openssl pcre16 postproc python
qt5 script sqlite svc threads virt-network xvmc
-modemmanager -pam -skia"
# skia wants to link to neon stuff it doesn't build, in firefox anyway. |
Even with USE="-skia" it tries and fails to use skia.
I'm a gcc-6.3 on arm64 user too.
Code: | genlop -t firefox
Tue Jan 10 06:34:04 2017 >>> www-client/firefox-50.1.0-r1
merge time: 6 hours, 53 minutes and 21 seconds. | is the last one I have. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
roylongbottom n00b
Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Fri Feb 24, 2017 11:55 am Post subject: |
|
|
NeddySeagoon wrote: | roylongbottom,
All things are possible in Gentoo, it's just missing the GUI, so you need to poke about a bit from the console.
Code: | cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors | will tell the available governors.
Code: | cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor | will tell the governor in use.
Code: | echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor | will set the performance governor, provided it's available. |
I had already tried those, but "echo performance" resulted in "Permission denied" and "sudo" made no difference. Trying "su" would not accept a password that I thought was "raspberrypi64". As recommended, I tried a bit more poking, and the command worked after first entering "sudo su" that produced a "pi64" red line prompt - je ne comprends pas and I don't know much French either. _________________ Regards
Roy |
|
|
|
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
|
Posted: Fri Feb 24, 2017 8:35 pm Post subject: |
|
|
roylongbottom wrote: | I had already tried those, but "echo performance" resulted in "Permission denied" and "sudo" made no difference. Trying "su" would not accept a password that I thought was "raspberrypi64". As recommended, I tried a bit more poking, and the command worked after first entering "sudo su" that produced a "pi64" red line prompt - je ne comprends pas and I don't know much French either. |
roylongbottom ... this is why the use of 'sudo' (as a magic bullet) is frowned upon by more experienced shell users. The expectation is that 'sudo echo foo > /foo' is going to work because it is prefaced with a magic word, but the shell doesn't interpret that command in the way that inexperienced shell users expect. It is the current shell which interprets the command (including the '>' redirection), not a root shell:
Code: | % sudo echo foo > /foo
zsh: permission denied: /foo
% sudo "echo foo > /foo"
sudo: echo foo > /foo: command not found
% sudo "/bin/echo foo > /foo"
sudo: /bin/echo foo > /foo: command not found
% sudo sh -c "/bin/echo foo > /foo"
% ls -l /foo
-rw------- 1 root root 4 2017-02-24 21:23 /foo |
In the above you can see that it is only by running a shell via sudo that the 'command' (the full command that is) is run as superuser, and that 'command' needs to be protected by quotes (so as to be passed to the shell being executed, and not interpreted by the running shell). This fact is a trap for the unwary. So, either invoke a shell, or use 'su -' to acquire one.
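One more common idiom, for completeness: let the unprivileged shell do the echo, and hand only the write to a privileged 'tee'. A minimal sketch follows; the function name write_as_root and the SUDO variable are hypothetical conveniences (SUDO just lets you swap in another prefix command), while 'sudo' and 'tee' are the usual tools.

```shell
# Write a single value to a root-owned file from an unprivileged shell.
# $1 = target file, $2 = value to write
# SUDO is an illustrative override for the privilege-raising command;
# if it is unset, plain sudo is used.
write_as_root() {
    # printf runs as the current user; only tee runs with raised
    # privileges, so it is tee (not the current shell) that opens the file
    printf '%s\n' "$2" | ${SUDO-sudo} tee "$1" > /dev/null
}

# Example:
#   write_as_root /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor performance
```

The '> /dev/null' here is harmless: it only silences tee's copy of the data on stdout, and is performed by the current shell on a file anyone may write to.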
best ... khay |
|
|
|
mDup Apprentice
Joined: 14 Apr 2006 Posts: 212
|
Posted: Fri Feb 24, 2017 10:29 pm Post subject: |
|
|
NeddySeagoon wrote: | mDup,
It won't build here. It looks like the build system is broken.
Code: | USE="${ARCH} egl gles1 icu minizip openssl pcre16 postproc python
qt5 script sqlite svc threads virt-network xvmc
-modemmanager -pam -skia"
# skia wants to link to neon stuff it doesn't build, in firefox anyway. |
Even with USE="-skia" it tries and fails to use skia.
I'm a gcc-6.3 on arm64 user too.
Code: | genlop -t firefox
Tue Jan 10 06:34:04 2017 >>> www-client/firefox-50.1.0-r1
merge time: 6 hours, 53 minutes and 21 seconds. | is the last one I have. |
Thanks for information.
Have you been able then to build genpi64 firefox-50.1.0-r1 with gcc-6.3?
I get linker relocation errors, like:
Code: | ../../gfx/skia/SkBitmapProcState_matrixProcs.o: In function `SkBitmapProcState::chooseMatrixProc(bool)':
SkBitmapProcState_matrixProcs.cpp:(.text+0xa0c): undefined reference to `ClampX_ClampY_Procs_neon'
/usr/lib/gcc/aarch64-unknown-linux-gnu/6.3.0/../../../../aarch64-unknown-linux-gnu/bin/ld: ../../gfx/skia/SkBitmapProcState_matrixProcs.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol `ClampX_ClampY_Procs_neon' can not be used when making a shared object; recompile with -fPIC
SkBitmapProcState_matrixProcs.cpp:(.text+0xa10): undefined reference to `ClampX_ClampY_Procs_neon'
|
|
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54799 Location: 56N 3W
|
Posted: Fri Feb 24, 2017 10:55 pm Post subject: |
|
|
mDup,
I've only tried the firefox in the tree. From your code fragment, is a bad sign.
I would expect it to fail using skia.
I'll let sakaki answer for the build in genpi64 _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
roylongbottom n00b
Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Sat Feb 25, 2017 12:11 pm Post subject: OpenMP |
|
|
I am converting my MP benchmarks to run at 64 bits. Initially they are being successfully compiled and run via OpenSUSE using gcc-6. The multithreaded programs also run via Gentoo but not the OpenMP tests, where libgomp.so.1 is not found and the benchmarks can't be compiled using Gentoo gcc 5.4. Is OpenMP or the library available and, if so, how do I install them?
A future requirement is OpenGL, particularly an equivalent of the Raspberry Pi freeglut3. Is that available? I installed OpenGL 7.0 (I think) but can't find it. _________________ Regards
Roy |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54799 Location: 56N 3W
|
Posted: Sat Feb 25, 2017 1:57 pm Post subject: |
|
|
roylongbottom,
equery can tell you lots of things about installed packages. emerge app-portage/gentoolkit to install it.
For example Code: | $ equery b openmp
* Searching for openmp ...
dev-libs/boost-1.63.0 (/usr/include/boost/numeric/odeint/external/openmp) |
libgomp.so.1 appears to belong to gcc. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
mDup Apprentice
Joined: 14 Apr 2006 Posts: 212
|
Posted: Sat Feb 25, 2017 2:46 pm Post subject: |
|
|
NeddySeagoon wrote: | mDup,
I've only tried the firefox in the tree. From your code fragment, is a bad sign.
I would expect it to fail using skia.
I'll let sakaki answer for the build in genpi64 |
Thanks for spotting skia!
When use -skia I can emerge the =www-client/firefox-50.1.0-r1::rpi3 from rpi3-repo |
|
|
|
Sakaki Guru
Joined: 21 May 2014 Posts: 409
|
Posted: Sat Feb 25, 2017 11:25 pm Post subject: |
|
|
@mDup, @NeddySeagoon
mDup wrote: | NeddySeagoon wrote: | mDup,
I've only tried the firefox in the tree. From your code fragment, is a bad sign.
I would expect it to fail using skia.
I'll let sakaki answer for the build in genpi64 |
Thanks for spotting skia!
When use -skia I can emerge the =www-client/firefox-50.1.0-r1::rpi3 from rpi3-repo |
The gentoo-on-rpi3-64bit image does have -skia set, and as you note I have retained firefox-50.1.0-r1 in the rpi3 overlay; more modern versions I could not get to build even with -skia. Thunderbird I haven't been able to get running reliably on arm64 at all (it builds, but segfaults shortly after starting up).
In case it is useful, the per-package USE flags on the image are as follows: Code: | pi64 package.use # tail -n 100 *
==> cairo <==
# Per https://wiki.gentoo.org/wiki/Raspberry_Pi_VC4
x11-libs/cairo opengl xlib-xcb
==> claws-mail <==
# requirements of mail-client/claws-mail
dev-libs/libdbusmenu gtk3
==> elogviewer <==
# requirements of app-portage/elogviewer
dev-libs/libpcre pcre16
==> ffmpeg <==
# enable Multi-Media Abstraction Layer (MMAL) decoding support
media-video/ffmpeg mmal
==> firefox <==
# no sneaky downloading of binary blobs on first run, please...
# and also disable skia; as this seems to try to pull in neon stuff
www-client/firefox -gmp-autoupdate -skia system-harfbuzz system-icu system-jpeg system-libevent system-libvpx
# requirements of firefox
dev-lang/python:2.7 sqlite
media-libs/harfbuzz icu
media-libs/libvpx postproc
==> genup <==
app-portage/genup::sakaki-tools -buildkernel
==> mesa <==
# Per https://wiki.gentoo.org/wiki/Raspberry_Pi_VC4
media-libs/mesa -classic xa xvmc
==> mplayer <==
media-video/mplayer -dvdnav
==> mpv <==
media-video/mpv -lua -luajit -iconv -uchardet
==> seahorse <==
# requirements of app-crypt/seahorse
app-crypt/pinentry gnome-keyring
==> vlc <==
media-video/vlc gnutls x264
==> xorg-server <==
# Per https://wiki.gentoo.org/wiki/Raspberry_Pi_VC4
x11-base/xorg-server glamor
==> zlib <==
# required by media-video/vlc
sys-libs/zlib minizip
==> zzz_via_autounmask <== | That is in addition to those in /etc/portage/make.conf:
Code: | # Additional USE flags in addition to those specified by the current profile.
USE="bindist -mudflap -sanitize"
USE="${USE} bluetooth egl gles1 gles2 lock thunar qt4 ffmpeg"
USE="${USE} -gnome -kde" | and of course by the default/linux/arm64/13.0/desktop profile.
Incidentally, all the packages used in the image are also available in binary form at my arm64 binhost, at https://www.isshoni.org/pi64.
@roylongbottom - khayyam's suggestion to use a ".start" file to set the performance governor on boot will work, but you need to be a little careful with this approach on the image, as there already is a .start file (/etc/local.d/ondemand_freq_scaling.start) in place to set the ondemand scaling. Be sure to move or delete this file if you are putting an alternative governor setting in place, otherwise the .start file that runs later during startup will "win" (and that will depend upon the lexical ordering of their filenames). _________________ Regards,
sakaki |
|
|
|
mDup Apprentice
Joined: 14 Apr 2006 Posts: 212
|
Posted: Sun Feb 26, 2017 4:23 am Post subject: |
|
|
Sakaki wrote: | @mDup, @NeddySeagoon
mDup wrote: | NeddySeagoon wrote: | mDup,
I've only tried the firefox in the tree. From your code fragment, is a bad sign.
I would expect it to fail using skia.
I'll let sakaki answer for the build in genpi64 |
Thanks for spotting skia!
When use -skia I can emerge the =www-client/firefox-50.1.0-r1::rpi3 from rpi3-repo |
The gentoo-on-rpi3-64bit image does have -skia set, and as you note I have retained firefox-50.1.0-r1 in the rpi3 overlay; more modern versions I could not get to build even with -skia.[...]
In case it is useful, the per-package USE flags on the image are as follows:[...]
|
Thanks for the USE flags.
I do not have rpi3 (I have amlogic device) so I do not run your image and do not have your flags to look at readily.
Nice idea to use system- style flags for firefox. I'll adjust it on all my gentoo systems.
Yes, more recent would not get to build even with -skia. I think we are on same page. |
|
|
|
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
|
Posted: Sun Feb 26, 2017 4:51 am Post subject: |
|
|
Sakaki wrote: | @roylongbottom - khayyam's suggestion to use a ".start" file to set the performance governor on boot will work, but you need to be a little careful with this approach on the image, as there already is a .start file (/etc/local.d/ondemand_freq_scaling.start) in place to set the ondemand scaling. Be sure to move or delete this file if you are putting an alternative governor setting in place, otherwise the .start file that runs later during startup will "win" (and that will depend upon the lexical ordering of their filenames). |
Sakaki, roylongbottom, et al ... all you need do is 'chmod u-x' it, then it won't be run.
Code: | # chmod u-x /etc/local.d/ondemand_freq_scaling.start |
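To see which .start files would actually be run, and in what order, a quick sketch like the one below may help. It assumes (as discussed above) that the 'local' service runs the executable *.start files in the order the shell glob returns them; the function name list_start_scripts is just an illustrative helper.

```shell
# Print the *.start files that would be run, in glob (lexical) order.
# $1 = directory to inspect; defaults to /etc/local.d
list_start_scripts() {
    for f in "${1:-/etc/local.d}"/*.start; do
        # skip anything that is not a regular, executable file
        [ -f "$f" ] && [ -x "$f" ] && echo "$f"
    done
    return 0
}

# Example:
#   list_start_scripts
```

With both cpufreq-performance.start and ondemand_freq_scaling.start executable, the ondemand one is listed (and so run) last, which is why it would "win". One caveat: for root, the -x test succeeds if any execute bit is set, so if the file is also group or other executable, 'chmod a-x' is the safer way to switch it off.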
best ... khay |
|
|
|
roylongbottom n00b
Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Tue Mar 07, 2017 5:08 pm Post subject: MultiThreading Benchmarks |
|
|
Most of my multithreading benchmarks run using 1, 2, 4 and 8 threads. Many have tests that use approximately 12 KB, 120 KB and 12 MB, to use both caches and RAM. The first set attempts to measure maximum MFLOPS, with two test procedures, one with two floating point operations per data word and the other with 32. The latter includes a mixture of multiplications and additions, coded to enable SIMD operation. In this case, using single precision numbers, four at a time, plus linked multiply and add, a top end CPU can execute eight operations per clock cycle per core. It is not clear what the potential maximum MFLOPS is on an ARM Cortex-A53, but eight per core is mentioned. The same benchmark code obtained a maximum of 24 MFLOPS/MHz on a top end quad core Intel CPU, via Linux - see the following:
http://www.roylongbottom.org.uk/linux%20multithreading%20benchmarks.htm#anchor6
Then this ARM CPU might need a different combination of arithmetic operations for higher values, where best case obtained with this benchmark was 2.2 MFLOPS/MHz using a single core.
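For anyone wanting to reproduce the MFLOPS/MHz arithmetic, here is a small sketch. It assumes the Raspberry Pi 3 is running at its nominal 1200 MHz (the clock is not stated in this post, so treat that as my assumption); the function name mflops_per_mhz is invented for the example.

```shell
# MFLOPS per MHz (per core when a core count is given), two decimals.
# $1 = measured MFLOPS, $2 = CPU clock in MHz, $3 = cores used (default 1)
mflops_per_mhz() {
    awk -v f="$1" -v mhz="$2" -v cores="${3:-1}" \
        'BEGIN { printf "%.2f\n", f / mhz / cores }'
}

# Examples, using figures from the results table in this post:
#   mflops_per_mhz 2640 1200       # 1 thread, 32 ops/word, 12.8 KB
#   mflops_per_mhz 10124 1200 4    # 8 threads on 4 cores, 32 ops/word, 128 KB
```

Under that 1200 MHz assumption, the first call gives 2.20 (the single core figure quoted above) and the second 2.11 per core.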
The following shows the format of the MP-MFLOPS benchmarks with the best 64 bit Raspberry Pi 3 results. Note that performance increases using more threads, except when limited by RAM speed. These benchmarks carry out a fixed number of test passes, with each thread carrying out the same calculations on different sections of data. Numeric results produced (x 100000) are output to show that all data has been used.
Code: | MP-MFLOPS NEON Intrinsics 64 Bit Tue Feb 28 15:37:39 2017
FPU Add & Multiply using 1, 2, 4 and 8 Threads
2 Ops/Word 32 Ops/Word
KB 12.8 128 12800 12.8 128 12800
MFLOPS
1T 697 725 420 2640 2544 2441
2T 1452 1420 348 5135 5258 4430
4T 1438 2679 343 10113 9905 5370
8T 1914 2533 358 9332 10124 6041
Results x 100000, 12345 indicates ERRORS
1T 76406 97075 99969 66015 95363 99951
2T 76406 97075 99969 66015 95363 99951
4T 76406 97075 99969 66015 95363 99951
8T 76406 97075 99969 66015 95363 99951
End of test Tue Feb 28 15:37:43 2017
|
Benchmarks appropriate for comparison of 32 and 64 bit versions are single and double precision versions, compiled for normal floating point and one using NEON intrinsic functions that are clearly suitable for SIMD operation and are converted to different types of vector operation.
64 bit/32 bit speed comparisons are below. Single precision MP-MFLOPS has the highest gain by using vector instructions, instead of scalar. With compiled intrinsics the systems use different forms of vector instructions.
Code: | Average 64 bit performance gains
2 Ops/Word 32 Ops/Word
12.8 128 12800 12.8 128 12800
MF SP 4.31 3.87 1.24 2.19 2.35 2.04
MF DP 2.45 1.71 0.83 1.92 1.92 1.42
Intrin 1.81 1.84 0.82 1.67 1.75 1.08
|
There is also an OpenMP benchmark that carries out the same calculations, but the OpenMP Shared Object file is not provided with Gentoo gcc. The other 64 bit Linux I am testing included it with gcc 4.8 and gcc-6. As usual, benchmarks, source code, details and results are in:
http://www.roylongbottom.org.uk/Rpi3-64-Bit-Benchmarks.tar.gz
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm _________________ Regards
Roy |
|
|
|
roylongbottom n00b
Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Thu Mar 16, 2017 11:03 am Post subject: More 64 Bit MultiThreading Benchmarks |
|
|
The other MP benchmarks, included in the tar.gz file, demonstrate some MP and 64 bit performance gains, with others identifying that multithreading provided little or no benefit and, sometimes, much worse performance.
MP-Whetstone - Multiple threads each run the eight test functions at the same time, but with some dedicated variables. MP performance is good but the simple test functions are not appropriate for more advanced instructions at 64 bits, so relative 32 bit performance is between 0.48 and 2.08.
MP-Dhrystone - This runs multiple copies of the whole program at the same time. Dedicated data arrays are used for each thread but there are numerous other variables that are shared. The latter reduces performance gains via multiple threads and, in some cases, these can be slower than using a single thread. In this case, some quad core improvements are shown as up to 2.5 times faster than a single core. Single core 64 bit/32 bit speed ratio was 1.50 reducing to 1.10 using four threads.
MP-Linpack - The original Linpack Benchmark operates on double precision floating point 100x100 matrices. This one runs on 100x100, 500x500 and 1000x1000 single precision matrices using 0, 1, 2 and 4 separate threads, mainly via NEON intrinsic functions that are compiled into different forms of vector instructions. The benchmark was produced to demonstrate that the original Linpack code could not be converted (by me) to show increased performance using multiple threads. The official line is that users are allowed to implement their own linear equation solver for this purpose. At 100 x 100, data is in L2 cache, others depend more on RAM speed. The critical daxpy function is affected by numerous thread create and join directives, even when using one thread. This leads to slow and constant performance in all the threaded tests - see example below. The 32 bit version produced slightly slower speeds.
Code: | Linpack Single Precision MultiThreaded Benchmark
64 Bit NEON Intrinsics, Wed Mar 8 11:36:25 2017
MFLOPS 0 to 4 Threads, N 100, 500, 1000
Threads None 1 2 4
N 100 552.47 112.73 105.19 105.31
N 500 442.32 303.75 303.64 305.03
N 1000 353.88 315.96 309.15 308.31
|
MP-BusSpeed - This runs integer read only tests using caches and RAM, each thread accessing the same data, but with staggered starting points. It includes tests with variable address increments, to identify burst reading and bus speeds. The main “Read All” test is intended to identify maximum RAM speed. The benchmark demonstrated some appropriate MP performance gains, but slow 64 bit speeds, with the 32 bit version being 2.5 times faster via cache based data. The reason is that the latter compiled arithmetic as 16 four way NEON operations compared with 64 scalar instructions.
MP-RandMem - The benchmark has cache and RAM read only and read/write tests using sequential and random access, each thread accessing the same data but starting at different points. The read only L1 cache based tests demonstrated MP gains of 3.6 times and 64 bit version 43% faster than the 32 bit variety. Read/write tests produced no multithreading performance improvement and the latest benchmark appeared to be somewhat slower than the 32 bit version. _________________ Regards
Roy |
|
|
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3896 Location: Rasi, Finland
|
Posted: Tue Mar 21, 2017 9:49 am Post subject: |
|
|
Has anyone tried to convert the existing ext4 filesystem to btrfs?
I think the snapshotting feature of it could be useful there. _________________ ..: Zucca :..
My gentoo installs: | init=/sbin/openrc-init
-systemd -logind -elogind seatd |
Quote: | I am NaN! I am a man! |
|
|
|
|
roylongbottom n00b
Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Sat Mar 25, 2017 11:23 am Post subject: OpenGL and Java Benchmarks |
|
|
OpenGL GLUT Benchmark
This was produced for use on Linux based PCs. It has four tests using coloured or textured simple objects then a wireframe and textured complex kitchen structure. It can be run from a script file specifying different window sizes and a command to disable VSYNC, enabling speeds greater than 60 FPS to be demonstrated. The benchmark, source code and details are in the following:
http://www.roylongbottom.org.uk/Rpi3-64-Bit-Benchmarks.tar.gz
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm#anchor19a
In 2012, I approved a request from a Quality Engineer at Canonical, to use this OpenGL benchmark in the testing framework of the Unity desktop software. One reason probably was that a test can be run for extended periods as a stress test.
Below are results from a Raspberry Pi 3, using the experimental desktop GL driver and the new 64 bit version. It can be seen that, using smaller windows, the 32 bit version was much faster running simple coloured objects, with the 64 bit benchmark being ahead with complex structures. Then, performance was quite similar with full screen displays.
Code: | ######################### RPi 3 Original #########################
GLUT OpenGL Benchmark 32 Bit Version 1, Wed Jul 27 20:31:52 2016
Window Size Coloured Objects Textured Objects WireFrm Texture
Pixels Few All Few All Kitchen Kitchen
Wide High FPS FPS FPS FPS FPS FPS
320 240 308.4 182.1 82.6 52.3 21.6 13.7
640 480 129.5 119.6 74.6 49.2 21.6 13.8
1024 768 54.8 52.2 43.7 39.2 21.4 13.6
1920 1080 21.5 17.9 20.3 19.6 20.6 13.4
########################## RPi 3 Gentoo ##########################
GLUT OpenGL Benchmark 64 Bit Version 1, Sat Mar 18 18:21:44 2017
Window Size Coloured Objects Textured Objects WireFrm Texture
Pixels Few All Few All Kitchen Kitchen
Wide High FPS FPS FPS FPS FPS FPS
320 240 161.8 116.0 67.1 46.3 26.7 16.7
640 480 76.8 74.8 49.8 41.4 25.9 16.3
1024 768 35.7 34.8 29.7 26.7 25.0 15.7
1920 1080 18.0 18.7 16.4 15.8 17.1 13.1
|
Java Drawing and Whetstone Benchmarks
After a struggle, I gave up trying to emerge Java but managed to download Oracle JDK 1.8 for temporary use (not installed in the right place?). This could compile Java code and run the Whetstone program but not my JavaDraw benchmark. The benchmarks and results can be obtained via the above links. On running the Whetstone benchmark, excluding two tests, where each was much faster, the average 64 bit speed was twice as fast. _________________ Regards
Roy |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54799 Location: 56N 3W
|
Posted: Sat Mar 25, 2017 3:12 pm Post subject: |
|
|
roylongbottom,
I haven't tried 32 bit Java for the Pi but you can build Java 1.7 and, once you have 1.7, you can use it to build 1.8.
If I got the keywording right, keywording is no longer required.
It's also possible to build Icedtea with Oracle's Java. That's documented there too. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
roylongbottom n00b
Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Tue Apr 25, 2017 10:17 am Post subject: 64 Bit I/O Benchmarks |
|
|
My DriveSpeed and LanSpeed programs have now been recompiled as DriveSpeed64 and LanSpeed64, with benchmarks, source codes, details and results in the tar.gz and htm files quoted earlier. The code for these is identical, except DriveSpeed opens files to use direct I/O, avoiding caching. LanSpeed normally runs without using local caching. The benchmarks measure writing and reading speeds of relatively large files, random access and numerous small files.
There might be a tuning parameter, but DriveSpeed64 produced errors using the installed Gentoo operating system, where direct I/O did not appear to be available. The benchmark was validated on a different 64 bit system.
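A quick way to probe whether a given directory accepts direct I/O at all, independent of the benchmark, is sketched below. It assumes GNU dd (whose oflag=direct opens the output file with O_DIRECT) and that the benchmark's direct I/O is the same O_DIRECT mechanism; the function name supports_direct_io is invented for the example.

```shell
# Return 0 if a file in the given directory can be written with O_DIRECT.
# $1 = directory to probe; defaults to the current directory
supports_direct_io() {
    probe=$(mktemp "${1:-.}/directio-probe.XXXXXX") || return 1
    # one 4096-byte block, written with O_DIRECT via GNU dd
    if dd if=/dev/zero of="$probe" bs=4096 count=1 oflag=direct 2>/dev/null; then
        rc=0
    else
        rc=1
    fi
    rm -f "$probe"
    [ "$rc" -eq 0 ]
}

# Example:
#   supports_direct_io /home/roy/benchmarks && echo "direct I/O ok" || echo "no direct I/O here"
```

A failure here would point at the filesystem or kernel configuration rather than at the benchmark program.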
DriveSpeed can also be used for testing USB connected drives. This produced errors using flash drives but happened to run testing a micro SD card, via a USB card reader, but only via a btrfs formatted partition. Results are below but, compared with earlier 32 bit tests, some speeds are not as expected.
Code: | ################## DriveSpeed64 External SD Card ###################
Gentoo via USB, btrfs format
DriveSpeed RasPi 64 Bit 1.1 Tue Apr 4 10:28:11 2017
Selected File Path:
/run/media/demouser/ROOT/home/roy/benchmarks//
Total MB 29465, Free MB 27511, Used MB 1953
MBytes/Second
MB Write1 Write2 Write3 Read1 Read2 Read3
8 5.53 10.64 12.23 29.99 31.88 33.25
16 6.88 6.82 8.53 31.21 26.41 28.64
Cached
8 159.30 175.77 158.98 235.45 229.22 266.71
Random Read Write
From MB 4 8 16 4 8 16
msecs 0.016 0.006 0.006 20.67 50.55 22.84
200 Files Write Read Delete
File KB 4 8 16 4 8 16 secs
MB/sec 0.25 0.40 0.97 58.44 160.18 150.07
ms/file 16.09 20.66 16.87 0.07 0.05 0.11 0.160
Large Files > Performance restricted by USB speed
Random > Writing exceptionally slow, reading far too fast, data cached?
Small Files > Writing exceptionally slow, reading far too fast, data cached?
|
As Samba for Gentoo was initially said to be not tested at 64 bits, the LAN was not available to run the benchmark on this system, but it was compiled and run on another 64 bit configuration, accessing a Windows based PC. Results are below.
Code: | ######################### LanSpeed64 Example #######################
LanSpeed RasPi 64 Bit 1.0 Tue Apr 4 13:04:06 2017
Selected File Path:
/root/Desktop/sharepc/
Total MB 266240, Free MB 134653, Used MB 131587
                            MBytes/Second
MB         Write1   Write2   Write3    Read1    Read2    Read3
8           11.23    11.40    11.40     8.10    11.62    11.64
16          11.27    11.42    11.44    11.66    11.66    11.64
Random                Read                      Write
From MB         4        8       16        4        8       16
msecs       0.724    0.886    1.333     1.58     1.50     1.37
200 Files            Write                       Read            Delete
File KB         4        8       16        4        8       16     secs
MB/sec       0.99     1.81     2.73     1.77     3.02     4.50
ms/file      4.13     4.54     6.01     2.32     2.71     3.64    0.201
End of test Tue Apr 4 13:04:43 2017
>>>>>>>>>>>>> Comparison with 32 Bit Version Rpi 3 Ph Win <<<<<<<<<<<<<
Large Files > Similar speeds reflecting 100 Mbps
Random > Similar but writing faster, no apparent caching
Small Files > Similar speeds
|
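To put the "reflecting 100 Mbps" comment in numbers, here is a small Python sketch that converts the large file speeds above into megabits per second and a percentage of the nominal link speed. It assumes an MB is 1,000,000 bytes and ignores protocol overheads, so it is only a rough guide, and the names used are illustrative.

```python
LINK_MBITS = 100  # nominal speed of the wired LAN connection


def mbytes_to_mbits(mbytes_per_sec):
    """Convert MBytes/second to Mbits/second (8 bits per byte)."""
    return mbytes_per_sec * 8


# Large file results taken from the 16 MB row of the table above
for label, speed in [("Write3", 11.44), ("Read1", 11.66)]:
    mbits = mbytes_to_mbits(speed)
    share = 100.0 * mbits / LINK_MBITS
    print(f"{label}: {speed} MB/s = {mbits:.1f} Mbps ({share:.0f}% of link)")
```

On that basis the large file speeds sit a little above 90% of the theoretical 12.5 MBytes/second, which is about as much as can be expected once network overheads are included.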
LanSpeed64 was also successfully run targeting the main and USB drives that would not run DriveSpeed64, identifying speeds when data was cached, and suggesting that the earlier failures were due to trying to open files (as used in the programs) to force direct I/O. Details are available in the aforementioned htm report. _________________ Regards
Roy |
|
|
|
roylongbottom n00b
Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Mon May 08, 2017 11:40 am Post subject: Stress Testing Programs |
|
|
Stress Testing Programs
The Cortex-A53 CPU, used in the Raspberry Pi 3, is known to be subject to overheating. Assuming correct software implementation, the first noticeable effect is that, as the temperature increases beyond a critical point, the CPU MHz is throttled. At normal room temperatures, this might only occur when all CPU cores are executing at higher speeds, with a possible contribution from graphics activity. Where this matters, special cooling arrangements might be needed, and these stress tests can be used to evaluate the alternatives. For this series of procedures, the RPi 3 board was “out of case”, where recorded temperatures are often shown to be lower than those obtained using a standard plastic enclosure.
A main consideration for stress testing is that programs have parameters to run for defined durations, but with short term reports on progress, including performance and, in this case, CPU temperature and clock MHz. These details should also be saved in constantly updated log files, so that some evidence remains if the system crashes.
In this case, multiple programs are run using a different terminal window for each, normally with a test duration of 15 minutes specified. One of these measures CPU temperature and MHz at specified intervals, for which the vcgencmd utility has to be installed (as used by Raspbian). Two of the programs are benchmarks, already reported on, but with alternative run time parameters, and two are new ones. Programs, source code and detailed results are now included in:
http://www.roylongbottom.org.uk/Rpi3-64-Bit-Benchmarks.tar.gz
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm
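The temperature and MHz monitoring described above can be approximated with a few lines of Python. This is only a sketch, not the monitoring program in the tar.gz file. It assumes vcgencmd is on the path and prints output of the form temp=55.8'C for measure_temp and frequency(45)=1200000000 for measure_clock arm. The function names and log layout are made up for illustration. Each sample is flushed and synced to the log file, so that the latest readings should survive a crash.

```python
import os
import re
import subprocess
import time


def parse_temp(text):
    """Parse output such as "temp=55.8'C" into degrees Celsius."""
    match = re.search(r"temp=([0-9.]+)", text)
    if match is None:
        raise ValueError(f"unexpected temperature output: {text!r}")
    return float(match.group(1))


def parse_clock_mhz(text):
    """Parse output such as "frequency(45)=1200000000" into MHz."""
    match = re.search(r"=(\d+)", text)
    if match is None:
        raise ValueError(f"unexpected clock output: {text!r}")
    return int(match.group(1)) / 1_000_000


def read_vcgencmd(*args):
    """Run vcgencmd with the given arguments and return its output."""
    result = subprocess.run(["vcgencmd", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def log_samples(log_path, interval_secs, samples):
    """Append one line per sample to log_path (hypothetical layout)."""
    with open(log_path, "a") as log:
        for i in range(samples):
            temp = parse_temp(read_vcgencmd("measure_temp"))
            mhz = parse_clock_mhz(read_vcgencmd("measure_clock", "arm"))
            log.write(f"{i * interval_secs:8.1f} secs "
                      f"{mhz:6.0f} MHz {temp:5.1f} 'C\n")
            log.flush()
            os.fsync(log.fileno())  # make sure the sample reaches the disk
            time.sleep(interval_secs)
```

For a 15 minute test with samples every 10 seconds, something like log_samples("heat.log", 10, 90) would be run in its own terminal window before starting the stress tests.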
The oldest uses linverloopsPi64, the Livermore Loops Benchmark, which has 24 different test kernels, repeated three times with different (cache) memory demands. This was known to produce wrong numeric answers on an overclocked PC. For reliability testing, a parameter specifies a standard running time for each of the loops. Results are displayed during the tests, but performance is only reported and logged at the end. The main benefit is a continuously changing processing profile.
The other existing benchmark is videogl64, via OpenGL. This has six test procedures, where one is chosen along with the number of passes and the duration of each. The window width and height can be specified, leaving visible screen space where other terminal windows can be displayed.
The new tests have a run time parameter to specify the amount of cache or memory space to use, and carry out high speed integer and floating point calculations via stressintPi64 and burninfpuPi64.
A summary of test results follows:
Integer Arithmetic Stress Test - comprising four runs of stressintPi64 using 40 KB of data, all aimed at using L2 cache, with 12 tests each running for 80 seconds. Performance on all cores was essentially the same, with CPU throttling starting after 30 seconds, eventually reducing CPU MHz by nearly 32%, with a maximum recorded sample CPU temperature of 84.4 °C. Compared with stand alone results, CPU performance was degraded to a greater extent due to MP overheads.
Floating Point Arithmetic Stress Test - having four burninfpuPi64 test procedures, using L2 cache with 8 operations per data word. Again performance was effectively constant from all cores, with maximum total throughput of 13.7 GFLOPS, reducing by nearly 4 GFLOPS due to CPU throttling down to 843 MHz, again with a maximum temperature of 84.4 °C.
Livermore Loops Stress Test - This uses four copies of the Livermore Loops Benchmark. Overall MFLOPS speeds are shown to be significantly degraded, but RPiHeatMHz64 results demonstrate inconsistent effects of different arithmetic functions. Maximum temperature recorded was 84.9 °C with a CPU MHz of 744.
Integer and OpenGL Stress Tests - The most complicated OpenGL kitchen test was used, along with three Integer Stress Tests, this time using L1 cache based data. The same procedures were used with CPU MHz settings of On-demand and Performance, where results are shown to be virtually the same. The first summary of speeds and temperatures below is with the Performance setting. There, OpenGL FPS and integer MB/second reduced to around 60% of initial speeds, with many temperatures of 84.9 °C recorded, when CPU MHz temporarily dropped to half speed at 600 MHz. The tests were repeated with the system in a FLIRC case, where the whole aluminium case becomes the heatsink. Performance was then consistently high, but temperatures approached the critical point where CPU throttling would occur.
Code: |
    Performance out of case          Performance FLIRC case
       Total  OGL   CPU    CPU          Total  OGL   CPU    CPU
 Secs   MB/s  FPS   MHz     'C    Secs   MB/s  FPS   MHz     'C
    0              1200   55.8       0              1200   44.0
   30          13  1107   80.6      30          13  1200   60.1
   60          11   910   82.7      60          13  1199   63.4
   80   6064    9   850   83.8      80   7116   13  1200   65.0
  160   4656    9   744   84.9     160   7041   13  1199   68.8
  240   4305    8   600   82.7     240   7072   13  1200   70.9
  320   4217    8   600   82.7     320   7075   13  1200   72.0
  400   4209    8   738   84.9     400   7095   13  1200   74.1
  480   4209    8   600   82.7     480   7081   13  1200   75.8
  560   4802    8   738   84.9     560   8067   13  1200   74.7
  640   4768    8   722   84.9     640   8092   13  1200   76.8
  720   4730    8   743   84.9     720   7989   13  1200   77.4
  800   4664    8   823   84.9     800   8050   13  1200   78.4
  880   4712    8   719   84.9     880   7984   13  1200   79.5
  960   5917    8   938   82.7     960   8344   13  1200   74.1
|
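The degree of throttling in the table above can be expressed as a percentage of an unthrottled figure. The Python sketch below does this for a few entries. Note that the first out of case MB/s reading, at 80 seconds, was already taken at 850 MHz, so for the integer tests I have assumed the FLIRC case reading at 80 seconds (7116 MB/s at 1200 MHz) as the unthrottled baseline. That choice is my assumption for illustration, not part of the original measurements.

```python
def percent_of(value, baseline):
    """Return value as a percentage of baseline."""
    return 100.0 * value / baseline


# (description, throttled value, baseline), all read from the table above.
# The 7116 MB/s baseline is the FLIRC case figure, assumed to be unthrottled.
comparisons = [
    ("OpenGL FPS, 8 versus 13", 8, 13),
    ("CPU MHz, 600 versus 1200", 600, 1200),
    ("Integer MB/s, 4209 versus 7116 (FLIRC)", 4209, 7116),
    ("Integer MB/s, 4209 versus 6064 (80 secs)", 4209, 6064),
]

for description, value, baseline in comparisons:
    print(f"{description}: {percent_of(value, baseline):.1f}%")
```

On this basis both the OpenGL FPS and the slowest integer speeds come out at roughly 60% of unthrottled performance, compared with about 69% if the already throttled 80 second reading is used as the starting point.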
These are the last of my current benchmarks and test programs for Raspberry Pi 3. _________________ Regards
Roy |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|