This post is part 6 of the "From the diary of AArch64 porter" series:
- From the diary of AArch64 porter — autoconf
- From the diary of AArch64 porter — rpm packaging
- From the diary of AArch64 porter — testsuites
- From the diary of AArch64 porter — POSIX.1 functionality
- From the diary of AArch64 porter — PAGE_SIZE
- From the diary of AArch64 porter — vfp precision
- From the diary of AArch64 porter — system calls
- From the diary of AArch64 porter — parallel builds
- From the diary of AArch64 porter — firefighting
- From the diary of AArch64 porter — drive-by coding
- From the diary of AArch64 porter — manylinux2014
- From the diary of AArch64 porter — handling big patches
- From the diary of AArch64 porter — Arm CPU features table
Over the last few years a lot of work went into the design of SIMD instructions for different CPUs. x86-64 has some, Power has some, and so does AArch64.
But why am I writing about it? Simple: build failures. Or rather test failures, especially in scientific software like HepMC or alglib. They build fine, but that’s all.
The difference in results was small, something like 1e-15 or smaller, but it was enough to make tests fail. And what to blame? SIMD of course.
AArch64 has FMA (fused multiply-add) instructions which speed up calculations, but the result is more precise than the traditional way, where all operations are done one by one: the fused form rounds once at the end, while a separate multiply and add round twice. This is enough for tests to fail :-(
But there is a solution for it. To degrade the precision you need to add “-ffp-contract=off” to the compiler flags. This stops the compiler from fusing multiplications and additions into FMA instructions, so the test results are the same as on x86-64 (on pre-Haswell/Bulldozer cores). As a bonus it works on powerpc64(le) and s390(x) too.
Thanks go to David Abdurachmanov and Andrew Pinski for finding out exactly which flag needs to be used (instead of the -fno-expensive-optimizations I used before).