Over the last few years a lot of work went into the design of SIMD instructions for different CPUs. x86-64 has some, Power has them, and so does AArch64.
But why am I writing about it? Simple: build failures. Or rather test failures, especially in scientific software like HepMC or alglib. They build fine, but that’s all.
The difference between expected and computed results was small, something like 1e-15 or smaller, but it was enough to make tests fail. And what to blame? SIMD of course.
AArch64 has FMA (fused multiply-add) instructions, which speed up calculations, but the result is more precise than the traditional way where all operations are done one by one: a fused operation rounds once at the end, while a separate multiply and add round twice. This is enough for tests to fail 🙁
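To see how single rounding changes a result, here is a small sketch in Python. It does not use real FMA hardware: the fused variant is emulated with exact rational arithmetic and one final rounding, which is my assumption about the simplest portable way to show the effect. The function names and input values are made up for illustration.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    # Two roundings: a * b is rounded to a double first,
    # then the addition is rounded again.
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    # One rounding: compute a * b + c exactly with rationals,
    # then convert to a double once. This emulates what an FMA
    # instruction does; it is not a call into the hardware.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Illustrative inputs chosen so that the exact product (1 - 2**-54)
# does not fit in a double and gets rounded up to 1.0.
eps = 2.0 ** -27
a = 1.0 + eps
b = 1.0 - eps
c = -1.0

print("unfused:", unfused_multiply_add(a, b, c))
print("fused:  ", fused_multiply_add(a, b, c))
```

With these inputs the unfused version gives 0.0 and the fused one gives -2**-54 (about -5.55e-17). A test that compares against a reference value computed the other way, with no tolerance, fails on exactly this kind of difference.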
But there is a solution for it. To get the less precise results back, you need to add “-ffp-contract=off” to the compiler flags. This disables the use of FMA, so the test results are the same as on x86-64 (on pre-Haswell/Bulldozer cores). As a bonus, it works on powerpc64(le) and s390(x) too.
Thanks go to David Abdurachmanov and Andrew Pinski for finding out exactly which flag needs to be used (instead of the -fno-expensive-optimizations I used before).