From a diary of AArch64 porter — vfp precision

This post is part 6 of the "From a diary of AArch64 porter" series:

  1. From a diary of AArch64 porter — autoconf
  2. From a diary of AArch64 porter — rpm packaging
  3. From a diary of AArch64 porter — testsuites
  4. From a diary of AArch64 porter — POSIX.1 functionality
  5. From a diary of AArch64 porter — PAGE_SIZE
  6. From a diary of AArch64 porter — vfp precision
  7. From a diary of AArch64 porter — system calls
  8. From a diary of AArch64 porter — parallel builds
  9. From a diary of AArch64 porter — firefighting
  10. From a diary of AArch64 porter — drive-by coding
  11. From a diary of AArch64 porter — manylinux2014
  12. From a diary of AArch64 porter — handling big patches
  13. From a diary of AArch64 porter — Arm CPU features table

Over the last few years a lot of work went into designing SIMD instructions for different CPUs. x86-64 has some, Power has some, and so does AArch64.

But why am I writing about it? Simple: build failures. Or rather test failures, especially in scientific software like HepMC or alglib. They build fine, but that’s all.

The difference was small, something on the order of 1e-15 or less, but it was enough to make tests fail. And what to blame? SIMD of course.

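AArch64 has FMA (fused multiply-add) instructions, which speed up calculations. An FMA computes a*b+c with a single rounding at the end, so the result is more precise than the traditional way, where the multiplication and the addition are done one by one and each is rounded separately. This is enough for tests to fail :-(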

But there is a solution for it. To degrade the precision you need to add “-ffp-contract=off” to the compiler flags. This stops the compiler from fusing separate multiply and add operations into FMA, so test results are the same as on x86-64 (on pre-Haswell/Bulldozer cores). As a bonus it works on powerpc64(le) and s390(x) too.

Thanks go to David Abdurachmanov and Andrew Pinski for finding out exactly which flag needs to be used (instead of the -fno-expensive-optimizations I used before).
