From the diary of AArch64 porter — vfp precision

This post is part 6 of the "From the diary of AArch64 porter" series.

Over the last few years a lot of work went into the design of SIMD instructions for different CPUs. x86-64 has some, Power has some, and so does AArch64.

But why am I writing about it? Simple: build failures. Or rather test failures, especially in scientific software like HepMC or alglib. They build fine but that’s all.

The difference was small, something like 1e-15 or smaller, but it was enough to make tests fail. And what to blame? SIMD of course.

AArch64 has FMA (fused multiply-add) instructions which speed up calculations. But an FMA rounds only once, after both the multiply and the add, so the result is more precise than the traditional way, where the operations are done (and rounded) one by one. This is enough for tests to fail :-(

But there is a solution for it. To degrade precision you need to add “-ffp-contract=off” to the compiler flags. This disables the use of FMA, so test results are the same as on x86-64 (on pre-Haswell/Bulldozer cores). As a bonus it works on powerpc64(le) and s390(x) too.

Thanks go to David Abdurachmanov and Andrew Pinski for finding out exactly which flag needs to be used (instead of the -fno-expensive-optimizations I used before).
