This post is part 13 of the "From a diary of AArch64 porter" series:
- From a diary of AArch64 porter — autoconf
- From a diary of AArch64 porter — rpm packaging
- From a diary of AArch64 porter — testsuites
- From a diary of AArch64 porter — POSIX.1 functionality
- From a diary of AArch64 porter — PAGE_SIZE
- From a diary of AArch64 porter — vfp precision
- From a diary of AArch64 porter — system calls
- From a diary of AArch64 porter — parallel builds
- From a diary of AArch64 porter — firefighting
- From a diary of AArch64 porter — drive-by coding
- From a diary of AArch64 porter — manylinux2014
- From a diary of AArch64 porter — handling big patches
- From a diary of AArch64 porter — Arm CPU features table
Last week I had some discussions about the future and about the projects I am involved in. As a kind of break, I started yet another personal project for fun…
AArch64 SoC features table
Let's make a table showing which AArch64 SoCs support which processor features, and how bad the situation is.
Source of data
On a Linux system there is the /proc/cpuinfo file, which describes the processor: the implementer of the CPU cores, their version, and the features they support:
processor : 0
BogoMIPS : 26.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd05
CPU revision : 0
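A minimal sketch of how such a dump could be turned into data, assuming the "key : value" layout shown above with blank lines between processor entries. The names `parse_cpuinfo`, `cpu_features` and `SAMPLE` are made up for this illustration and are not taken from the table's actual code.

```python
def parse_cpuinfo(text):
    """Parse /proc/cpuinfo-style text into a list of per-processor dicts."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor entry.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line, skip it
        key = key.strip()
        if key == "processor" and current:
            # A new entry started without a separating blank line.
            cpus.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def cpu_features(cpu):
    """Return the set of feature names from one parsed processor entry."""
    return set(cpu.get("Features", "").split())


# The dump from this post, used as sample input.
SAMPLE = """\
processor : 0
BogoMIPS : 26.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd05
CPU revision : 0
"""

if __name__ == "__main__":
    for cpu in parse_cpuinfo(SAMPLE):
        print(cpu["CPU part"], sorted(cpu_features(cpu)))
```

On a real system the same function can be fed the content of /proc/cpuinfo instead of `SAMPLE`.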
Most of the AArch64 systems I have used show a much shorter list, so I started wondering which SoC has which features listed.
I took some code from my Linux system calls table and started with cpuinfo dumps from my own systems. Later I added some entries from the Internet, then a bunch of mobile phones.
That is how the first version of the AArch64 SoC features table was created.
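The idea of such a table can be sketched as a yes/no matrix with one row per system and one column per feature. This is only an illustration: the function names and the two "example-…" systems with their feature sets are hypothetical, not real SoC data and not the code behind the real table.

```python
def feature_table(systems, all_features=None):
    """Build a yes/no matrix from a {system name: feature names} mapping.

    When all_features is not given, the columns are the union of the
    features seen in the supplied systems.
    """
    if all_features is not None:
        columns = list(all_features)
    else:
        columns = sorted(set().union(*systems.values()))
    rows = []
    for name in sorted(systems):
        have = set(systems[name])
        rows.append((name, [feat in have for feat in columns]))
    return columns, rows


def render(columns, rows):
    """Render the matrix as plain text, 'x' for present and '-' for absent."""
    width = max([len(name) for name, _ in rows] + [len("SoC")])
    lines = [" | ".join(["SoC".ljust(width)] + columns)]
    for name, flags in rows:
        cells = [("x" if flag else "-").center(len(col))
                 for flag, col in zip(flags, columns)]
        lines.append(" | ".join([name.ljust(width)] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical systems, made up for this example.
    systems = {
        "example-small": "fp asimd evtstrm cpuid".split(),
        "example-big": "fp asimd evtstrm aes pmull cpuid asimddp".split(),
    }
    columns, rows = feature_table(systems)
    print(render(columns, rows))
```

Passing an explicit `all_features` list keeps columns for features that no collected system has yet, which is exactly the case that makes the table wide.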
Let's check Linux
Once I got the QEMU “max” and Apple M2 cpuinfo dumps, I thought that I had all the features I needed. Oh, how wrong I was…
I went to the Linux kernel sources and gathered a list of all supported entries. Now a lot of horizontal scrolling is needed, as the table lists 66 CPU features.
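One way such a list could be gathered automatically is sketched below. It assumes the feature names sit in a string array with entries of the form `[KERNEL_HWCAP_FP] = "fp",`, as in arch/arm64/kernel/cpuinfo.c of recent kernels. Older kernels lay this array out differently, and the post does not say how the list was actually collected, so treat this as a guess at the approach. `SNIPPET` is a shortened stand-in for the real file.

```python
import re

# Matches entries such as:  [KERNEL_HWCAP_FP] = "fp",
HWCAP_ENTRY = re.compile(r'\[\s*KERNEL_HWCAP_\w+\s*\]\s*=\s*"([^"]+)"')


def hwcap_names(source_text):
    """Return feature names in the order they appear in the C source."""
    return HWCAP_ENTRY.findall(source_text)


def never_seen(kernel_names, seen_features):
    """Features the kernel knows about that no collected dump contains."""
    seen = set(seen_features)
    return [name for name in kernel_names if name not in seen]


# Shortened stand-in for the kernel source, for illustration only.
SNIPPET = '''
static const char *const hwcap_str[] = {
	[KERNEL_HWCAP_FP]		= "fp",
	[KERNEL_HWCAP_ASIMD]		= "asimd",
	[KERNEL_HWCAP_EVTSTRM]		= "evtstrm",
};
'''

if __name__ == "__main__":
    names = hwcap_names(SNIPPET)
    print(names)
    print(never_seen(names, {"fp", "asimd"}))
```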
How to help
At this moment the AArch64 SoC features table lists 29 systems. More are needed to make it useful.
You can help: submit an issue on GitHub and let me take care of the rest. Needed information:
- SoC name (like “RK3399”, “M1 Max” etc.)
- SoC vendor name (“Rockchip”, “Apple” etc.)
- product name (not used in table)
- URL of the product page (also not used in the table)
The product name and URL can be used later to check details.
There are many features recognized by the Linux kernel. Let me try to group them by architecture level.
Arm v8.0 is the base of all AArch64 systems and the most popular level, as it is present in far too many SBC devices. Just a few features are present:
- floating point present
- advanced SIMD present
- timer event stream generation
- CPU features can be read
Then come the cryptographic extensions (optional):
- AESD and AESE instructions
- PMULL, PMULL2 instructions
I was surprised to see just two entries for v8.1:
- Advanced SIMD rounding double multiply accumulate instructions
Version 8.2 of the Arm architecture was quite a refresh. New CPU cores were announced, and SVE (Scalable Vector Extension) was defined along with several other calculation extensions. Note that most of them are optional (as usual).
- Advanced SIMD dot product instructions
- Floating-point half-precision multiplication instructions
- Advanced SIMD with BFloat16 instructions
- AArch64 BFloat16 instructions
- DC CVADP instruction
- DC CVAP instruction
- Flag manipulation instructions v2
- Half-precision floating-point data processing
- AArch64 Int8 matrix multiplication instructions
- Advanced SIMD SHA3 instructions
- Advanced SIMD SHA512 instructions
- Advanced SIMD SM3 instructions
- Advanced SIMD SM4 instructions
- Scalable Vector Extension
- AArch64 BFloat16 instructions (SVE)
- Single-precision Matrix Multiplication (SVE)
- Double-precision Matrix Multiplication (SVE)
- AArch64 Int8 matrix multiplication instructions (SVE)
- Unaligned single-copy atomicity and atomic functions with a 16-byte address range aligned to 16-bytes are supported
I heard that v8.3 was “interesting experiment which needed fixing”…
- Floating-point complex number instructions
- Load-Acquire RCpc instructions
v8.4 got some fixes for v8.3 features. Pointer authentication was also defined.
It is also the lowest level for (optional) nested virtualization, which was added in v8.3 but needed improvements.
- Data Independent Timing instructions
- Load-Acquire RCpc instructions v2
- Faulting on AUT* instructions
- Enhanced pointer authentication functionality
v8.5 was the time of the Spectre, Meltdown and similar vulnerabilities, and of the fixes for them. Several cores at lower architecture levels implemented those fixes too.
Some interesting security features are BTI and MTE.
- Branch Target Identification
- Enhancements to flag manipulation instructions
- Floating-point to integer instructions
- Memory Tagging Extension
- MTE Asymmetric Fault Handling
- Random number generator
Arm v8.6 is so far into “Arm fairy tales” that I do not know what to write here.
- Enhanced Counter Virtualization
Arm v8.7 is like the above. And “afp” sounds scary…
- Alternate floating-point behaviour
- Increased precision of Reciprocal Estimate and Reciprocal Square Root Estimate
- WFE and WFI instructions with timeout
Arm v9.0 is a fork of Arm v8.5 with SVE2 on top. I think that it was an attempt at a new start, as a lot of SoCs still used old cores.
Also “Arm v9” gives a marketing boost ;D
- Scalable Vector Extension version 2
- Scalable Vector AES instructions
- Scalable Vector Bit Permutes instruction
- Scalable Vector PMULL instructions
- Scalable Vector SHA3 instructions
- Scalable Vector SM4 instructions
“v9.2 is the new v8.7” could be a marketing slogan.
- AArch64 Extended BFloat16 instructions
- Scalable Matrix Extension
- SME support for instructions that accumulate BFloat16 outer products into FP32 single-precision floating-point tiles
- SME support for instructions that accumulate FP16 half-precision floating-point outer products into FP32 single-precision floating-point tiles
- SME support for instructions that accumulate FP32 single-precision floating-point outer products into single-precision floating-point tiles
- SME support for instructions that accumulate into FP64 double-precision floating-point elements in the ZA array
- Full Streaming SVE mode instructions
- SME support for instructions that accumulate 8-bit integer outer products into 32-bit integer tiles
- SME support for instructions that accumulate into 64-bit integer elements in the ZA array
- AArch64 Extended BFloat16 instructions (SVE)
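The grouping above can also be applied to a single cpuinfo feature list. A minimal sketch follows, with a deliberately partial mapping that covers only some of the names from the dump earlier in this post. Both the mapping and the function name are assumptions made for illustration. Anything not in the mapping lands in "unknown" rather than being guessed.

```python
from collections import defaultdict

# Partial, illustrative mapping: cpuinfo feature name -> architecture level
# at which the feature was introduced. Extend as needed.
FEATURE_LEVEL = {
    "fp": "v8.0", "asimd": "v8.0", "evtstrm": "v8.0", "cpuid": "v8.0",
    "aes": "v8.0 crypto", "pmull": "v8.0 crypto",
    "sha1": "v8.0 crypto", "sha2": "v8.0 crypto",
    "atomics": "v8.1", "asimdrdm": "v8.1",
    "fphp": "v8.2", "asimdhp": "v8.2", "dcpop": "v8.2", "asimddp": "v8.2",
    "lrcpc": "v8.3",
}


def group_by_level(features):
    """Group feature names by level; unmapped names go under 'unknown'."""
    groups = defaultdict(list)
    for feat in sorted(features):
        groups[FEATURE_LEVEL.get(feat, "unknown")].append(feat)
    return dict(groups)


if __name__ == "__main__":
    features = ("fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp "
                "asimdhp cpuid asimdrdm lrcpc dcpop asimddp").split()
    for level, names in sorted(group_by_level(features).items()):
        print(level, ":", " ".join(names))
```

Note that a core can implement optional features from a newer level, so the highest level seen in the output says nothing certain about the architecture version of the core itself.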
Will the table be useful?
I hope that the AArch64 SoC features table will be useful. Once populated with data, it will show which SoCs have the features we want to target. Of course there are some problems:
- is there any hardware with those features at all
- will we live long enough to see such hardware