Today I was fighting with Nova. No idea who won…

I am working on getting OpenStack running on AArch64 architecture, right? So recently I went from “just” building images to also using them to deploy working “cloud” setup. And that resulted in new sets of patches, updates to patches, discussions…

OpenStack is supposed to make virtualization easier. Create accounts, give access to users and they will make virtual machines and use them without worrying what kind of hardware is below etc. But first you have to get it working. So this week instead of only taking care of Kolla and Kolla-ansible projects I also patched Nova. The component responsible for running virtual machines.

One patch was simple edit of existing one to make it comply with all comments. Took some time anyway as I had to write some proper bug description to make sure that reviewers will know why it is so important for us. And once merged we will have UEFI used as default boot method on AArch64. Without any play with hw_firmware_type=uefi property on images (which is easy to forget). But this was the easy one…

Imagine that you have a rack of random AArch64 hardware and want to run a “cloud”. You may end in a situation where you have a mix of servers for compute nodes (the ones where VM instances run). In Nova/libvirt it is handled by cpu_mode option:

It is also possible to request the host CPU model in two ways:

“host-model” - this causes libvirt to identify the named CPU model which most closely matches the host from the above list, and then request additional CPU flags to complete the match. This should give close to maximum functionality/performance, which maintaining good reliability/compatibility if the guest is migrated to another host with slightly different host CPUs. Beware, due to the way libvirt detects host CPU, CPU configuration created using host-model may not work as expected. The guest CPU may confuse guest OS (i.e. even cause a kernel panic) by using a combination of CPU features and other parameters (such as CPUID level) that don’t work.

“host-passthrough” - this causes libvirt to tell KVM to passthrough the host CPU with no modifications. The difference to host-model, instead of just matching feature flags, every last detail of the host CPU is matched. This gives absolutely best performance, and can be important to some apps which check low level CPU details, but it comes at a cost wrt migration. The guest can only be migrated to an exactly matching host CPU.

Nova assumes host-model when KVM/QEMU is used as hypervisor. And crashes terribly on AArch64 with:

libvirtError: unsupported configuration: CPU mode ‘host-model’ for aarch64 kvm domain on aarch64 host is not supported by hypervisor

Not nice, right? So I made a simple patch to get host-passthrough to be default on AArch64. But when something is so simple then it’s description is probably not so simple…

Reported bug on nova with some logs attached. Then digged for some information which would explain issue better. Found Ubuntu’s bug on libvirt from Ocata times. They used same workaround.

So I thought: let’s report a bug for libvirt and request support for host-model option. There I got link to an another bug in libvirt with set of information why it does not make sense.

The reason is simple. No one knows what you run on when you run Linux on AArch64 server. In theory there are fields in /proc/cpuinfo but still you do not know do cpu cores in compute01 are same as compute02 servers. At least from nova/libvirt/qemu perspective. This also blocks us from setting cpu_mode to custom and selecting cpu_model which could be some way of getting same cpu for each instance despite of types of compute node processor cores.

The good side is that VM instances will work. The problem may appear when you migrate VM to cpu with other core — it may work. Or may not. Good luck!

Related posts