You run VM instance. Nevermind is it part of OpenStack setup or just local one started using Boxes, virt-manager, virsh or other that kind of fronted to libvirt daemon. And then you want to add some virtual hardware to it. And another card and one more controller…
Easy to imagine scenario, right? What can go wrong, you say? “No more available PCI slots.” message can happen. On second/third card/controller… But how? Why?
Like I wrote in one of my previous posts most of VM instances are 90s pc hardware virtual boxes. With simple PCI bus which accepts several cards to be added/removed at any moment.
But not on AArch64 architecture. Nor on x86-64 with Q35 machine type. What is a difference? Both are PCI Express machines. And by default they have far too small amount of pcie slots (called
pcie-root-port in qemu/libvirt language). More about PCI Express support can be found in PCI topology and hotplug page of libvirt documentation.
So I wrote a patch to Nova to make sure that enough slots will be available. And then started testing. Tried few different approaches, discussed with upstream libvirt developers about ways of solving the problem and finally we selected the one and only proper way of doing it. Then discussed failures with UEFI developers. And went for help to Qemu authors. And explained what I want to achieve and why to everyone in each of those four projects. At some point I had seen
pcie-root-port things everywhere…
Turned out that the method of fixing it is kind of simple: we have to create whole pcie structure with root port and slots. This tells libvirt to not try any automatic adding of slots (which may be tricky if not configured properly as you may end with too small amount of slots for basic addons).
Then I went with idea of using insane values. VM with one hundred PCIe slots? Sure. So I made one, booted it and then something weird happen: landed in UEFI shell instead of getting system booted. Why? How? Where is my storage? Network? Etc?
Turns out that Qemu has limits. And libvirt has limits… All ports/slots went into one bus and memory for MMCONFIG and/or I/O space was gone. There are two interesting threads about it on qemu-devel mailing list.
So I added magic number into my patch: 28 — this amount of
pcie-root-port entries in my aarch64 VM instance was giving me bootable system. Have to check it on x86-64/q35 setup still but it should be more or less the same. I expect this patch to land in ‘Rocky’ (the next OpenStack release) and probably will have to find a way to get it into ‘Queens’ as well because this is what we are planning to use for next edition of Linaro Developer Cloud.
Conclusion? Hotplug may be complicated. But issues with it can be solved.