OpenStack ‘Queens’ release done

The OpenStack community released the ‘Queens’ version this week. IMHO it is quite an important moment for the AArch64 community as well, because it works out of the box for us.

Gone are things like setting hw_firmware_type=uefi for each image you upload to Glance — Nova now assumes UEFI to be the default firmware on AArch64 (unless you set the property to a different value for some reason). This simplifies things, as users do not have to worry about it, and we should get fewer support questions on new setups of Linaro Developer Cloud (which will be based on ‘Queens’ instead of ‘Newton’).
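
For reference, the per-image dance that is no longer needed looked roughly like this (the image name is just an example):

openstack image set --property hw_firmware_type=uefi debian-stretch-arm64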

There is a working graphical console, provided your guest image uses a properly configured kernel (4.14 from Debian/stretch-backports works fine; 4.4 from Ubuntu/xenial, used by CirrOS, does not have graphics enabled). A handy feature which some users had already asked us for.
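
If you are not sure whether a guest kernel can drive the virtual display (which on AArch64 usually means the virtio video model), options along these lines need to be enabled; the exact list depends on the kernel version and the video device used, so treat this as a rough checklist:

CONFIG_DRM=y
CONFIG_DRM_VIRTIO_GPU=y
CONFIG_FRAMEBUFFER_CONSOLE=y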

The sad thing is the state of live migration on AArch64. It simply does not work through the whole stack (Nova, libvirt, QEMU) because we have no idea exactly which CPU we are running on and how compatible it is with other CPU cores. In theory live migration between processors of the same type (like XGene1 -> XGene1) should be possible, but we do not have even that level of information available. More information can be found in bug 1430987 reported against libvirt.

The less sad part? We now set cpu_mode to ‘host-passthrough’ by default in Nova, so no matter which deployment method is used it should work out of the box.

When it comes to building (Kolla) and deploying (Kolla Ansible), most of the changes were done during the Pike cycle. During the Queens one it was mostly small tweaks here and there. I think our biggest change was convincing everyone in Kolla(-Ansible) to migrate from MariaDB 10.0.x (usually from external repositories) to 10.1.x taken from the distribution (Debian) or from RDO.

What will Rocky bring? Better hotplug for PCI Express machines (the AArch64 ‘virt’ and x86 ‘q35’ models) is one thing. I hope that the live migration situation will improve as well.

Hotplug in VM. Easy to say…

You run a VM instance. Never mind whether it is part of an OpenStack setup or just a local one started with Boxes, virt-manager, virsh or some other frontend to the libvirt daemon. And then you want to add some virtual hardware to it. And another card, and one more controller…

An easy to imagine scenario, right? What can go wrong, you say? A “No more available PCI slots.” message can happen. On the second or third card or controller… But how? Why?

As I wrote in one of my previous posts, most VM instances are virtual boxes of 90s PC hardware: a simple PCI bus which accepts several cards being added or removed at any moment.

But not on the AArch64 architecture. Nor on x86-64 with the Q35 machine type. What is the difference? Both are PCI Express machines. And by default they have far too few PCIe slots (called pcie-root-port in QEMU/libvirt language). More about PCI Express support can be found on the PCI topology and hotplug page of the libvirt documentation.

So I wrote a patch for Nova to make sure that enough slots will be available. And then started testing. I tried a few different approaches, discussed ways of solving the problem with upstream libvirt developers, and finally we selected the one and only proper way of doing it. Then I discussed failures with UEFI developers. And went to the QEMU authors for help. And explained what I want to achieve and why to everyone in each of those four projects. At some point I was seeing pcie-root-port things everywhere…

It turned out that the method of fixing it is kind of simple: we have to create the whole PCIe structure with the root port and slots ourselves. This tells libvirt not to try any automatic adding of slots (which may be tricky if not configured properly, as you may end up with too few slots for basic add-ons).
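
In libvirt domain XML terms the structure is more or less this (a trimmed sketch; libvirt fills in the indexes and addresses):

<controller type='pci' model='pcie-root'/>
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-root-port'/>
<!-- one pcie-root-port per device you may want to hotplug later -->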

Then I went with the idea of using insane values. A VM with one hundred PCIe slots? Sure. So I made one, booted it, and then something weird happened: I landed in the UEFI shell instead of getting the system booted. Why? How? Where is my storage? Network? Etc.?

It turns out that QEMU has limits. And libvirt has limits… All ports/slots went into one bus and memory for MMCONFIG and/or I/O space was gone. There are two interesting threads about it on the qemu-devel mailing list.

So I added a magic number into my patch: 28 — that many pcie-root-port entries in my AArch64 VM instance still gave me a bootable system. I still have to check it on an x86-64/q35 setup, but it should be more or less the same. I expect this patch to land in ‘Rocky’ (the next OpenStack release) and will probably have to find a way to get it into ‘Queens’ as well, because that is what we are planning to use for the next edition of Linaro Developer Cloud.
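
If you want to check how many ports your own guest ended up with, a quick look at the domain XML is enough (the domain name is just an example; the grep may need adjusting for your libvirt version):

virsh dumpxml instance-00000042 | grep -c "model='pcie-root-port'"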

Conclusion? Hotplug may be complicated. But issues with it can be solved.

Graphical console in OpenStack/aarch64

OpenStack users are used to having a graphical console available. They even take it for granted. But we lacked it…

When we started working on OpenStack on 64-bit ARM there were many things missing. Most of them have been sorted out already. One thing was still in the queue: the graphical console. So two weeks ago I started looking at the issue.

Whenever someone tried to use it, Nova reported one simple message: “No free USB ports.” You may ask what that has to do with the console? I thought the same and started digging…

As usual the reason was simple: yet another aarch64<>x86 difference in libvirt. It turned out that arm64 is one of those few architectures which do not get a USB host controller in the default setup. When Nova is told to provide a graphical console it adds Spice (or VNC) graphics, a video card and a USB tablet. But in our case the VM instance does not have any USB ports, so the VM start fails with the “No free USB ports” message.

The solution was simple: let’s add a USB host controller to the VM instance. But where? Should libvirt do that, or should Nova? I discussed it with libvirt developers and got mixed answers. So I opened a bug for it and went to hack on Nova.
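
What the guest definition needs boils down to one extra element next to the tablet; a sketch (the exact controller model may differ in the final patch):

<controller type='usb' model='qemu-xhci'/>
<input type='tablet' bus='usb'/>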

It turned out that the Nova code for building the guest configuration is not that complicated. I created a patch to add a USB host controller and waited for code reviews. There were many suggestions, opinions and questions. So I rewrote the code. And then again. And again. Finally the 15th version got a “looks good” from all reviewers and was merged.

And how does the result look? Boring, as it should:

OpenStack graphical console on aarch64

Everyone loves 90s PC hardware?

Do you know what the most popular PC machine is nowadays? It is a “simple” PCI-based x86(-64) machine with the i440fx chipset and some expansion cards. Everyone is using them daily. Often you do not even realise that the things you do online are handled by machines from the 90s.

Sounds like heresy? Who would use 90s hardware in the modern world? PCI cards were supposed to be replaced by PCI Express etc., right? No one uses USB 1.x host controllers, and it is hard to find a PS/2 mouse or keyboard in stores. And you are all wrong…

Most virtual machines in the x86 world are a weird mix of 90s hardware with a bunch of PCI cards to keep them usable in the modern world. Parallel ATA storage went to the trash, replaced by SCSI/SATA/virtual ones. The graphics card is usually a simple framebuffer without any 3D acceleration (so, like the 90s), with the typical PS/2 input devices connected. And you have USB 1.1 and 2.0 controllers with one tablet connected. Sounds like the retro machines my friends prepare for retro events.

You can upgrade to a USB 3.0 controller, or a graphics card with some 3D acceleration. Or add more memory and CPU cores than any i440fx based PC owner ever dreamt about. But it is still 90s hardware.

Want something more modern? You can migrate to PCI Express. But nearly no one does that in the x86 world. And in the AArch64 world we start from here.
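
On the x86 side the switch is at least possible per image; in Nova terms something along these lines selects the modern machine type (a sketch, the image name is an example):

openstack image set --property hw_machine_type=q35 fedora-27-x86_64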

And that’s the reason why working with developers of projects related to virtualization (qemu, libvirt, openstack) can be frustrating.

Hotplug issues? Which hotplug issues? My VM instance lets me plug in 10 cards while it is running, so where is the problem? The problem is that your machine is 90s hardware with a simple PCI bus and 31 slots present on the virtual mainboard, while a VM instance with PCI Express (aarch64, or x86 with the q35 model) has only TWO free slots on the motherboard. And once they are used, no new slots arrive. Unless you shut down, add free slots and power up again.

Or my recent stuff: adding a USB host controller. x86 has it because someone ‘made a mistake’ in the past and enabled it. But other architectures got sorted out and did not get it. Now all of them need to have it added when it is needed. I have a patch for Nova for it and I am rewriting it again and again to get it into an acceptable form.

Partitioning is fun too. There are so many people who fear switching to GPT…

YADIBP

Let me introduce a new awesome project: YADIBP. It is cool, FOSS, awesome, the one and only, and invented here instead of there. And it does exactly what it has to do, in the way it has to be done. Truly cool and awesome.

Using that tool you can build disk images with several supported Linux distributions. Or maybe even with any BSD distribution. And Haiku or ReactOS. Patches for AmigaOS exist too!

Any architecture. Starting from the 128-bit wide RUSC-VI down to antique architectures like ia32 or m88k, as long as you have either hardware or a QEMU port (patches for ARM Fast Models in progress).

Just fetch it from git and use it. Written in BASIC so it should work everywhere. And if you lack a BASIC interpreter then you can run it as Python or Erlang. Our developers are so cool and awesome!

But let’s get back to reality — there are gazillions of projects for a tool which does one simple thing: build a disk image. And a gazillion more will still be written because some people have that “Not Invented Here” syndrome.

And I am getting tired of it.

Today I was fighting with Nova. No idea who won…

I am working on getting OpenStack running on the AArch64 architecture, right? So recently I went from “just” building images to also using them to deploy a working “cloud” setup. And that resulted in new sets of patches, updates to patches, discussions…

OpenStack is supposed to make virtualization easier. Create accounts, give users access, and they will make virtual machines and use them without worrying what kind of hardware is below. But first you have to get it working. So this week, instead of only taking care of the Kolla and Kolla-Ansible projects, I also patched Nova, the component responsible for running virtual machines.

One patch was a simple edit of an existing one to make it comply with all the comments. It took some time anyway, as I had to write a proper bug description to make sure that reviewers will know why it is so important for us. Once merged, we will have UEFI used as the default boot method on AArch64, without any playing with the hw_firmware_type=uefi property on images (which is easy to forget). But this was the easy one…

Imagine that you have a rack of random AArch64 hardware and want to run a “cloud”. You may end up in a situation where you have a mix of servers as compute nodes (the ones where VM instances run). In Nova/libvirt this is handled by the cpu_mode option:

It is also possible to request the host CPU model in two ways:

  • “host-model” – this causes libvirt to identify the named CPU model which most closely matches the host from the above list, and then request additional CPU flags to complete the match. This should give close to maximum functionality/performance, while maintaining good reliability/compatibility if the guest is migrated to another host with slightly different host CPUs. Beware, due to the way libvirt detects host CPU, CPU configuration created using host-model may not work as expected. The guest CPU may confuse guest OS (i.e. even cause a kernel panic) by using a combination of CPU features and other parameters (such as CPUID level) that don’t work.

  • “host-passthrough” – this causes libvirt to tell KVM to passthrough the host CPU with no modifications. The difference to host-model, instead of just matching feature flags, every last detail of the host CPU is matched. This gives absolutely best performance, and can be important to some apps which check low level CPU details, but it comes at a cost wrt migration. The guest can only be migrated to an exactly matching host CPU.
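
In the generated domain XML the difference between those two is just (a minimal sketch):

<cpu mode='host-model'/>
<cpu mode='host-passthrough'/>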

Nova assumes host-model when KVM/QEMU is used as the hypervisor. And crashes terribly on AArch64 with:

libvirtError: unsupported configuration: CPU mode ‘host-model’ for aarch64 kvm domain on aarch64 host is not supported by hypervisor

Not nice, right? So I made a simple patch to make host-passthrough the default on AArch64. But when something is so simple, then its description is probably not so simple…
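
Until a fixed Nova is everywhere, the same effect can be forced per deployment with a nova.conf override (a sketch of the relevant section):

[libvirt]
virt_type = kvm
cpu_mode = host-passthrough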

I reported a bug against Nova with some logs attached. Then dug for some information which would explain the issue better. I found Ubuntu’s bug against libvirt from Ocata times. They used the same workaround.

So I thought: let’s report a bug against libvirt and request support for the host-model option. There I got a link to another libvirt bug with a set of information on why it does not make sense.

The reason is simple. No one knows what you run on when you run Linux on an AArch64 server. In theory there are fields in /proc/cpuinfo, but you still do not know whether the CPU cores in compute01 are the same as in compute02. At least from the Nova/libvirt/QEMU perspective. This also blocks us from setting cpu_mode to custom and selecting a cpu_model, which could be a way of getting the same CPU for each instance regardless of the type of compute node processor cores.
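
Those /proc/cpuinfo fields are easy to compare by hand, which also shows how little they tell you (run this on each compute node and diff the results):

grep -E 'CPU implementer|CPU part|CPU variant' /proc/cpuinfo | sort | uniq -c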

The good side is that VM instances will work. The problem may appear when you migrate a VM to a CPU with different cores — it may work. Or it may not. Good luck!

Running 32-bit ARM virtual machine on AArch64 hardware

It was a matter of days and finally all the pieces are in place. Running 32-bit ARM virtual machines on 64-bit AArch64 hardware is possible and quite easy.

Requirements

  • AArch64 hardware (I used an APM Mustang as usual)
  • ARM rootfs (I fetched a Fedora 22 image with the “virt-builder” tool)
  • ARM kernel and initramfs (I used the Fedora 24 ones)
  • Virt Manager (it can be done from the shell too)

Configuration

Start “virt-manager” and add a new machine:

Add new machine

Select the rootfs, kernel and initramfs (the dtb will be provided internally by QEMU) and tell the kernel where the rootfs is:

Storage/boot options

Then set the amount of memory and the number of cores. I went for 10GB of RAM and 8 cores. Save the machine.
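
Since it can be done from the shell too, a roughly equivalent virt-install invocation would look like this (a sketch; all paths and the kernel command line are illustrative):

virt-install \
  --name fedora-arm32 \
  --arch armv7l \
  --machine virt \
  --memory 4096 \
  --vcpus 4 \
  --import \
  --disk path=/var/lib/libvirt/images/fedora-22-arm32.raw \
  --boot kernel=/var/lib/libvirt/images/vmlinuz-armv7hl,initrd=/var/lib/libvirt/images/initramfs-armv7hl.img,kernel_args="console=ttyAMA0 root=/dev/vda4 rw" \
  --graphics none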

Let’s run

Open the created machine and press the Play button. It should boot:

Booted system

I upgraded F22 to F24 to have the latest development system.

Is it fast?

If I just booted it and wrote about it, there would be questions about performance. So I did a build of gcc 5.3.1-3 using mock (the standard Fedora way). On an arm32 Fedora builder it took 19 hours, on an AArch64 builder only 4.5 hours. On my machine the AArch64 build took 9.5 hours and in this VM it took 12.5 hours (a slow HDD was used). So a builder with enough memory and some fast storage will boost arm32 builds a lot.

Numbers from “openssl speed” show performance similar to the host CPU:

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2               1787.41k     3677.19k     5039.02k     5555.88k     5728.94k
mdc2                 0.00         0.00         0.00         0.00         0.00 
md4              24846.05k    81594.07k   226791.59k   418185.22k   554344.45k
md5              18881.79k    60907.46k   163927.55k   281694.58k   357168.47k
hmac(md5)        21345.25k    69033.83k   177675.52k   291996.33k   357250.39k
sha1             20776.17k    65099.46k   167091.03k   275240.62k   338582.71k
rmd160           15867.02k    42659.95k    88652.54k   123879.77k   140571.99k
rc4             167878.11k   186243.61k   191468.46k   192576.51k   193112.75k
des cbc          35418.48k    37327.19k    37803.69k    37954.56k    37991.77k
des ede3         13415.40k    13605.87k    13641.90k    13654.36k    13628.76k
idea cbc         36377.06k    38284.93k    38665.05k    38864.71k    39032.15k
seed cbc         42533.48k    43863.15k    44276.22k    44376.75k    44397.91k
rc2 cbc          29523.86k    30563.20k    30763.09k    30940.50k    30857.44k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00 
blowfish cbc     60512.96k    66274.07k    67889.66k    68273.15k    68302.17k
cast cbc         56795.77k    61845.42k    63236.86k    63251.11k    63445.82k
aes-128 cbc      61479.48k    65319.32k    67327.49k    67773.78k    66590.04k
aes-192 cbc      53337.95k    55916.74k    56583.34k    56957.61k    57024.51k
aes-256 cbc      46888.06k    48538.97k    49300.82k    49725.44k    50402.65k
camellia-128 cbc    59413.00k    62610.45k    63400.53k    63593.13k    63660.03k
camellia-192 cbc    47212.40k    49549.89k    50590.21k    50843.99k    50012.16k
camellia-256 cbc    47581.19k    49388.89k    50519.13k    49991.68k    50978.82k
sha256           27232.09k    64660.84k   119572.57k   151862.27k   164874.92k
sha512            9376.71k    37571.93k    54401.88k    74966.36k    84322.99k
whirlpool         3358.92k     6907.67k    11214.42k    13301.08k    14065.66k
aes-128 ige      60127.48k    65397.14k    67277.65k    67428.35k    67584.00k
aes-192 ige      52340.73k    56249.81k    57313.54k    57559.38k    57191.08k
aes-256 ige      46090.63k    48848.96k    49684.82k    49861.32k    49892.01k
ghash           150893.11k   171448.55k   177457.92k   179003.39k   179595.95k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.000322s 0.000026s   3101.3  39214.9
rsa 1024 bits 0.001446s 0.000073s    691.7  13714.6
rsa 2048 bits 0.008511s 0.000251s    117.5   3987.5
rsa 4096 bits 0.058092s 0.000945s     17.2   1058.4
                  sign    verify    sign/s verify/s
dsa  512 bits 0.000272s 0.000297s   3680.6   3363.6
dsa 1024 bits 0.000739s 0.000897s   1353.1   1115.2
dsa 2048 bits 0.002762s 0.002903s    362.1    344.5
                              sign    verify    sign/s verify/s
 256 bit ecdsa (nistp256)   0.0005s   0.0019s   1977.8    538.3
 384 bit ecdsa (nistp384)   0.0015s   0.0057s    663.0    174.6
 521 bit ecdsa (nistp521)   0.0035s   0.0136s    286.8     73.4
                              op      op/s
 256 bit ecdh (nistp256)   0.0016s    616.0
 384 bit ecdh (nistp384)   0.0049s    204.8
 521 bit ecdh (nistp521)   0.0115s     87.2

Running VMs on Fedora/AArch64

There are moments when more than one machine would be handy. But as AArch64 computers are not yet available in the shop around the corner, we have to go for other options. So this time let’s check how to get virtual machines working.

Requirements

For this I will use Fedora 22 on an APM Mustang (other systems will be fine too). What else will be needed:

  • libvirtd running
  • virt-manager 1.1.0-7 (or higher) installed on the AArch64 machine
  • UEFI for AArch64 from Gerd Hoffmann’s firmware repository
  • Fedora or Debian installation ISO (Ubuntu does not provide one)
  • a computer with X11 working (to control virt-manager)

UPDATE: starting with Fedora 23 the UEFI packages are in the distribution repository. The packages are ‘edk2-arm’ for 32-bit ARM, ‘edk2-aarch64’ for 64-bit ARM and ‘edk2-ovmf’ for the x86-64 architecture.

Is KVM working?

First we need to get KVM working — run “dmesg|grep -i kvm” after system boot. It should look like this:

hrw@pinkiepie-f22:~$ dmesg|grep -i kvm
[    0.261904] kvm [1]: interrupt-controller@780c0000 IRQ5
[    0.262011] kvm [1]: timer IRQ3
[    0.262026] kvm [1]: Hyp mode initialized successfully

But you can also get this:

[    0.343796] kvm [1]: GICV size 0x2000 not a multiple of page size 0x10000
[    0.343802] kvm [1]: error: no compatible GIC info found
[    0.343909] kvm [1]: error initializing Hyp mode: -6

In such a case a fixed DeviceTree blob from bug #1165290 would be needed. Fetch the attached DTB, store it as “/boot/mustang.dtb” and then edit the “/etc/grub2-efi.cfg” file so the kernel entry will look like this:

menuentry 'Fedora (4.0.0-0.rc5.git4.1.fc22.aarch64) 22 (Twenty Two)' --class fedora --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.0.0-0.rc5.git2.4.1.fc22.aarch64-advanced-13e42c65-e2eb-4986-abf9-262e287842e4' {
        load_video
        insmod gzio
        insmod part_gpt
        insmod ext2
        set root='hd1,gpt32'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd1,gpt32 --hint-efi=hd1,gpt32 --hint-baremetal=ahci1,gpt32  13e42c65-e2eb-4986-abf9-262e287842e4
        else
          search --no-floppy --fs-uuid --set=root 13e42c65-e2eb-4986-abf9-262e287842e4
        fi
        linux /boot/vmlinuz-4.0.0-0.rc5.git4.1.fc22.aarch64 root=UUID=13e42c65-e2eb-4986-abf9-262e287842e4 ro  LANG=en_GB.UTF-8
        initrd /boot/initramfs-4.0.0-0.rc5.git4.1.fc22.aarch64.img
        devicetree /boot/mustang.dtb
}

After reboot KVM should work.

Software installation

The next step is installing the VM software: “dnf install libvirt-daemon* virt-manager” will handle that. But to run Virt Manager we also need a way to see it. X11 forwarding over ssh to the rescue ;D After the ssh connection I usually cheat with “sudo ln -sf ~hrw/.Xauthority /root/.Xauthority” to be able to run UI apps as the root user.

UEFI firmware

The next phase is UEFI, which allows us to boot a virtual machine with ISO installation images (as opposed to the kernel/initrd combo when there is no firmware/bootloader option). We will install one from the repository provided by Gerd Hoffmann:

hrw@pinkiepie-f22:~$ sudo -s
root@pinkiepie-f22:hrw$ cd /etc/yum.repos.d/
root@pinkiepie-f22:yum.repos.d$ wget https://www.kraxel.org/repos/firmware.repo
root@pinkiepie-f22:yum.repos.d$ dnf install edk2.git-aarch64

Then a libvirtd config change to provide the path to the just-installed firmware. Edit the “/etc/libvirt/qemu.conf” file and at the end of the file add this:

nvram = [
   "/usr/share/edk2.git/aarch64/QEMU_EFI-pflash.raw:/usr/share/edk2.git/aarch64/vars-template-pflash.raw"
]

Restart libvirtd via “systemctl restart libvirtd”.

Running Virtual Machine Manager

Now we can connect via “ssh -X” and run “sudo virt-manager”:

vm1

The next step is connecting to libvirtd:

vm2
vm3

Now we are ready to create VMs. After pressing the “Create a new VM” button we should see this:

vm4

And then the creation of the VM goes nearly like on x86 machines, except there is no graphics, only a serial console.

But if you forgot to set up the UEFI firmware then you will get this:

vm5

In such a case go back to the UEFI firmware step.

Installing Fedora 22 in VM

So let’s test how it works. Fedora 22 is in the Beta phase now, so why not test it?

vm6

vm10

vm11

2GB of RAM and 3 CPU cores should be more than enough ;D

vm12

And 10GB for a minimal system:

vm13

vm14

But when it switched to the serial console it did not look good 🙁

vm15

I realized that I had forgotten to install fonts, but a quick “dnf install dejavu*fonts” sorted that out:

vm16

Go for a VNC controlled installation.

After the installation finished, the system runs just fine:

vm17

Summary

As you can see, Fedora 22 has everything in place to get VMs running on AArch64. The UEFI firmware is the only thing outside of the distribution, but that’s due to some licensing issue around the vfat implementation or something like that. I was running virtual machines with Debian ‘jessie’ and Fedora 22. I wanted to check Ubuntu, but all I found was a kernel/initrd combo (which is one of the ways to boot in virt-manager) and it did not boot in a VM.

I will stay with VirtualBox for a bit longer

VMware, VirtualBox, Xen, KVM, LXC… and there are probably several other ways to run a virtual machine under Linux.

Years ago I used the first one. Then I moved to VirtualBox and still use it. But when it comes to the rest of the list I do not have much experience. Xen? One of my servers in the past was a Xen instance. KVM? Maybe used a few times to quickly boot some ISO image. LXC? Never played with it.

From time to time I look at the tools I use and check for better replacements. When I moved to Fedora I decided to try “virt-manager” to move from VirtualBox to QEMU/KVM. Failed. Tried again…

Today, after “playing” with virt-manager in Fedora/rawhide, I decided to stay with VirtualBox for longer. It may have some issues, but it runs my WinXP and Linux instances without any extra problems.

Sorry to say, but virt-manager feels like a tool for its creators only. Error messages which could be in any random language, insisting on using /var/lib/libvirt/images/ for disk images (or doing a lot of clicks instead), and depending on so many software packages that it will probably take months before someone finally fixes it in rawhide.

And no, I do not want to go back to running QEMU/KVM by hand. It is good for booting an image to play with. But I need a way to add/remove USB devices in a running system. In an easy way.

And yes, I know the “report all bugs” mantra…