YADIBP

Let me introduce a new awesome project: YADIBP. It is cool, FOSS, awesome, the one and only, and invented here instead of there. And it does exactly what it has to do, in exactly the way it has to be done. Truly cool and awesome.

Using this tool you can build disk images for several supported Linux distributions. Or maybe even for any BSD. And Haiku or ReactOS. Patches for AmigaOS exist too!

Any architecture: from the 128-bit wide RUSC-VI down to antiques like ia32 or m88k, as long as you have either hardware or a QEMU port (patches for ARM Fast Models are in progress).

Just fetch it from git and use it. Written in BASIC, so it should work everywhere. And if you lack a BASIC interpreter then you can run it as Python or Erlang. Our developers are so cool and awesome!

But let’s get back to reality. There are gazillions of projects for a tool which does one simple thing: build a disk image. And a gazillion more will be written, because some people have that “Not Invented Here” syndrome.

And I am getting tired of it.

Today I was fighting with Nova. No idea who won…

I am working on getting OpenStack running on the AArch64 architecture, right? So recently I went from “just” building images to also using them to deploy a working “cloud” setup. And that resulted in new sets of patches, updates to patches, discussions…

OpenStack is supposed to make virtualization easier. Create accounts, give users access, and they will create virtual machines and use them without worrying about what kind of hardware sits below. But first you have to get it working. So this week, instead of taking care of only the Kolla and Kolla-Ansible projects, I also patched Nova, the component responsible for running virtual machines.

One patch was a simple edit of an existing one to make it comply with all the review comments. It took some time anyway, as I had to write a proper bug description to make sure that reviewers would know why it is so important for us. Once merged, it will make UEFI the default boot method on AArch64, without any need to play with the hw_firmware_type=uefi property on images (which is easy to forget). But this was the easy one…
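
Until that lands, the property has to be set per image. With the OpenStack client that looks like this (the image name is a placeholder):

  # tell Nova to boot instances from this image with UEFI firmware
  openstack image set --property hw_firmware_type=uefi my-arm64-image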

Imagine that you have a rack of random AArch64 hardware and want to run a “cloud”. You may end up with a mix of different servers as compute nodes (the ones where VM instances run). In Nova/libvirt this is handled by the cpu_mode option:

It is also possible to request the host CPU model in two ways:

  • “host-model” – this causes libvirt to identify the named CPU model which most closely matches the host from the above list, and then request additional CPU flags to complete the match. This should give close to maximum functionality/performance, while maintaining good reliability/compatibility if the guest is migrated to another host with slightly different host CPUs. Beware, due to the way libvirt detects the host CPU, a CPU configuration created using host-model may not work as expected. The guest CPU may confuse the guest OS (i.e. even cause a kernel panic) by using a combination of CPU features and other parameters (such as CPUID level) that don’t work.

  • “host-passthrough” – this causes libvirt to tell KVM to pass the host CPU through with no modifications. The difference to host-model is that instead of just matching feature flags, every last detail of the host CPU is matched. This gives the absolutely best performance, and can be important to some apps which check low-level CPU details, but it comes at a cost wrt migration. The guest can only be migrated to an exactly matching host CPU.

Nova assumes host-model when KVM/QEMU is used as the hypervisor. And it crashes terribly on AArch64 with:

libvirtError: unsupported configuration: CPU mode 'host-model' for aarch64 kvm domain on aarch64 host is not supported by hypervisor

Not nice, right? So I made a simple patch to make host-passthrough the default on AArch64. But when something is that simple, its description probably is not…
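
Until such a change is merged, the same effect can be had by setting the option manually on AArch64 compute nodes; a minimal sketch using the standard [libvirt] section of nova.conf:

  [libvirt]
  # libvirt rejects the default 'host-model' on AArch64/KVM,
  # so pass the host CPU through unmodified instead
  cpu_mode = host-passthrough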

I reported a bug on Nova with some logs attached. Then I dug for information which would explain the issue better, and found Ubuntu’s bug on libvirt from Ocata times. They used the same workaround.

So I thought: let’s report a bug against libvirt and request support for the host-model option. There I got a link to another libvirt bug, with a set of reasons why it does not make sense.

The reason is simple: no one knows what you run on when you run Linux on an AArch64 server. In theory there are fields in /proc/cpuinfo, but you still do not know whether the CPU cores in the compute01 server are the same as those in compute02. At least from the Nova/libvirt/QEMU perspective. This also blocks us from setting cpu_mode to custom and selecting a cpu_model, which could be a way of getting the same CPU for each instance regardless of what processor cores the compute nodes have.
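
For illustration, an arm64 kernel exposes roughly this much per core, and not more (the values below are made up, loosely ThunderX-like):

  processor       : 0
  BogoMIPS        : 200.00
  Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
  CPU implementer : 0x43
  CPU architecture: 8
  CPU variant     : 0x1
  CPU part        : 0x0a1
  CPU revision    : 1

Numeric implementer/part codes and feature flags, no model name: nothing libvirt could reliably map to a named CPU model.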

The good side is that VM instances will work. The problem may appear when you migrate a VM to a host with different CPU cores: it may work. Or it may not. Good luck!

I am now a core reviewer in Kolla

Months of work, tens of patches, hundreds of changed lines. My work in the Kolla project got rewarded this week: I am now one of the core reviewers 🙂

What does it mean? I think that Gema summarised it best:

For those of you who don’t know, this means Kolla has recognised our contributions to the project as first class and are giving Marcin and ARM64 a vote of confidence, they realise we are there to stay.

I find it helpful in my daily work, as now I can suggest that my coworkers send their patches directly instead of proxying them through me ;D

Is my work on Kolla done?

During the last few months I was working on getting Kolla running on the AArch64 and POWER architectures. It was a long journey with several bumps, but it has finally ended.

When I started in January I had no idea how much work it would be or how it would go. I just followed my typical “give me something to build and I will build it” style. You can find some background information in a previous post about my work on Kolla.

There were a lot of failures at the beginning. Or rather: only a small number of images built at all. So my first merged change was to do something about Kolla’s output ;D

  • build: sort list of built/failed images before printing

Debian support was marked for removal, so I first checked how it looked, then enabled all possible images and migrated from the ‘jessie’ to the ‘stretch’ release. The reason was simple: ‘jessie’ (even with backports) lacked packages required to build some images.

  • debian: import key for download.ceph.com repository
  • debian: install gnupg and dirmngr needed for apt-key
  • debian: enable all images enabled for Ubuntu
  • handle rtslib(-fb) package names and dependencies
  • debian: move to stretch
  • Debian 9 was not released yet

Both the YUM and APT package managers got some changes. For the first one I made sure that it fails when packages are missing (which happened very often during builds for aarch64/ppc64le); that even allowed me to catch a typo in the ‘ironic-conductor’ image. In other words: I made YUM behave closer to APT (which always complains about missing packages). Then I made a change for APT to behave more like YUM, by making sure that the package lists get updated before packages are installed. (A rough shell sketch of both behaviours follows the list.)

  • ironic-conductor: add missing comma for centos/source build
  • make yum fail on missing packages
  • always update APT lists when install packages
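
Roughly, the two behaviours look like this ('foo' and 'bar' are placeholder package names; skip_missing_names_on_install is a real yum option, but whether the Kolla patch used exactly this mechanism is my assumption):

  # YUM: fail the build when any requested package name cannot be found
  echo "skip_missing_names_on_install=False" >> /etc/yum.conf
  yum -y install foo bar    # now exits non-zero if 'bar' does not exist

  # APT: refresh package lists in the same step as the install,
  # so a stale cache never causes "package not found"
  apt-get update && apt-get -y install foo bar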

Of course many images could not be built at all for the aarch64/ppc64le architectures, mostly due to lack of packages and/or external repositories. For each case I checked whether there was some way of fixing it. Sometimes I had to disable an image, sometimes update packages to a newer version. There were also discussions with maintainers of external repositories about getting their stuff available for non-x86 architectures.

  • kubernetes: disable for architectures other than x86-64
  • gnocchi-base: add some devel packages for non-x86
  • ironic-pxe: handle non-x86 architectures
  • openstack-base: Percona-Server is x86-64 only
  • mariadb: handle lack of external repos on non x86
  • grafana: disable for non-x86
  • helm-repository: update to v2.3.0
  • helm-repository: make it work on non-x86
  • kubetoolbox: mark as x86-64 only
  • magnum-conductor: mark as x86-64 only
  • nova-libvirt: handle ppc64le
  • ceph: take care of ceph-fuse package availability
  • handle mariadb for aarch64/ubuntu/source
  • opendaylight: get it working on CentOS/non-x86
  • kolla-toolbox: use proper mariadb packages on CentOS/non-x86

At some point I had over ten patches in review, and all of them depended on the base one. So with any change I had to refresh the whole series, and reviewers had to review everything again… It was painful. So I decided to split out the most basic stuff and turn the patch set into separate, independent patches. After the “base_arch” variable was merged, life became much simpler for reviewers and a bit more complicated for me, as from then on each patch was kept in a separate git branch (a sketch of how such a variable gets used follows the list).

  • add base_arch variable for future non-x86 work
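
Kolla’s Dockerfiles are Jinja2 templates, so a variable like this mostly guards architecture-specific bits. A hypothetical fragment (the package name is invented):

  {% if base_arch == 'x86_64' %}
  RUN yum -y install some-x86-only-package
  {% elif base_arch in ['aarch64', 'ppc64le'] %}
  # no package for these architectures: skip it or pull in an alternative
  {% endif %}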

At Linaro we support CentOS and Debian. Kolla supports CentOS/RHEL/OracleLinux, Debian and Ubuntu. I was not making builds with RHEL or OracleLinux, but I had to make sure that the Ubuntu ones work too. There was a funny moment when I realised that everyone using Kolla/master was building images with Ocata packages instead of Pike ;D

  • Ubuntu: use Pike repository

But all those patches meant “nothing” without the first one. Kolla had information about which packages are available for the aarch64/ppc64le/x86-64 architectures, but still had no idea that aarch64 or ppc64le exist. Finally the 50th revision of the patch got merged, so now it knows ;D

  • Support non-x86 architectures (aarch64, ppc64le)

I also learnt a lot about Gerrit and code reviews. OpenStack community members were very helpful with their comments and suggestions. We had hours of talk on the #openstack-kolla IRC channel. Thanks go to Alicja, Duong Ha-Quang, Jeffrey, Kurt, Mauricio, Michał, Qin Wang, Steven, Surya Prakash, Eduardo, Paul, Sajauddin and many others. You people rock!

So is my work on Kolla done now? Some of it is. But we still need to test the resulting images, make a Docker repository with them, and update the official documentation with information on how to deploy on AArch64 boxes (I hope no changes will be needed). We also need to make sure that the OpenStack Kolla CI gets Debian-based gates operational, and to provide a third-party AArch64-based CI so new changes can be checked.

So you run OpenStack on your phone?

For about a year I have been working on OpenStack on the AArch64 architecture. And the question from the title gets asked from time to time, in this or other forms.

Yes, I do have an AArch64-powered phone nowadays. But it has just 4 GB of memory and runs Android, so it is not a good platform for running OpenStack.

I am aware that for many people anything which comes from ARM Ltd means small, embedded, not worth serious effort, etc. To me they are not wrong; they are just ‘not up to date’.

We have servers. Sure, someone can say that we had them years ago, and that would be right too. There were Marvell server boards, and Calxeda had their “high density” boxes with a huge number of quad-core CPUs. But now we have ‘boring’ ones which can be used in the same way as x86-64 ones.

ARM Ltd published the SBSA and SBBR specifications, which define what an ARM server is nowadays. The short version: “a boring box which you put into a rack, plug into power and network, power on, and install any enterprise Linux distribution on”. No need to deal with weird bootloaders (looking from the server perspective), random kernel versions, etc. Just unpack, connect and use.

But what do you get inside? It depends on the product. It can be one CPU with 8 cores, but it can also be 1-2 CPUs with 48 cores each. Or even more (I heard about 240-core products, but I have no idea whether they are on the market yet). And processors mean memory. What about 1 TB (terabyte) of memory per CPU? Cavium ThunderX mainboards allow such a setup with 8 memory DIMMs per CPU.

Then comes the network. With 32-bit ARM machines the problem was “will it support 1GbE?”, and with AArch64 servers that problem can reappear in inverted form, as some systems do not support ports slower than 10GbE (some ThunderX boards have 3x40GbE + 4x10GbE ports). The RJ-45 connector is usually there to connect to the BMC (think IPMI).

Storage is Serial ATA, whatever you plug into PCI Express, or something over the network. Choose your way. M.2 connectors would not surprise me either.

Usually all of that means several PCI Express chips present on the board. On AArch64 most of those controllers are already part of the SoC, which makes things easier and faster.

On top of that we run standard distributions like CentOS, Debian, Fedora or openSUSE. Out of the box, with distribution kernels based on mainline. And then we install OpenStack: from packages, as Docker containers, using DevStack, or any other way we tend to use.

And when I really have to use OpenStack on my phone then it looks like this:

My work on Kolla

During the last month I was working on one of the OpenStack projects: Kolla. My job was adding support for non-x86 architectures (aarch64 and ppc64le) and resurrecting Debian support.

A bit of background

At Linaro we work on getting AArch64 (64-bit ARM, arm64) present in many places. We have at least two OpenStack instances running at the moment – on AArch64 hardware only.

First we used Debian ‘jessie’ and the OpenStack ‘liberty’ release. It worked. Not the best, but we helped many projects by providing virtual machines for porting software.

It was built from packages, and later (when ‘mitaka’ was released) we moved to one virtualenv per component. Our second “cloud” runs that, with proper Neutron networking, live migration and a few other nice things.

But the virtualenvs were a quick solution. We decided to move to Docker containers for the next release.

And Kolla was chosen as the tool for it. We do not like reinventing the wheel again and again…

Non-x86 support in Kolla

The problem was typical: Kolla was x86-64-centric, like most software nowadays. But thanks to work done by Sajauddin Mohammad I had something to use as a base for adding aarch64 support.

I took his patch, slashed out most of it and concentrated on the minimal changes needed to get something built on AArch64. The result was sent for review and is now at its 10th version.

Docker images started to appear. But at the beginning I was building the Ubuntu ones, as Debian support was “basically abandoned, on its way out”. From the CentOS guys I got confirmation that an official Docker image would be generated (it is done already).

I spent some time making sure that the whole non-x86 support is free from hardcoding wherever possible. As you can see in my working branch, it went quite well. Most of the architecture-related changes boil down to “the distro does not provide package XYZ for that architecture” or to handling of external repositories.

Debian support

And here we come to Debian support. At Linaro we decided to support two community-based distributions: CentOS and Debian. But Debian was on its way out in Kolla…

As this was not much related to the non-x86 work, I decided to use one of our x86-64 machines for it.

The first builds were against the ‘jessie-backports’ base tag. I had to make a patch to tell APT that if I ask for backports then I really want them. It was sent for review, like the rest of the patches.
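
Backports are pinned low by default (priority 100), so APT ignores them unless asked explicitly. Raising the pin priority is the usual fix; a sketch of the mechanism (whether the patch used exactly this file layout is my assumption):

  # /etc/apt/preferences.d/backports
  Package: *
  Pin: release a=jessie-backports
  Pin-Priority: 500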

Images were building, but not as many as for Ubuntu. So I went through all of them and enabled Debian where possible. The resulting patch went for review as usual.

The effect was quite nice (image counts, on x86-64):

  • debian-binary: 158
  • debian-source: 201

But ‘jessie’ was missing several packages even with backports enabled. So after a discussion with my team I decided to drop it and go for the Debian/testing ‘stretch’ release instead. It is already frozen for release, so no big changes are allowed. Patch in review, of course.

At that moment I abandoned one of the previous patches, as ‘jessie-backports’ was not something I planned to support.

It turned out that ‘stretch’ images have a slightly different set of packages installed than ‘jessie’ had: ‘gnupg’ and ‘dirmngr’ were missing, while we need them for importing GPG keys into APT. A proper patch went to review again.

Then I did a rebuild on x86-64:

  • stretch-binary: 137
  • stretch-source: 195

A bit fewer than ‘jessie-backports’ had, right? Sure, but it also shows that I need to make a fresh build to check the numbers (my laptop already has ~1500 Docker images generated by Kolla).

Cleaning up the old Power patch

Remember the patch this all started from? I did not forget it, and after building all those images I went back to it.

Some parts were just fugly, so I skipped them, but others were useful if done properly. That’s how new changes got written, along with some updates to previous ones.

Then I managed to get remote hands on one of the POWER machines at Red Hat and started builds:

  • debian-binary: 134
  • debian-source: 184
  • ubuntu-binary: 147
  • ubuntu-source: 190

No CentOS builds, as there was no centos/ppc64le image available.

Summary

Non-x86 support looks quite nice. There are some images which cannot be built, as they rely on external repositories which provide no aarch64 or ppc64le packages.

Debian ‘stretch’ support is not perfect yet, but it is something which I plan to maintain, so the situation will improve. Note that most of my work will go into the ‘source’ type of builds, as we want to have the same images for both Debian and CentOS systems.

My work on changing CirrOS images

What is CirrOS, and why was I working on it? This was quite a common question when I mentioned what I had been working on during the last few weeks.

So, CirrOS is a small image to run in a cloud. OpenStack developers use it to test their projects.

Technically it is yet another Frankenstein OS. Built using Buildroot 2015.05, it uses uClibc or glibc (depending on the target architecture). Then an Ubuntu 16.04 kernel is applied on top, and GRUB (also from Ubuntu) is used to make it bootable.

The problem was that none of this was done in a UEFI-bootable way…

My first changes were: switch the images to GPT, create an EFI system partition and put a bootloader there. I first used the CentOS “grub2-efi” packages (as they provided ready-to-use EFI binaries) and later switched to the Ubuntu ones, as the upstream maintainer (Scott Moser) prefers all external binaries to come from one source.
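
A standalone sketch of what such a change boils down to (file name and sizes invented; the real CirrOS build scripts are more involved):

  # GPT label with a small EFI system partition followed by the rootfs;
  # on GPT, parted's "boot" flag marks the EFI system partition
  parted --script disk.img \
      mklabel gpt \
      mkpart ESP fat32 1MiB 64MiB \
      set 1 boot on \
      mkpart root ext4 64MiB 100%
  # then the bootloader goes into the ESP at the removable-media path,
  # e.g. EFI/BOOT/BOOTAA64.EFI for AArch64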

While he was on vacation (so the merge request had to wait), I started digging more and more into the scripts.

I fixed the getopt usage, as arguments passed between scripts were read partly via getopt and partly by poking at positional parameters like ${X} (where X is a number).
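
The cleaned-up pattern looks more or less like this (option names invented; the real scripts handle many more):

  # let getopt validate and reorder the arguments, then reset the
  # positional parameters so nothing downstream pokes at raw ${1}, ${2}, ...
  args=$(getopt -o v -l verbose -- "$@") || exit 1
  eval set -- "$args"
  while true; do
      case "$1" in
          -v|--verbose) VERBOSE=1; shift ;;
          --) shift; break ;;
      esac
  done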

All scripts were moved to Bash (as /bin/sh in Ubuntu is usually Dash, a minimalist POSIX shell), whitespace got unified between all the scripts, and some other stuff happened as well.

At one moment all the scripts together had 1835 lines and my diff was 2250 lines (+1018/-603) long. Fortunately Scott came back and we got most of that stuff merged.

Recent (2016.07.21) images are available and work fine on all platforms. If you use them with OpenStack then please remember to set the “short_id” property to “ubuntu16.04”; otherwise there may be a problem with finding the rootfs (there is no virtio-scsi support in the disk images).
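
With the OpenStack client, setting it looks like this (the image name is a placeholder):

  openstack image set --property short_id=ubuntu16.04 cirros-arm64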

Summary:

  architecture   booting before           booting after
  aarch64        direct kernel            UEFI or direct kernel
  arm            direct kernel            UEFI or direct kernel
  i386           BIOS or direct kernel    BIOS, UEFI or direct kernel
  powerpc        direct kernel            direct kernel
  ppc64          direct kernel            direct kernel
  ppc64le        direct kernel            direct kernel
  x86-64         BIOS or direct kernel    BIOS, UEFI or direct kernel

My workflow for building big sets of RPM packages

In recent months I did two rebuilds: Node.js 4.x in Fedora and OpenStack Mitaka in CentOS. Both targeted AArch64, and neither had been done before. During the latter I was asked to write about my workflow, so I will describe it using the OpenStack rebuild as the example.

identify what needs to be done

At first I had to figure out what exactly needed to be built. Haïkel Guémar (aka number80) pointed me to the openstack-mitaka directory on CentOS’ vault, where all the source packages are present. He also told me that the EPEL repository is not required, which helped a lot, as it was not yet available for AArch64.

structure of sources

The OpenStack set of packages in CentOS is split into two parts: “common”, shared across all OpenStack versions, and “openstack-mitaka”, containing the OpenStack Mitaka packages plus build dependencies not covered by CentOS itself or by the “common” directory.

prepare build space

I used “mockchain” for such rebuilds. It is a simple tool which does not do any ordering tricks: it just builds a set of packages in the given order, and does so three times, hoping that all build dependencies will be solved that way. Of course, whatever got built once is not tried again.

To make things easier I used shell alias:

alias runmockchain='mockchain -r default -l /home/hrw/rpmbuild/_rebuilds/openstack/mockchain-repo-centos'

With this I did not have to remember those two switches. The common call was “runmockchain --recurse srpms/*”, which means “cycle through the packages three times and continue on failures”.

The results of all builds (packages and logs) were kept in subdirectories of “~/rpmbuild/_rebuilds/openstack/mockchain-repo-centos/results/default/”. I put all extra packages there too, to have everything in one repository.

populate “noarch” packages

Then I copied the x86-64 build of OpenStack Mitaka into “_IMPORT-openstack-mitaka/” to get all the “noarch” packages for satisfying build dependencies. I built all those packages anyway, but having them around saved me several rebuild rounds.

extra rpm macros

When I started the first build it turned out that some Python packages lacked proper “Provides” fields: I was missing newer RPM build macros (“%python_provide” was added after Fedora 19, which was the base for RHEL 7). I asked Haïkel and added “rdo-rpm-macros” to the mock configuration.

But I had to scrap everything I had built so far.
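
Adding a package to every buildroot goes through the mock configuration’s chroot_setup_cmd option; roughly like this (whether @buildsys-build was the exact group in use is an assumption on my side):

  # /etc/mock/default.cfg (fragment); mock configs are Python
  config_opts['chroot_setup_cmd'] = 'install @buildsys-build rdo-rpm-macros'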

surprises and failures

Building a big set of packages for a new architecture generates, most of the time, failures which were not present in the x86-64 build. So it went this time too, as several build dependencies were missing or wrong.

packages missing in CentOS/aarch64

Some were from CentOS itself. I told Jim Perrin (aka Evolution) about them and he added them to the build queue to fill the gaps. In the meantime I built them myself or (if they were “noarch”) imported them into the “_IMPORT-extras” or “_IMPORT-os” directories.

packages imported from other CBS tags

Other packages were originally imported from other tags at CBS (the CentOS koji). For those I created a directory named “_IMPORT-cbs”. And again: if they were “noarch” I just copied them. For the rest I did a full build (using “runmockchain”) and they ended up in the same repository as the rest of the build.

For some packages it turned out that they had been built a long time ago, with older versions of their build dependencies, and were not buildable with the current ones. For those I tracked down the proper versions on CBS and imported/built them (sometimes along with their build dependencies, and the build dependencies of those build dependencies).

downgrading packages

There was a package, “python-flake8”, which failed to build, spitting out Python errors. I checked how this version was built on CBS, and it turned out that “python-mock” 1.3.0 (from the “openstack-mitaka” repository) was too new… Downgrading it to 1.0 allowed me to build “python-flake8” (an upgrade of the latter is in the queue).

merging fixes from Fedora

Both “galera” and “mariadb-galera” got AArch64 support merged from Fedora and were built with “.hrw1” added to their “Release” fields.
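
In spec file terms that is just a suffix on the Release tag, something like this (the version number is invented), so the rebuilt package sorts as newer than the stock one while staying below the next upstream release:

  Release: 25%{?dist}.hrw1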

directories

I did the whole build in the “~/rpmbuild/_rebuilds/openstack/” directory. The extra folders were:

  • vault.centos.org/centos/7/cloud/Source/openstack-mitaka/common/
  • vault.centos.org/centos/7/cloud/Source/openstack-mitaka/
  • mockchain-repo-centos/results/default/_HACKED-by-hrw/
  • mockchain-repo-centos/results/default/_IMPORT-cbs/
  • mockchain-repo-centos/results/default/_IMPORT-extras/
  • mockchain-repo-centos/results/default/_IMPORT-openstack-mitaka/
  • mockchain-repo-centos/results/default/_IMPORT-os/

The vault ones were a copy of the OpenStack source packages and their build dependencies. The hacked ones got AArch64 support merged from Fedora. Both the “extras” and “os” directories were for packages missing from the CentOS/AArch64 repositories. The CBS one was for source/noarch packages which had to be imported/rebuilt because they came from other CBS tags.

status page

In the meantime I prepared a web page with the build results, so anyone interested can see what builds, what does not, and can check logs, packages, etc. It has a simple description and then a table with the list of builds (data can be sorted by clicking on the column headers).

thanks

The whole job would have taken much more time without help from CentOS developers: Haïkel Guémar, Alan Pevec, Jim Perrin, Karanbir Singh and others from the #centos-devel and #centos-arm IRC channels.