1. OpenDev CI speed-up for AArch64

    I have worked with OpenDev CI for a while. My first Kolla patches were over three years ago. We (Linaro) added AArch64 nodes a few times: some nodes were taken down, some replaced, some added.

    Speed or lack of it

    Whenever you install a Python package using pip, it is downloaded from PyPI (directly or from a mirror). If there is a binary package for your platform then you get it; if not, a “noarch” package is fetched.

    In the worst case a source tarball is downloaded and the whole build process starts. You need to have all required compilers installed, development headers for Python and all required libraries, and the rest of the needed tools. And then wait. And wait, as some packages require a lot of time.

    And then repeat it again and again, as you are not allowed to upload packages to PyPI for projects you do not own.
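
    The fallback order described above can be sketched in a few lines of Python. This is a simplified illustration and not pip's real resolver: the function name pick_install_source is made up for this example, and real wheel matching uses full compatibility tags rather than a check of the file name ending.

```python
def pick_install_source(filenames, arch="aarch64"):
    """Classify how a release could be installed on the given architecture.

    Simplified sketch: real pip matches full compatibility tags,
    here we only look at how each file name ends.
    """
    wheels = [name for name in filenames if name.endswith(".whl")]
    # 1. A binary wheel built for this architecture is the fastest option.
    if any(name.endswith(f"_{arch}.whl") for name in wheels):
        return "binary wheel"
    # 2. A pure Python ("noarch") wheel works on every architecture.
    if any(name.endswith("-none-any.whl") for name in wheels):
        return "noarch wheel"
    # 3. Worst case: a source archive has to be built locally.
    if any(name.endswith((".tar.gz", ".zip")) for name in filenames):
        return "source build"
    # 4. Nothing usable was published for this architecture.
    return "not installable"
```

    A release that ships only x86-64 wheels and no source tarball ends in the last branch, which is where pip gives up with "No matching distribution found".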

    Argh you, protobuf

    There was a new release of the protobuf package. The OpenStack bot picked it up, sent a patch for review and it got merged.

    And all AArch64 CI jobs failed…

    Turned out that protobuf 3.12.0 was released with x86 wheels only. No source tarball. At all.

    This turned out to be a new maintainer's mistake; after 2-3 weeks it was fixed in the 3.12.2 release.

    Another CI job then

    So I started looking at the ‘requirements’ project and created a new CI job for it, to check whether new package versions are available for AArch64. It took some time and several side updates as well (yak shaving all the way again).

    Stuff got merged and works now.

    Wheels cache

    While working on the above CI job I had a discussion with the OpenDev infra team about how to make it work properly. It turned out that there were old jobs doing exactly what I wanted: building wheels and caching them for the next CI tasks.

    It took several talks and patches from Ian Wienand, Clark Boylan, Jeremy ‘fungi’ Stanley and others. Several CI jobs got renamed, some were moved from one project to another. Servers got configuration changes etc.

    Now we have wheels built for both x86-64 and AArch64 architectures, covering CentOS 7/8, Debian ‘buster’ and Ubuntu ‘xenial/bionic/focal’ releases, for OpenStack ‘master’ and a few stable branches.

    Effect

    The requirements project has a quick ‘check-uc’ job running on AArch64 to make sure that all packages are available for both architectures. All OpenStack projects profit from it.

    In Kolla the ‘openstack-base’ image build went from 23:49 to just 5:21 (minutes:seconds). The whole Debian/source build now takes 57 minutes instead of 2 hours 20 minutes.

    Nice result, isn’t it?

    Written by Marcin Juszkiewicz on
  2. RockPro64 some time later

    Some time passed since I got that board. And some things changed as well.

    U-Boot changes

    I went through U-Boot and submitted a bunch of changes. Most of them were small tweaks to RockPro64 board configuration:

    • enable USB OHCI host (so USB 1.1 things can be plugged directly into ports)
    • enable USB keyboard
    • enable RNG support
    • enable SPI boot
    • start USB support

    With the above changes I have all firmware parts stored in the on-board SPI chip and do not have to worry about storing anything on removable media. I also do not need a serial console to play with U-Boot as everything can be done with a USB keyboard and an HDMI monitor.

    Board configuration

    To be able to work comfortably with the RockPro64 board I mounted it in the acrylic case from Pine A64 and then added some tweaks.

    RockPro64 setup on my desk

    Breadboard wires and switches

    Black and white cables are SPI clock and ground. Using the button I can short them during boot so the on-board flash gets ignored and U-Boot gets loaded from the SD card.

    Two orange wires are the serial Rx line going through a hardware switch. The reason is simple: the board does not boot when the Rx line is connected during first power on (right after inserting the power plug, not after pressing the Power button). So instead of disconnecting the wire each time I can just move the switch left, power on, wait for U-Boot messages, switch right and have both directions of serial console working.

    Too bad that there are no Power/Reset pins as I would move the buttons to the breadboard as well to make them more reachable.

    USB hubs

    Next to the board I have a bunch of USB hubs:

    • USB-C to USB-A/HDMI/Power one from one of previous Linaro Connects
    • USB 3.0 one connected with a microUSB cable so it works as a USB 2.0 one
    • USB 1.1 one

    Each of them was in use when I played with enabling USB keyboard support, as I do not have a USB 2.0 keyboard and needed a way to connect a standard 1.1 one.

    Other stuff

    Tincantools SPI Hook works as serial console and there is PCI Express x8->x8 (open ended) riser so I can play with different PCIe cards.

    And there is a jumper on ‘disable eMMC’ pins as I lack such module.

    What next?

    The next step would be testing distribution installers a bit and checking how they work. I made some attempts already and am waiting for fixes to be merged.

    Written by Marcin Juszkiewicz on
  3. From a diary of AArch64 porter — firefighting

    When I was a kid there was a children’s book about Wojtek who wanted to be a firefighter. It is part of the culture for my generation.

    I never wanted to follow Wojtek’s dreams. But during the last years I became a firefighter. And this is not a good thing in the long term.

    CI failures

    During the last months we (Linaro) took care of AArch64 support in OpenStack infrastructure. There are nodes with CentOS 7 and 8, Debian ‘stretch’ and ‘buster’, Ubuntu ‘xenial’, ‘bionic’ and ‘focal’. And several CI jobs in some projects (Disk Image Builder, Kolla, Nova and some others).

    And those CI jobs tend to fail. As usual, right? Not quite…

    Missing Python packages

    One day when I joined IRC in the morning I was greeted with “aarch64 ci fails, can you take a look?” from one of the project developers.

    Quick look into logs:

    ERROR: Could not find a version that satisfies the requirement ntplib (from versions: none)
    ERROR: No matching distribution found for ntplib
    

    Then the usual path. PyPI, check release, history, go to homepage, file an issue. Curse. And wait for upstream to fix the problem. They fixed it and CI was working again.

    Last Monday — I started work ready to do something interesting. And then was greeted with same story “aarch64 ci fails, can you take a look?”.

    ERROR: Could not find a version that satisfies the requirement protobuf==3.12.0 
    (from versions: 2.0.0b0, 2.0.3, 2.3.0, 2.4.1, 2.5.0, 2.6.0, 2.6.1, 3.0.0a2,
    3.0.0a3, 3.0.0b1, 3.0.0b1.post1, 3.0.0b1.post2, 3.0.0b2, 3.0.0b2.post1,
    3.0.0b2.post2, 3.0.0b3, 3.0.0b4, 3.0.0, 3.1.0, 3.1.0.post1, 3.2.0rc1,
    3.2.0rc1.post1, 3.2.0rc2, 3.2.0, 3.3.0, 3.4.0, 3.5.0.post1, 3.5.1, 3.5.2,
    3.5.2.post1, 3.6.0, 3.6.1, 3.7.0rc2, 3.7.0rc3, 3.7.0, 3.7.1, 3.8.0rc1, 3.8.0,
    3.9.0rc1, 3.9.0, 3.9.1, 3.9.2, 3.10.0rc1, 3.10.0, 3.11.0rc1, 3.11.0rc2, 3.11.0,
    3.11.1, 3.11.2, 3.11.3)
    

    PyPI, check, homepage, there was an issue already filed. So far no upstream response.

    Problem got solved by moving all OpenStack projects to previous (3.11.3) release.

    Missing/deprecated versions in distributions

    So I started work on adding another CI job. This time for ‘requirements’ OpenStack project. To make sure that whatever Python package upgrade will be available on AArch64 as well.

    As usual I had to add a pile of distro dependencies to get ‘numpy’ and ‘scipy’ built correctly. And bump the timeout to 2 hours. The build was going nicely.

    And then ‘confluent_kafka’ hit hard:

    /tmp/pip-install-ld4fzu94/confluent-kafka/confluent_kafka/src/confluent_kafka.h:65:2: 
    error: #error "confluent-kafka-python requires librdkafka v1.4.0 or later.
    Install the latest version of librdkafka from the Confluent repositories, 
    see http://docs.confluent.io/current/installation.html"                                                            
    
    

    “librdkafka v1.4.0 or later” is not available in any distribution used on OpenStack infra nodes. Fantastic!

    And repositories mentioned in error message are x86-64 only. Fun!

    Sure, can do usual pypi, release, homepage, issue. Even went that way. Chain of GitHub projects building components into repositories. x86-64 only all the way.

    External CI services

    Which gets us to another thing: external CI services. You know: Azure Pipelines, GitHub Actions or Travis CI (alphabetical order). Used in various ways by most FOSS projects nowadays.

    Each of them has some kind of AArch64 support nowadays. But it looks like only Travis provides you with hardware. Azure and GitHub only can connect your external machines to their CI service.

    Speed or lack of it

    So you have a project where you need AArch64 support and upstream is already using Travis for their CI needs. Lucky you!

    You work with the project developers, you get the test suite running and then it times out. It does not matter that the CI machine you got has a few cores, because ‘pytest’ on its own does not run tests in parallel.

    So you cut tests completely or partially. Or just abandon the idea.

    No hardware, no binaries

    If you are less lucky then you may get such an answer from upstream. I got it a few times in the past. And I fully understand them: why support something when you can not even test whether it works?

    Firefighting or yak shaving?

    When I discussed it with friends, one of them mentioned that this reminds him more of yak shaving than firefighting. To be honest it is both most of the time.

    Written by Marcin Juszkiewicz on
  4. New toy arrived — RockPro64

    AArch64 world has toys on cheap side, servers on expensive and some attempts in a middle.

    Me.

    I started working on AArch64 in 2012. Before hardware was available. Even before toolchain bits were available. Then the first prototype server arrived at Red Hat, then it got replaced with an Applied Micro Mustang. Then more servers, a Mustang under the desk. I joined Linaro (as a Red Hat assignee) and started using their server lab.

    And during all those years I never played with AArch64 SBC.

    If it lacks proper storage and 16+GB ram then I am not interested.

    Some months ago I decided to change it.

    What to choose?

    There are many AArch64 boards to choose from. Some are better, some are not. Popular ones and those with features. Like usual.

    So I asked Peter Robinson and a few other folks for their suggestions and most of them agreed that Rockchip RK3399 looks the most interesting. It has a PCI Express slot on several boards, uses Mali T860 so I can play with the Panfrost driver, USB 3 ports are present etc.

    RockPro64

    After some market research I decided on RockPro64 from Pine64. The board has USB 3 in both A and C type, a PCIe x4 slot is present and I can reuse the acrylic case from the Pine A64 board I bought a few years ago. And it has “the most expensive chip in ARM world” present (16MB SPI flash for boot firmware).

    Ordered in March, board arrived two days ago.

    Setup

    I dug through my boxes and found a 12V/5A power supply (3A is the recommended minimum), used some Ethernet cable flying under the desk, an old 64GB USB 3 thumb drive for storage and my lovely IOGEAR GKM561R keyboard/trackpoint which I use for all board bring-ups.

    Mounted board into acrylic case and started wiring extra stuff.

    Disable SPI button

    As boot firmware can be stored in the onboard SPI chip there has to be a way to ignore it on boot. According to the ‘disable SPI’ note on the vendor website it is a matter of shorting SPI_CLK to GND during boot.

    So small breadboard, two wires, button and I am ready.

    Serial console

    As usual the serial console is on pins. So the breadboard got one of my cheap USB-Serial dongles plugged in and then the wiring started.

    The serial console setup post on the forum says that pins 6/8/10 are all I need. I wired them, connected the USB cable and started picocom at 1.5 Mbps:

    Terminal ready
    ��tU�����UU���˕�U���TU J���UU���Q�UR�)���Q�
    *�R��uT��*��U�
          �VQ��Y��ս+UJ.���I��zT����UQ])
    

    Lovely! It turned out that CP2102 dongles are only “expected” to work at that speed. I dug deeper in my drawers and found the SPI Hook device from TinCanTools. It uses FT2232HL so it should be fine with speeds up to 12 Mbps.

    And worked without problems:

    U-Boot TPL 2020.04 (Apr 20 2020 - 00:00:00)
    Channel 0: LPDDR4, 50MHz
    BW=32 Col=10 Bk=8 CS0 Row=16/15 CS=1 Die BW=16 Size=2048MB
    Channel 1: LPDDR4, 50MHz
    BW=32 Col=10 Bk=8 CS0 Row=16/15 CS=1 Die BW=16 Size=2048MB
    256B stride
    256B stride
    lpddr4_set_rate: change freq to 400000000 mhz 0, 1
    lpddr4_set_rate: change freq to 800000000 mhz 1, 0
    Trying to boot from BOOTROM
    Returning to boot ROM...
    
    U-Boot SPL 2020.04 (Apr 20 2020 - 00:00:00 +0000)
    Trying to boot from MMC1
    
    U-Boot 2020.04 (Apr 20 2020 - 00:00:00 +0000)
    
    SoC: Rockchip rk3399
    Reset cause: POR
    Model: Pine64 RockPro64
    DRAM:  3.9 GiB
    PMIC:  RK808 
    MMC:   dwmmc@fe320000: 1, sdhci@fe330000: 0
    Loading Environment from MMC... Card did not respond to voltage select!
    *** Warning - No block device, using default environment
    

    So I was ready to play!

    Mali

    So when Arm will open source Mali drivers?

    This question comes up at each “Ask Arm Anything” session during Linaro Connect (a session for Arm partners only, badges are checked on room entry). And the answer is always more or less “never”.

    But there is Panfrost now. Fully open source driver for Mali family used in RK3399 SoC.

    Booted board and it got stuck:

    [   29.043106] panfrost ff9a0000.gpu: clock rate = 500000000
    [   29.053073] rockchip-vop ff8f0000.vop: Adding to iommu group 1
    [   29.058099] panfrost ff9a0000.gpu: mali-t860 id 0x860 major 0x2 minor 0x0 status 0x0
    [   29.058948] panfrost ff9a0000.gpu: features: 00000000,100e77bf, issues: 00000000,24040400
    [   29.059820] panfrost ff9a0000.gpu: Features: L2:0x07120206 Shader:0x00000000 Tiler:0x00000809 Mem:0x1 MMU:0x00002830 AS:0xff JS:0x7
    [   29.061053] panfrost ff9a0000.gpu: shader_present=0xf l2_present=0x1
    [   29.063559] rockchip-vop ff900000.vop: Adding to iommu group 2
    [   29.147966] panfrost ff9a0000.gpu: devfreq_add_device: Unable to find governor for the device
    [   29.151244] panfrost ff9a0000.gpu: [drm:panfrost_devfreq_init [panfrost]] *ERROR* Couldn't initialize GPU devfreq
    [   29.173499] panfrost ff9a0000.gpu: Fatal error during devfreq init
    

    I discussed the issue with people on the #panfrost IRC channel and they pointed out that the “governor_simpleondemand” kernel module was missing. So I rebuilt the initramfs with the “dracut -v --force-drivers governor_simpleondemand --force” command and it booted to a FullHD framebuffer.

    Some work is needed here as monitor has 3440x1440 resolution so at least 2560x1440 or 2560x1080 should be used. But that can wait for now. I plan to move this board somewhere else on desk and connect to 24” full hd monitor.

    Plans?

    I do not plan to use that board for any serious stuff. More as a device to learn how EBBR world looks. Boot sequence with U-Boot loading Grub from EFI system partition looks good.

    There are some things which need work. I want to have boot firmware stored in SPI flash as now I have 1GB microsd card in slot just to have U-Boot loaded.

    There are some changes in U-Boot waiting for someone to test, USB-C port is not enabled at all (it should work as USB 3 and DisplayPort).

    So who knows :D

    Written by Marcin Juszkiewicz on
  5. Another blog updates

    Recently one of my friends worked on his websites and we exchanged some hints on how to make things better for accessibility and SEO. I checked my website and decided to work on another set of improvements.

    Headers

    First thing was headers (h1-h6) and their hierarchy. Loaded 51 articles, fixed their markup and then worked a bit on CSS to get styling like I wanted it to look.

    Now there is just one H1 on website (page title), title of each article is in H2 and then H3-H5 are used in article content. Related posts are H3 as well as they are part of article.
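
    A hierarchy like this can be checked automatically. Below is a minimal sketch using Python's standard html.parser module; the names HeadingCollector and heading_problems are hypothetical and this is not the tooling used for this website. It only verifies two rules: exactly one H1, and no jump that skips a heading level.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the levels of h1-h6 tags in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of human readable problems found in the headings."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one h1, found {levels.count(1)}")
    # Going deeper by more than one level (h2 -> h4) skips a level.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"h{previous} followed by h{current} skips a level")
    return problems
```

    An empty list means the page passes both checks.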

    Image captions

    This thing was on my list for a while. In WordPress times some articles used images with captions, some did not. I loaded 99 articles, fixed image titles and added the ‘yafg’ plugin to Markdown to get all images described. Then some CSS styling and the effect is quite ok:

    Fog somewhere in Spanish mountains

    Small tweaks

    PageSpeed Insights complained that there can be a moment with no visible text (a flash of invisible text (FOIT) in webdev speak). Three small changes to get ‘font-display:swap’ in CSS and problem solved.

    Also added some extra tags into head part of page — mostly OpenGraph junk.

    I hope that the website is more usable now.

    Written by Marcin Juszkiewicz on
  6. April Fools’ Day

    Each year has a day when people make jokes and watch the world burn.

    2008: Let’s move to London

    Twelve years ago, when I worked with London based OpenedHand company, I wrote about moving to London. So many people sent congratulations, offered help in getting settled etc. Next day I wrote that no, I am not moving to London.

    2020: Let’s move to Vancouver

    Two days ago I decided to repeat it some way. But it was late in the evening so I just wrote that on Facebook (at 00:14 local time):

    12 years ago there was an idea to move to London. But it did not worked out. Looking at situation in Poland I am considering recent Amazon offer. Sure, Vancouver is far away but it is also nice city. And have snow during winter.

    I can fly with one change for xmas or holidays.

    Many people asked whether I was sure I wanted to do it. Even my wife asked ;D

    One of my friends thought that I was either serious or fishing for a raise (as for him the message was written on 31st March due to timezones).

    AArch64 For Developers

    When I wrote the above post I got a sick idea. Why not try to fool a wider audience? On an easy target?

    A quick check of what ‘Prima Aprilis’ is called in English and the hunt for a good (and cheap) domain started. It turned out that Polish ones would be the cheapest.

    In the morning I bought the ‘afd.biz.pl’ domain for about 4 EUR and started preparations. A free Bootstrap template, some pictures from the Internet and I started working on the page of a non-existent Polish startup specializing in computer board design.

    NUC

    Many people in AArch64 industry would like to be able to buy NUC like computer. So it went first. Picture of some random Chinese product, some text and done.

    Project date set to 1st April 2019, status ‘cancelled’ with a simple reason: “lack of cheap PCI Express graphics chips which would fit on mainboard”. In my opinion it is a valid reason, as for an AArch64 NUC to succeed either an AMD Radeon or some NVidia GPU would need to be used. And I doubt that they can be bought in small quantities.

    ATX mainboard

    But AFD company had also more interesting project — full size ATX motherboard:

    ATX motherboard

    Design is an edited page of MSI X570-A PRO mainboard manual. Removed all RGB lights connectors, pump fan, additional 12V connector and few more elements. Copied some piece of I/O shield to not have a hole there.

    Then I presented page to Leif for some review and he provided some hints on how to make it more believable.

    Technical specification was too nice to be true (NUC mentioned above got subset of them):

    • SBSA level 5, SBBR, ServerReady etc compliant
    • Arm v8.4 cpu with 16 cores (Cortex-A72 like performance)
    • CPU will be user upgradable
    • cpu cooling is AMD AM4 compatible
    • Up to 64GB of DDR4 memory
    • ECC support not tested yet
    • 32GB sticks not tested but should work and give 128GB ram
    • Two m.2 slots (PCIe x4) for NVME storage
    • PCIe setup: x16, x8, 3 x1 slots
    • six SATA III ports
    • on board 1GbE network
    • 5.1 audio support
    • two USB 2.0 on i/o shield + 4 on headers
    • six USB 3.1 (4 type A, 2 type C) on i/o shield + 4 on headers
    • serial console on I/O board

    Final price was set to be around 800 EUR with an option to buy motherboard separately from CPU as there are companies where you can buy stuff quite easy if you fit in 500 USD.

    I also added something for Openmoko fans — first 50 boards go to FOSS developers (free of cost).

    Hints and guesses

    Some hints that it is fake were present. The project date was set to 1st April 2020, and the CPU was listed as ARMv8.4 while there are no such ones on the market.

    I got several guesses from people. CPU being 16-core Ampere Skylark, mainboard being done by MSI using AMD southbridge for whole I/O:

    Using AMD’s southbridges makes a lot of sense anyway, those are really just PCIe-multi-I/O chips anyway. That way, the CPU just needs to be connected to RAM, PCIe and a BIOS flash, while the southbridge can provide USB, SATA, audio, i2c, etc along with additional PCIe lanes.

    AFD names

    Of course AFD means April Fools’ Day. For a moment I wanted to use the AArch64 For Developers name on the website but it did not fit the scheme. I also got Alternativ Für Datencenter as a possible name.

    Conclusion

    It was fun. I may repeat it one day but with a bit more preparation.

    Written by Marcin Juszkiewicz on
  7. Kapturek, Gossamer, Blossom!

    A few days ago I got my company laptop refreshed (from Lenovo ThinkPad T460s to T490s). So it needed a new name…

    I name my personal devices by characters from Winnie the Pooh books. More info about it in blog post from 2012.

    But this is company laptop…

    Rule above applies only to my own machines (or ones I maintain). For company hardware I use names from other stories. And those characters have to be red.

    In 2013 I got the first laptop. It got the name ‘kapturek’ after “Czerwony Kapturek” (Little Red Riding Hood).

    As we replace laptops every three years, 2016/7 brought another one. I could have just moved the operating system from one machine to another, but a new machine means a new name.

    It became ‘gossamer’ after monster from Looney Tunes:

    Gossamer from Looney Tunes (on the right)

    Today I am installing operating system on 3rd laptop. Again, which name to choose…

    Quick web searching gave me some characters to choose from. So this laptop will be called ‘blossom’ after Blossom from The Powerpuff Girls series:

    Blossom from The Powerpuff Girls

    Now I have some time before choosing new name ;D

    Written by Marcin Juszkiewicz on
  8. Sharing PCIe cards across architectures

    Some days ago during one of conference calls one of my co-workers asked:

    Has anyone ever tried PCI forwarding to an ARM VM on an x86 box?

    As my machine was open I just turned it off and inserted a SATA controller into one of the unused PCI Express slots. After boot I started one of my AArch64 CirrOS VM instances and gave it this card. It worked perfectly:

    [   21.603194] pcieport 0000:00:01.0: pciehp: Slot(0): Attention button pressed
    [   21.603849] pcieport 0000:00:01.0: pciehp: Slot(0) Powering on due to button press
    [   21.604124] pcieport 0000:00:01.0: pciehp: Slot(0): Card present
    [   21.604156] pcieport 0000:00:01.0: pciehp: Slot(0): Link Up
    [   21.739977] pci 0000:01:00.0: [1b21:0612] type 00 class 0x010601
    [   21.740159] pci 0000:01:00.0: reg 0x10: [io  0x0000-0x0007]
    [   21.740199] pci 0000:01:00.0: reg 0x14: [io  0x0000-0x0003]
    [   21.740235] pci 0000:01:00.0: reg 0x18: [io  0x0000-0x0007]
    [   21.740271] pci 0000:01:00.0: reg 0x1c: [io  0x0000-0x0003]
    [   21.740306] pci 0000:01:00.0: reg 0x20: [io  0x0000-0x001f]
    [   21.740416] pci 0000:01:00.0: reg 0x24: [mem 0x00000000-0x000001ff]
    [   21.742660] pci 0000:01:00.0: BAR 5: assigned [mem 0x10000000-0x100001ff]
    [   21.742709] pci 0000:01:00.0: BAR 4: assigned [io  0x1000-0x101f]
    [   21.742770] pci 0000:01:00.0: BAR 0: assigned [io  0x1020-0x1027]
    [   21.742803] pci 0000:01:00.0: BAR 2: assigned [io  0x1028-0x102f]
    [   21.742834] pci 0000:01:00.0: BAR 1: assigned [io  0x1030-0x1033]
    [   21.742866] pci 0000:01:00.0: BAR 3: assigned [io  0x1034-0x1037]
    [   21.742935] pcieport 0000:00:01.0: PCI bridge to [bus 01]
    [   21.742961] pcieport 0000:00:01.0:   bridge window [io  0x1000-0x1fff]
    [   21.744805] pcieport 0000:00:01.0:   bridge window [mem 0x10000000-0x101fffff]
    [   21.745749] pcieport 0000:00:01.0:   bridge window [mem 0x8000000000-0x80001fffff 64bit pref]
    

    Let’s go deeper

    The next day I turned off the desktop for a CPU cooler upgrade. During the process I went through my box of expansion cards and plugged in an additional USB 3.0 controller (Renesas based). I also added a SATA hard drive and connected it to the previously added controller.

    Once the computer was back online I created a new VM instance. This time I used Fedora 32 beta. But when I tried to add a PCI Express card I got an error:

    Error while starting domain: internal error: process exited while connecting to monitor: 2020-03-25T13:43:39.107524Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: VFIO_MAP_DMA: -22
    2020-03-25T13:43:39.107560Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: vfio 0000:29:00.0: failed to setup container for group 28: memory listener initialization failed: Region mach-virt.ram: vfio_dma_map(0x563169753c80, 0x40000000, 0x100000000, 0x7fb2a3e00000) = -22 (Invalid argument)
    
    Traceback (most recent call last):
      File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
        callback(asyncjob, *args, **kwargs)
      File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
        callback(*args, **kwargs)
      File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 66, in newfn
        ret = fn(self, *args, **kwargs)
      File "/usr/share/virt-manager/virtManager/object/domain.py", line 1279, in startup
        self._backend.create()
      File "/usr/lib64/python3.8/site-packages/libvirt.py", line 1234, in create
        if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
    libvirt.libvirtError: internal error: process exited while connecting to monitor: 2020-03-25T13:43:39.107524Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: VFIO_MAP_DMA: -22
    2020-03-25T13:43:39.107560Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: vfio 0000:29:00.0: failed to setup container for group 28: memory listener initialization failed: Region mach-virt.ram: vfio_dma_map(0x563169753c80, 0x40000000, 0x100000000, 0x7fb2a3e00000) = -22 (Invalid argument)
    

    Hmm. It worked before. Tried other card — with the same effect.

    Debugging

    I went to the #qemu IRC channel and started discussing the issue with QEMU developers. It turned out that probably no one had tried sharing expansion cards with a foreign architecture guest (in TCG mode instead of same architecture KVM mode).

    As I had a VM instance where sharing a card worked, I started checking what was wrong. After some restarts it was clear that crossing 3054 MB of guest memory was enough to get VFIO errors like the ones above.

    Reporting

    An issue that is not reported does not exist. So I opened a bug against QEMU. I filled it with error messages, “lspci” output for the cards used, the QEMU command line (generated by libvirt) etc.

    Looks like the problem lies in architecture differences between x86-64 (host) and aarch64 (guest). Let me quote Alex Williamson:

    The issue is that the device needs to be able to DMA into guest RAM, and to do that transparently (ie. the guest doesn’t know it’s being virtualized), we need to map GPAs into the host IOMMU such that the guest interacts with the device in terms of GPAs, the host IOMMU translates that to HPAs. Thus the IOMMU needs to support GPA range of the guest as IOVA. However, there are ranges of IOVA space that the host IOMMU cannot map, for example the MSI range here is handled by the interrupt remmapper, not the DMA translation portion of the IOMMU (on physical ARM systems these are one-in-the-same, on x86 they are different components, using different mapping interfaces of the IOMMU). Therefore if the guest programmed the device to perform a DMA to 0xfee00000, the host IOMMU would see that as an MSI, not a DMA. When we do an x86 VM on and x86 host, both the host and the guest have complimentary reserved regions, which avoids this issue.

    Also, to expand on what I mentioned on IRC, every x86 host is going to have some reserved range below 4G for this purpose, but if the aarch64 VM has no requirements for memory below 4G, the starting GPA for the VM could be at or above 4G and avoid this issue.
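
    The 3054 MB limit found during debugging is consistent with this explanation. Here is a small check, assuming that guest RAM starts at 0x40000000 (the second argument of vfio_dma_map() in the error message above) and that the reserved MSI range on the x86 host starts at 0xfee00000 (the address used in the quote). The constant and function names are made up for this sketch.

```python
GUEST_RAM_BASE = 0x40000000  # assumed start of mach-virt.ram (from the error message)
HOST_MSI_START = 0xFEE00000  # assumed start of the x86 MSI range (from the quote)
MIB = 1024 * 1024


def max_guest_ram_mib(ram_base=GUEST_RAM_BASE, reserved_start=HOST_MSI_START):
    """Largest guest RAM size (in MiB) that still ends below the reserved range."""
    return (reserved_start - ram_base) // MIB


def overlaps_reserved(ram_mib, ram_base=GUEST_RAM_BASE, reserved_start=HOST_MSI_START):
    """True when guest RAM of the given size reaches into the reserved range."""
    return ram_base + ram_mib * MIB > reserved_start


print(max_guest_ram_mib())      # 3054
print(overlaps_reserved(4096))  # True, the 0x100000000 sized mapping from the error
```

    Under these assumptions a guest with up to 3054 MB of memory stays below the MSI range, and anything larger has to be mapped across it.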

    I have to admit that this is too low-level for me. I hope that the problem I hit will help someone to improve QEMU.

    Written by Marcin Juszkiewicz on
  9. CirrOS 0.5.0 released

    Someone may say that I am the main reason why the CirrOS project does releases.

    In 2016 I got a task at Linaro to get it running on AArch64. More details are in my blog post ‘my work on changing CirrOS images’. The result was the 0.4.0 release.

    Last year I got another task at Linaro. So we released the 0.5.0 version today.

    But that’s not how it happened.

    Multiple contributors

    Since 0.4.0 release there were changes done by several developers.

    Robin H. Johnson took care of kernel modules. Added new ones, updated names. Also added several new features.

    Murilo Opsfelder Araujo fixed build on Ubuntu 16.04.3 as gcc changed preprocessor output.

    Jens Harbott took care of lack of space for data read from config-drive.

    Paul Martin upgraded CirrOS build system to BuildRoot 2019.02.1 and bumped kernel/grub versions.

    Maciej Józefczyk took care of metadata requests.

    Marcin Sobczyk fixed starting of Dropbear and dropped creation of DSS ssh key which was no longer supported.

    My Linaro work

    At Linaro I got Jira card with “Upgrade CirrOS’ kernel to Ubuntu 18.04’s kernel” title.

    This was needed as the 4.4 kernel was far too old and gave us several booting issues. Internally we had builds with the 4.15 kernel but it should be done properly and upstream.

    So I fetched the code, did some test builds and started looking at how to improve the situation. I spoke with Scott Moser (owner of the CirrOS project) and he told me about his plans to migrate from Launchpad to GitHub. So we did that in December 2019 and then the fun started.

    Continuous Integration

    GitHub has several ways of adding CI to projects. First we tried GitHub Actions but it turned out to be a paid service. I looked around and then decided to go with Travis CI.

    Scott generated all required keys and integration started. Soon we had every pull request going through CI. Then I added a simple script (bin/test-boot) so each image was booted after build. Scott improved the script and fixed a Power boot issue.

    The next step was caching downloads and ccache files. This was a huge improvement!

    In the meantime Travis bumped the free service to 5 simultaneous builders which made our builds even faster.

    CirrOS supports building only under Ubuntu LTS. But I use Fedora so we merged two changes to make sure that the proper ‘grub(2)-mkimage’ command is used.

    Kernel changes

    The 4.4 kernel had to go. The first idea was to move to 4.15 from the Ubuntu 18.04 release. But if we upgrade then why not go for the HWE one? I checked the 5.0 and 5.3 versions. As both worked fine we decided to go with the newer one.

    Modules changes

    During the start of a CirrOS image several kernel modules are loaded. But there were several “no kernel module found” like messages for built-in ones.

    We took care of it by querying the /sys/module/ directory so now module loading is a quiet process. At the end a list of loaded ones is printed.
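
    The idea can be sketched as follows. This is a Python illustration only and not the CirrOS code; the function names and the sys_module parameter are assumptions made for the example. Loaded modules, and built-in ones that expose parameters, show up as directories under /sys/module/, with dashes in their names replaced by underscores.

```python
import os


def is_known_to_kernel(module, sys_module="/sys/module"):
    """Return True if the module already has a directory under /sys/module."""
    # The kernel uses underscores there even when the module name
    # is usually written with dashes (virtio-rng -> virtio_rng).
    name = module.replace("-", "_")
    return os.path.isdir(os.path.join(sys_module, name))


def modules_to_load(wanted, sys_module="/sys/module"):
    """Keep only the modules which still need a modprobe call."""
    return [m for m in wanted if not is_known_to_kernel(m, sys_module)]
```

    Calling modprobe only for the names returned by modules_to_load() avoids error messages for drivers which are already part of the running kernel.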

    VirtIO changes

    A lot has happened since the 4.4 kernel. So we added several VirtIO modules.

    One of the results is a working graphical console on AArch64, thanks to ‘virtio-gpu’ providing the framebuffer and ‘hid-generic’ handling USB input devices.

    As lack of entropy is a common issue in VM instances, we added the ‘virtio-rng’ module. No more ‘uninitialized urandom read’ messages from the kernel.

    Final words

    Yesterday Scott created the 0.5.0 tag and CI built all release images. Then I wrote the release notes (based on the ones from pre-releases). The Kolla project got a patch to move to the new version.

    When is the next release? Looking at history someone may say 2023, as the previous one was in 2016. But who knows. Maybe we will get someone with a “please add s390x support” question ;D

    Written by Marcin Juszkiewicz on
  10. My whole career is built on FOSS

    Some time ago on one of the Red Hat mailing lists someone asked “how has open source helped your career”. There were several interesting stories. I had mine as well.

    2000

    My first contribution to FOSS. It was an update of the Debian ‘potato’ installation guide for Amiga/m68k. I was writing an article about installing Debian for a new Amiga magazine called ‘eXec’. So why not update the official instructions at the same time?

    2002

    Probably my first code contribution: a small change to MPlayer. I completely forgot about it, but as the project was changing its license in 2017 I got an email about it.

    2004

    I bought my 3rd PDA (Sharp Zaurus SL-5500) and it was running Linux. I started building apps for it and hacking the system to run better. Then I cooperated with the OpenZaurus distro developers and started contributing to the OpenEmbedded build system. One day they gave me write access to the repo and told me to merge my changes.

    When I stopped using OE a few years later I was 5th on the list of top contributors.

    I also count this year as the first one of my FOSS career.

    2005

    Richard Jackson donated a Zaurus c760 to me as a gift for my OpenZaurus work. And then the OPIE 1.2.2 release came due to my changes to make better use of the VGA screen. I still have this device in running condition.

    2006

    Became release manager of the OpenZaurus distribution, with a team of users testing pre-release images. Released version 3.5.4 (and later 3.5.4.1 and 3.5.4.2-rc).

    Started my own consulting company. Got some serious customers. End of work as a PHP programmer.

    2007 - 2010

    I was doing what used to be a hobby as a full-time job. Full FOSS work. Different companies, ARM architecture for 95% of the time. Mostly consulting around OpenEmbedded.

    2010

    Due to my ARM FOSS involvement Canonical hired me. Started working at Linaro as a software engineer. Cleaned up cross compilers in Ubuntu/Debian, plus several other things.

    2012

    Became one of the first AArch64 developers. Published OpenEmbedded support for it right after all toolchain patches became public.

    2013

    Left Linaro and Canonical, wrote about it on my blog and in less than an hour got “send me your CV” from Jon Masters from Red Hat. Joined the company and did a lot of changes in RHEL 7 and Fedora, mostly fixing FTBFS on !x86 architectures.

    2016

    My manager asked me whether I wanted to go back to Linaro, this time as a Red Hat assignee. I went, met old chaps and worked mostly around OpenStack. Still on 64-bit Arm.

    2017 - 2020

    A lot of work in OpenStack. Some work on Big Data stuff for another team at Linaro. Countless projects where I worked on getting stuff working on AArch64.

    Summary

    My whole career is built on FOSS.

    My x86(-64) desktop has run GNU/Linux as its main system since day one (September 2000). There was OpenDOS as a second system during my studies due to some stuff.

    I had MS Windows XP as a second system on one of my laptops. But that was due to some Arm hardware bring-up tool being available only for this OS (later also for Linux). My family and friends learnt that I am unable to help them with MS Windows issues as I do not know that OS.

    Written by Marcin Juszkiewicz on