This post is part 3 of the "SBSA Reference Platform in QEMU" series:
- Versioning of sbsa-ref machine
- SBSA Reference Platform update
- Testing *BSD on SBSA Reference Platform
The SystemReady specification says that a system to be certified needs to be able to boot several operating systems:
In addition, OS installation and boot logs are required:
- Windows PE boot log, from a GPT partitioned disk, is required.
- VMware ESXi-Arm installation and boot logs are recommended.
- Installation and boot logs from two of the Linux distros or BSDs are required.
All logs must be submitted using the ES/SR template.
In choosing the Linux distros or BSDs, maximize the coverage by diversifying the heritage. For example, the following shows the grouping of the heritage:
- RHEL/Fedora/CentOS/AlmaLinux/Rocky Linux/Oracle Linux/Anolis OS
So last week I went through the *BSD ones.
I started with the “download OpenBSD” page and found out that there is no installation ISO for the aarch64 architecture. Not good.
So I fetched the miniroot73.img disk image instead and went on with booting:
>> OpenBSD/arm64 BOOTAA64 1.16
boot>
cannot open sd0a:/etc/random.seed: No such file or directory
booting sd0a:/bsd: 2798224+1058776+12709688+630920 [229059+91+651336+254968]=0x13ce628
FACP DBG2 MCFG SPCR APIC SSDT PPTT GTDT BGRT
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2023 OpenBSD. All rights reserved. https://www.OpenBSD.org
OpenBSD 7.3 (RAMDISK) #1941: Sat Mar 25 14:42:22 MDT 2023
firstname.lastname@example.org:/usr/src/sys/arch/arm64/compile/RAMDISK
real mem = 4287451136 (4088MB)
avail mem = 4073807872 (3885MB)
random: boothowto does not indicate good seed
mainbus0 at root: ACPI
psci0 at mainbus0: PSCI 1.1, SMCCC 1.2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A57 r1p0
cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu0: 2048KB 64b/line 16-way L2 cache
cpu0: CRC32,SHA2,SHA1,AES+PMULL,ASID16
efi0 at mainbus0: UEFI 2.7
efi0: EFI Development Kit II / SbsaQemu rev 0x10000
smbios0 at efi0: SMBIOS 3.4.0
smbios0: vendor EFI Development Kit II / SbsaQemu version "1.0" date 09/15/2023
smbios0: QEMU QEMU SBSA-REF Machine
agintc0 at mainbus0 shift 4:3 nirq 256 nredist 2: "interrupt-controller"
agtimer0 at mainbus0: 62500 kHz
acpi0 at mainbus0: ACPI 6.0
acpi0: tables DSDT FACP DBG2 MCFG SPCR APIC SSDT PPTT GTDT BGRT
acpimcfg0 at acpi0
acpimcfg0: addr 0xf0000000, bus 0-255
pluart0 at acpi0 COM0 addr 0x60000000/0x1000 irq 33
pluart0: console
ahci0 at acpi0 AHC0 addr 0x60100000/0x10000 irq 42: AHCI 1.0
ahci0: port 0: 1.5Gb/s
ahci0: port 1: 1.5Gb/s
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0: <ATA, QEMU HARDDISK, 2.5+> t10.ATA_QEMU_HARDDISK_QM00001_
sd0: 43MB, 512 bytes/sector, 88064 sectors, thin
sd1 at scsibus0 targ 1 lun 0: <ATA, QEMU HARDDISK, 2.5+> t10.ATA_QEMU_HARDDISK_QM00003_
sd1: 504MB, 512 bytes/sector, 1032192 sectors, thin
ehci0 at acpi0 USB0 addr 0x60110000/0x10000 irq 43panic: uvm_fault failed: ffffff800034c3e8 esr 96000050 far ffffff8066ef5048
The operating system has halted.
Please press any key to reboot.
As you can see, it panicked on an attempt to initialize the USB controller, which shows that our move from EHCI to XHCI was not properly tested ;(
The problem was that our virtual hardware (QEMU) had an XHCI (USB 3) controller on the non-discoverable platform bus, but the firmware (EDK2) described it as an EHCI (USB 2) one.
This got solved with Yuquan Wang’s patch, which makes EDK2 initialize and describe an XHCI USB controller (the change is already merged upstream). After rebuilding EDK2, OpenBSD booted fine right to the installation prompt (earlier messages skipped):
xhci0 at acpi0 USB0 addr 0x60110000/0x10000 irq 43, xHCI 0.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Generic xHCI root hub" rev 3.00/1.00 addr 1
acpipci0 at acpi0 PCI0
pci0 at acpipci0
0:1:0: rom address conflict 0xfffc0000/0x40000
0:2:0: rom address conflict 0xffff8000/0x8000
"Red Hat Host" rev 0x00 at pci0 dev 0 function 0 not configured
em0 at pci0 dev 1 function 0 "Intel 82574L" rev 0x00: msi, address 52:54:00:12:34:56
"Bochs VGA" rev 0x02 at pci0 dev 2 function 0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
simplefb0 at mainbus0: 1280x800, 32bpp
wsdisplay0 at simplefb0 mux 1
wsdisplay0: screen 0 added (std, vt100 emulation)
uhidev0 at uhub0 port 1 configuration 1 interface 0 "QEMU QEMU USB Keyboard" rev 2.00/0.00 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0
wskbd0 at ukbd0 mux 1
wskbd0: connecting to wsdisplay0
uhidev1 at uhub0 port 2 configuration 1 interface 0 "QEMU QEMU USB Tablet" rev 2.00/0.00 addr 3
uhidev1: iclass 3/0
uhid at uhidev1 not configured
softraid0 at root
scsibus1 at softraid0: 256 targets
root on rd0a swap on rd0b dump on rd0b
WARNING: CHECK AND RESET THE DATE!
erase ^?, werase ^W, kill ^U, intr ^C, status ^T
Welcome to the OpenBSD/arm64 7.3 installation program.
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell?
After this I added an OpenBSD boot to the QEMU tests for the SBSA Reference Platform to make sure that we have something non-Linux based there.
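The idea behind such a boot test can be sketched in a few lines of Python. This is only an illustration, not the code that went into QEMU: the function name is made up, and treating `panic:` as the failure marker is an assumption based on the failed boot log above. The success string is the installer banner from the successful boot.

```python
from typing import Iterable

# Both markers are copied from the boot logs shown earlier in this post.
SUCCESS = "Welcome to the OpenBSD/arm64 7.3 installation program."
FAILURE = "panic:"


def boot_reached_installer(console_lines: Iterable[str]) -> bool:
    """Return True once the installer banner shows up.

    Returns False on a kernel panic, or when the console output ends
    without the banner ever appearing.
    """
    for line in console_lines:
        if FAILURE in line:
            return False
        if SUCCESS in line:
            return True
    return False


# Two shortened console captures, one per firmware version.
old_firmware = [
    "ehci0 at acpi0 USB0 addr 0x60110000/0x10000 irq 43panic: uvm_fault failed",
]
new_firmware = [
    "xhci0 at acpi0 USB0 addr 0x60110000/0x10000 irq 43, xHCI 0.0",
    "Welcome to the OpenBSD/arm64 7.3 installation program.",
]
print(boot_reached_installer(old_firmware))
print(boot_reached_installer(new_firmware))
```

A real test would feed the function from the serial console of a running virtual machine and add a timeout, since a hang produces no output at all.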
The next one was FreeBSD. And here the situation started to get weird…
First I took the 13.2 release. I used firmware with the XHCI information and was greeted with:
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC arm64
FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
VT(efifb): resolution 1280x800
module firmware already present!
real memory = 4294967296 (4096 MB)
avail memory = 4160204800 (3967 MB)
Starting CPU 1 (1)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
arc4random: WARNING: initial seeding bypassed the cryptographic random device because it was not yet seeded and the knob 'bypass_before_seeding' was enabled.
random: entropy device external interface
MAP 100fbdf0000 mode 2 pages 128
MAP 100fbe70000 mode 2 pages 160
MAP 100fbf10000 mode 2 pages 80
MAP 100fbfb0000 mode 2 pages 80
MAP 100ff500000 mode 2 pages 400
MAP 100ff690000 mode 2 pages 592
MAP 10000000 mode 0 pages 1728
MAP 60010000 mode 0 pages 1
kbd0 at kbdmux0
acpi0: <LINARO SBSAQEMU>
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED
psci0: <ARM Power State Co-ordination Interface Driver> on acpi0
gic0: <ARM Generic Interrupt Controller v3.0> iomem 0x40060000-0x4007ffff,0x40080000-0x4407ffff on acpi0
its0: <ARM GIC Interrupt Translation Service> mem 0x44081000-0x440a0fff on gic0
generic_timer0: <ARM Generic Timer> irq 3,4,5 on acpi0
Timecounter "ARM MPCore Timecounter" frequency 62500000 Hz quality 1000
Event timer "ARM MPCore Eventtimer" frequency 62500000 Hz quality 1000
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
uart0: <PrimeCell UART (PL011)> iomem 0x60000000-0x60000fff irq 0 on acpi0
uart0: console (115200,n,8,1)
ahci0: <AHCI SATA controller> iomem 0x60100000-0x6010ffff irq 1 on acpi0
ahci0: AHCI v1.00 with 6 1.5Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
xhci0: <Generic USB 3.0 controller> iomem 0x60110000-0x6011ffff irq 2 on acpi0
xhci0: 32 bytes context size, 32-bit DMA
And it hung there…
Let's check a newer FreeBSD
I contacted people on the #freebsd IRC channel, and Mina Galić (meena on IRC) asked me to boot FreeBSD 14 or 15 images. So I tried both:
xhci0: <Generic USB 3.0 controller> iomem 0x60110000-0x6011ffff irq 2 on acpi0
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
The system booted further. Note the “64-bit DMA” information instead of the “32-bit DMA” from the 13.2 release. I reported bug 274237 for it. On the same day the required change was identified and marked for a potential backport.
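Why the DMA width could matter here: the addresses in the 13.2 log above, such as `MAP 100fbdf0000`, are far above 4 GiB. Whether that is the actual cause of the hang is an assumption on my side (the bug report has the real analysis), but the arithmetic itself is easy to check. The helper name below is made up for this illustration.

```python
def fits_in_dma_mask(addr: int, bits: int) -> bool:
    """True if a physical address is reachable with a DMA mask of `bits` bits."""
    return addr < (1 << bits)


# Address copied from one of the "MAP" lines in the FreeBSD 13.2 log.
ram_addr = 0x100FBDF0000

print(fits_in_dma_mask(ram_addr, 32))  # False: above the 4 GiB limit
print(fits_in_dma_mask(ram_addr, 64))  # True
```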
But that was not the only problem. It turned out that none of the AHCI devices were found… So there was no way to run the installer:
ahci0: <AHCI SATA controller> iomem 0x60100000-0x6010ffff irq 1 on acpi0
ahci0: AHCI v1.00 with 6 1.5Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
[..]
Release APs...done
Trying to mount root from cd9660:/dev/iso9660/13_2_RELEASE_AARCH64_BO [ro]...
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
ahcich0: Poll timeout on slot 1 port 0
ahcich0: is 00000000 cs 00000002 ss 00000000 rs 00000002 tfd 170 serr 00000000 cmd 0000c017
(aprobe0:ahcich0:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
I checked QEMU 7.2 (from the Fedora package) and it booted fine. 8.0.5 failed, 8.0.0 booted. Hm… I started “git bisect” to find out which change broke it. After several rebuilds I found the commit to blame:
commit 7bcd32128b227cee1fb39ff242d486ed9fff7648
Author: Niklas Cassel <email@example.com>
Date: Fri Jun 9 16:08:40 2023 +0200

hw/ide/ahci: simplify and document PxCI handling

The AHCI spec states that:
For NCQ, PxCI is cleared on command queued successfully.
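For those who have not used it: “git bisect” is a binary search over the history between a known good and a known bad revision, which is why “several rebuilds” were enough. A minimal sketch of the idea, not of git's implementation, is below. The function name and the commit list are made up, and `is_bad` stands in for “rebuild QEMU and check whether FreeBSD finds its disks”.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes the commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and every commit after the first bad
    one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the breakage is at mid or earlier
        else:
            good = mid  # the breakage is after mid
    return commits[bad]


# Hypothetical history of 100 commits where commit "c57" broke things.
history = [f"c{i}" for i in range(100)]
tested = []


def build_and_boot_fails(commit: str) -> bool:
    tested.append(commit)
    return int(commit[1:]) >= 57


print(first_bad(history, build_and_boot_fails))  # c57
print(len(tested), "builds needed")
```

Each step halves the range, so even a few hundred commits between two releases need fewer than ten builds.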
Is it AArch64 only or not?
The next step: checking whether it is a global problem or an AArch64-only one.
I built the x86-64 emulation component and checked the Q35 machine (which also uses AHCI). FreeBSD failed in exactly the same way. This made bug reporting a lot easier, as several architectures and more users were affected.
I mailed the author and the QEMU developers about it, described the problem, gave the exact command line arguments for QEMU, etc. Niklas Cassel replied:
I will have a look at this.
So it will be done.
With NetBSD the situation was a bit similar to the FreeBSD one.
I fetched the NetBSD 9.3 image and booted it, just to see it hang (timestamps removed from the output):
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 9.3 (GENERIC64) #0: Thu Aug 4 15:30:37 UTC 2022
mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64
total memory = 4075 MB
avail memory = 3929 MB
running cgd selftest aes-xts-256 aes-xts-512 done
armfdt0 (root)
simplebus0 at armfdt0: QEMU QEMU SBSA-REF Machine
simplebus1 at simplebus0
acpifdt0 at simplebus0
acpifdt0: using EFI runtime services for RTC
ACPI: RSDP 0x00000100FC020018 000024 (v02 LINARO)
ACPI: XSDT 0x00000100FC02FE98 00006C (v01 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: FACP 0x00000100FC02FB98 000114 (v06 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: DSDT 0x00000100FC02E998 000CD8 (v02 LINARO SBSAQEMU 20200810 INTL 20220331)
ACPI: DBG2 0x00000100FC02FA98 00005C (v00 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: MCFG 0x00000100FC02FE18 00003C (v01 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: SPCR 0x00000100FC02FF98 000050 (v02 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: IORT 0x00000100FC027518 0000DC (v00 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: APIC 0x00000100FC02E498 000108 (v04 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: SSDT 0x00000100FC02E898 000067 (v02 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: PPTT 0x00000100FC02FD18 0000B8 (v02 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: GTDT 0x00000100FC02E618 000084 (v03 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: 2 ACPI AML tables successfully acquired and loaded
acpi0 at acpifdt0: Intel ACPICA 20190405
cpu0 at acpi0: unknown CPU (ID = 0x411fd402)
cpu0: package 0, core 0, smt 0
cpu0: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
cpu0: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
cpu0: Dcache line 64, Icache line 64
cpu0: L1 0KB/64B 4-way PIPT Instruction cache
cpu0: L1 0KB/64B 4-way PIPT Data cache
cpu0: L2 0KB/64B 8-way PIPT Unified cache
cpu0: revID=0x0, 4k table, 16k table, 64k table, 16bit ASID
cpu0: auxID=0x1011111110212120, GICv3, CRC32, SHA1, AES+PMULL, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
cpu1 at acpi0: unknown CPU (ID = 0x411fd402)
cpu1: package 0, core 1, smt 0
gicvthree0 at acpi0: GICv3
gicvthree0: ITS #0 at 0x44081000
gicvthree0: ITS [#0] Devices table @ 0x10009210000/0x80000, Cacheable WA WB, Inner shareable
gicvthree0: ITS [#1] Collections table @ 0x10009290000/0x10000, Cacheable WA WB, Inner shareable
As the 9.3 release is quite old, I tested NetBSD 10-Beta:
gicvthree0 at acpi0: GICv3
gicvthree0: ITS #0 at 0x44081000
gicvthree0: ITS [#0] Devices table @ 0x10008a60000/0x80000, Cacheable WA WB, Inner shareable
gicvthree0: ITS [#1] Collections table @ 0x10008ae0000/0x10000, Cacheable WA WB, Inner shareable
gtmr0 at acpi0: irq 27
armgtmr0 at gtmr0: Generic Timer (62500 kHz, virtual)
plcom0 at acpi0 (COM0, ARMH0011-0): mem 0x60000000-0x60000fff irq 33
plcom0: console
[..]
NetBSD-10.0_BETA Install System
I went to the #netbsd channel on IRC and started a discussion. Michael van Elst (mlelstv on IRC) gave me a helping hand and debugged the problem. It looks like the kernel went into an infinite loop while parsing the GTDT table from ACPI. Newer branches of NetBSD have an additional check there.
I filed bug 57642 for it. And, as in the FreeBSD case, it looks like a backport to the stable branch is needed.
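The NetBSD code is not reproduced here. As a generic illustration of how a parser can spin forever on a firmware table, and of the kind of check that stops it, assume a made-up layout where every entry starts with a 1-byte type and a 2-byte little-endian length covering the whole entry. A length of zero would never advance the offset, so the walker below rejects it. The layout and the function name are assumptions for this sketch, not the real GTDT parser.

```python
import struct
from typing import List, Tuple

HEADER_SIZE = 3  # 1-byte type + 2-byte length in this made-up layout


def walk_entries(buf: bytes) -> List[Tuple[int, int]]:
    """Walk variable-length entries and return (type, length) pairs."""
    entries = []
    offset = 0
    while offset + HEADER_SIZE <= len(buf):
        etype, length = struct.unpack_from("<BH", buf, offset)
        # Without this check a length of 0 would leave `offset` unchanged
        # and the loop would never end.
        if length < HEADER_SIZE or offset + length > len(buf):
            raise ValueError(f"bad entry length {length} at offset {offset}")
        entries.append((etype, length))
        offset += length
    return entries


# A well-formed table: a 4-byte entry followed by a 3-byte entry.
good_table = struct.pack("<BH", 0, 4) + b"\x00" + struct.pack("<BH", 1, 3)
print(walk_entries(good_table))

# A table whose first entry claims a length of zero.
bad_table = struct.pack("<BH", 0, 0)
try:
    walk_entries(bad_table)
except ValueError as err:
    print("rejected:", err)
```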
Testing platforms for SystemReady compliance needs to include *BSD systems. Linux and NetBSD were fine with our USB controller mess: they gave a “something is wrong” message and went on. FreeBSD and OpenBSD complained and stopped the boot process.
We also need to do more testing before merging big changes in the future. This USB controller mess could have been avoided or handled better.