
Re: [arm] Dom0 hangs after enable KPROBE_EVENTS and/or UPROBE_EVENTS in kernel config



Hi Stefano,

On 23/07/2021 21:14, Stefano Stabellini wrote:
On Fri, 23 Jul 2021, Julien Grall wrote:
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) invalid compressed format (err=1)
(XEN) ****************************************

This implies Xen thinks the kernel module is a GZIP image and is trying to
decompress it. However, from your e-mail above, the name of the kernel module
is xen-Image-5.10, which suggests it is not a compressed image.

Can you confirm what is the format of xen-Image-5.10?

gzip (arch/arm64/boot/Image.gz)

Ok. I tried to use a compressed Image with bootwrapper on the foundation model but I saw no issue (see more below).

[...]

I have a Xilinx board at home (I haven't used it recently though), so I am
happy to help debugging it. Alternatively, do you know if it reproduces on the
Xilinx QEMU?

Yeah I am using the same Xilinx board that you have.

I also managed to repro the issue on upstream QEMU 2.11 (it might happen
with newer versions but all my scripts and dtbs were already based on
2.11) and upstream U-Boot 2021.04-rc1-00009-gfdcb93e170:


/local/arm-vm/qemu-system-aarch64 \
     -machine virt,gic_version=3 \
     -machine virtualization=true \
     -cpu cortex-a57 -machine type=virt \
     -smp 4 -m 4096 \
     -serial mon:stdio \
     -bios /local/arm-vm/u-boot.bin \
     -device loader,file=/var/lib/tftpboot/2021.1/xen,force-raw=on,addr=0x40C01000 \
     -device loader,file=/var/lib/tftpboot/2021.1/xen-Image-5.10,force-raw=on,addr=0x40D0A000 \
     -device loader,file=/var/lib/tftpboot/2021.1/initrd.cpio,force-raw=on,addr=0x418F1000 \
     -device loader,file=/local/arm-vm/virt-gicv3-2.dtb,force-raw=on,addr=0x41A75000 \
     -device loader,file=/local/arm-vm/boot.scr,force-raw=on,addr=0x40C00000


With the following boot.scr:

fdt addr 0x41A75000
fdt resize 1024
fdt set /chosen \#address-cells <0x2>
fdt set /chosen \#size-cells <0x2>
fdt set /chosen xen,xen-bootargs "console=dtuart dtuart=serial0 dom0_mem=2G dom0_max_vcpus=2 bootscrub=0 vwfi=native sched=null"
fdt mknod /chosen dom0
fdt set /chosen/dom0 compatible "xen,linux-zimage" "xen,multiboot-module" "multiboot,module"
fdt set /chosen/dom0 reg <0x0 0xD0A000 0x0 0xbe6b8a>
fdt set /chosen xen,dom0-bootargs "console=hvc0 earlycon=xen earlyprintk=xen clk_ignore_unused root=/dev/ram0"
fdt mknod /chosen dom0-ramdisk
fdt set /chosen/dom0-ramdisk compatible "xen,linux-initrd" "xen,multiboot-module" "multiboot,module"
fdt set /chosen/dom0-ramdisk reg <0x0 0x18F1000 0x0 0x183400>
setenv fdt_high 0xffffffffffffffff
booti 0x40C01000 - 0x41A75000

I get the error:

(XEN) Panic on CPU 0:
(XEN) invalid compressed format (err=1)

Thanks for the runes. I managed to reproduce it with a recent QEMU. Comparing with a working setup on the Foundation model, I noticed that some of the bytes in memory were different towards the end of the binary.
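For anyone following along, here is a minimal Python sketch (not Xen's actual code) of why the panic points at corruption rather than a wrong file type: an image whose tail has been overwritten still carries the gzip magic, so it is recognised as compressed, but decompression then fails. The payload and the helper names `looks_like_gzip` and `try_decompress` are made up for illustration.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream (RFC 1952)


def looks_like_gzip(blob):
    """Hypothetical helper: detect a gzip image from its magic bytes."""
    return blob[:2] == GZIP_MAGIC


def try_decompress(blob):
    """Hypothetical helper: True if blob decompresses cleanly."""
    try:
        gzip.decompress(blob)
        return True
    except (OSError, EOFError, zlib.error):
        return False


payload = bytes(range(256)) * 64   # stand-in for a kernel Image
good = gzip.compress(payload)

# Simulate something else overwriting the end of the image in memory:
# the 8-byte gzip trailer (CRC32 + length) is replaced with 0xff bytes.
clobbered = good[:-8] + b"\xff" * 8

print(looks_like_gzip(clobbered), try_decompress(clobbered))
```

The clobbered image is still detected as gzip, but it no longer decompresses, which is consistent with the symptom seen here.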

I used gdb to set a watchpoint on the memory that changed, and it triggered in the middle of U-Boot. In fact, the log from U-Boot has:

=> booti 0x40C01000 - 0x41A75000
Moving Image from 0x40c01000 to 0x40e00000, end=40f5a8f8
## Flattened Device Tree blob at 41a75000
   Booting using the fdt blob at 0x41a75000
   Using Device Tree in place at 0000000041a75000, end 0000000041a7afff

The second line shows that U-Boot relocated Xen into the middle of the kernel Image.

Looking at the U-boot code, it contains:

        /*
         * If bit 3 of the flags field is set, the 2MB aligned base of the
         * kernel image can be anywhere in physical memory, so respect
         * images->ep.  Otherwise, relocate the image to the base of RAM
         * since memory below it is not accessible via the linear mapping.
         */
        if (!force_reloc && (le64_to_cpu(ih->flags) & BIT(3)))
                dst = image - text_offset;
        else
                dst = gd->bd->bi_dram[0].start;

        *relocated_addr = ALIGN(dst, SZ_2M) + text_offset;

This forces the image to a 2MB-aligned base (plus text_offset), either rounded up from the address where it was loaded (if bit 3 is set) or at the start of RAM.
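The computation can be replayed with the values from the log. Below is a rough Python transliteration of the snippet. It assumes that Xen's Image header has a text_offset of 0 and bit 3 of the flags set; I have not re-checked the header, these values are chosen because they reproduce the "Moving Image" line. The function name `booti_relocated_addr` is made up, and 0x40000000 is the RAM base of the QEMU virt machine.

```python
SZ_2M = 2 * 1024 * 1024


def align_up(value, alignment):
    """Round value up to a power-of-two alignment, like U-Boot's ALIGN()."""
    return (value + alignment - 1) & ~(alignment - 1)


def booti_relocated_addr(image, text_offset, flags, ram_base, force_reloc=False):
    """Hypothetical Python rendering of the relocation logic quoted above."""
    if not force_reloc and (flags & (1 << 3)):
        # Bit 3 set: the image may live anywhere, keep it near where it is.
        dst = image - text_offset
    else:
        # Otherwise move it to the base of RAM.
        dst = ram_base
    return align_up(dst, SZ_2M) + text_offset


# Xen was loaded at 0x40C01000 (assumed: text_offset=0, bit 3 set).
new_addr = booti_relocated_addr(0x40C01000, 0, 1 << 3, 0x40000000)
print(hex(new_addr))  # 0x40e00000
```

Under those assumptions the result matches the destination in the U-Boot log, i.e. the image is rounded up to the next 2MB boundary rather than left in place.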

Looking again at the Image protocol, it does indeed require the image to be loaded at a 2MB aligned base address. So I was wrong about the alignment :/. Apologies, I should have checked Documentation/arm64/booting.rst rather than relying solely on our changelog.

TBH, I think it is a bit naughty of U-Boot to overwrite some modules. But I guess it doesn't know about them, at least when they are already loaded in memory (not sure for tftp).

Now, I wonder why the following commit introduced the 4KB alignment:

commit ca59618967fe0c3ecc6cb7bd8bd0f5651b4e9dea
Author: Ian Campbell <ian.campbell@xxxxxxxxxx>
Date:   Mon Jul 21 13:59:56 2014 +0100

    xen: arm: Handle 4K aligned hypervisor load address.

    Currently the boot page tables map Xen at XEN_VIRT_START using a 2MB
    section mapping. This means that the bootloader must load Xen at a 2MB
    aligned address. Unfortunately this is not the case with UEFI on the
    Juno platform where Xen fails to boot. Furthermore the Linux boot
    protocol (which Xen claims to adhere to) does not have this
    restriction, therefore this is our bug and not the bootloader's.

    Fix this by adding third level pagetables to the boot time pagetables,
    allowing us to map a Xen which is aligned only to a 4K boundary. This
    only affects the boot time page tables since Xen will later relocate
    itself to a 2MB aligned address. Strictly speaking the non-boot
    processors could make use of this and use a section mapping, but it is
    simpler if all processors follow the same boot path.

    Strictly speaking the Linux boot protocol doesn't even require 4K
    alignment (and apparently Linux can cope with this), but so far all
    bootloaders appear to provide it, so support for this is left for
    another day.

    In order to use LPAE_ENTRIES in head.S we need to define it in an asm
    friendly way.

    Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
    Acked-by: Julien Grall <julien.grall@xxxxxxxxxx>
    [ ijc -- properly format message "- FOO -\r\n" ]

IIRC, Juno (both r1 and r2) uses Armv8 processors (supporting both 32-bit and 64-bit). From the commit message, it is not entirely clear whether the issue was found on 64-bit or 32-bit.

I am tempted to force the 2MB alignment on Arm64 again (AFAICT zImage doesn't require a 2MB alignment) because the assembly code should be shorter. I would need to check what alignment UEFI requires first.

Anyway, that was a fun issue to debug. For Xen, we may want to consider checking for overlapping modules very early in boot. This would help diagnose such issues.
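To give an idea of what such a check could look like, here is a small Python sketch (the real thing would be C in Xen's early boot code, and `find_overlaps` is a made-up name). The regions use the relocated Xen range from the U-Boot log, the QEMU loader addresses for the kernel and initrd, and the sizes from the reg properties in boot.scr.

```python
from itertools import combinations


def find_overlaps(regions):
    """Return pairs of names whose [start, start + size) ranges intersect.

    regions is an iterable of (name, start, size) tuples. All pairs are
    compared, which is fine for the handful of boot modules involved.
    """
    overlaps = []
    for (name_a, start_a, size_a), (name_b, start_b, size_b) in combinations(regions, 2):
        if start_a < start_b + size_b and start_b < start_a + size_a:
            overlaps.append((name_a, name_b))
    return overlaps


regions = [
    # "Moving Image from 0x40c01000 to 0x40e00000, end=40f5a8f8"
    ("xen (relocated)", 0x40E00000, 0x40F5A8F8 - 0x40E00000),
    # QEMU loader address, size from the dom0 reg property
    ("dom0 kernel", 0x40D0A000, 0xBE6B8A),
    # QEMU loader address, size from the dom0-ramdisk reg property
    ("dom0 initrd", 0x418F1000, 0x183400),
]

print(find_overlaps(regions))  # [('xen (relocated)', 'dom0 kernel')]
```

With these numbers the only conflict reported is between the relocated Xen and the dom0 kernel, which is exactly the corruption seen above.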

Cheers,

--
Julien Grall



 

