
Re: [Xen-devel] Slow HVM boot time, was "HVM boot time optimization"

On Fri, 16 Feb 2018 09:05:02 +0100
Yessine Daoud <da.yessine@xxxxxxxxx> wrote:

>Please find attached the requested log file.

According to the log, string I/O is in fact delivered through buffered
IOREQs, in groups of 4096 I/O read ops, but the reads are still emulated
one by one, calling QEMU's fw_cfg emulation for every I/O byte. That is
the reason for the slow loading.

To speed up fw_cfg reads, the port I/O interface to fw_cfg should be
replaced with the DMA one (fw_cfg_init_io_dma). SeaBIOS has support for
reading fw_cfg via emulated DMA, so switching to the DMA version of
fw_cfg should allow the kernel files to be passed faster.
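
For reference, a minimal sketch of what a guest-side DMA read request
looks like, based on my reading of QEMU's fw_cfg specification: the
guest fills in a 16-byte big-endian descriptor (control, length,
address) and writes its physical address to the DMA register, so one
request moves a whole blob instead of one port read per byte. The type
and function names (fw_cfg_dma_desc, fw_cfg_dma_build_read) are
hypothetical, and the control-bit and selector values are taken from the
spec as I remember it, so please check them against the QEMU tree.

```c
#include <assert.h>
#include <stdint.h>

/* Control bits and selector key, as I recall them from QEMU's fw_cfg spec. */
#define FW_CFG_DMA_CTL_ERROR  0x01u
#define FW_CFG_DMA_CTL_READ   0x02u
#define FW_CFG_DMA_CTL_SKIP   0x04u
#define FW_CFG_DMA_CTL_SELECT 0x08u
#define FW_CFG_KERNEL_DATA    0x11u  /* selector key of the kernel blob */

/* Descriptor handed to the device. Hypothetical type name; byte arrays
 * are used so the big-endian layout does not depend on the host. */
typedef struct {
    uint8_t control[4];  /* bits 31..16: selector key, low bits: flags */
    uint8_t length[4];   /* number of bytes to transfer */
    uint8_t address[8];  /* guest physical address of the target buffer */
} fw_cfg_dma_desc;

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/* Build a "select item `key`, then read `len` bytes to `dst`" request. */
static void fw_cfg_dma_build_read(fw_cfg_dma_desc *d, uint16_t key,
                                  uint32_t len, uint64_t dst)
{
    uint32_t control = ((uint32_t)key << 16)
                     | FW_CFG_DMA_CTL_SELECT
                     | FW_CFG_DMA_CTL_READ;

    assert(d != 0);
    put_be32(d->control, control);
    put_be32(d->length, len);
    put_be64(d->address, dst);
}
```

For example, fw_cfg_dma_build_read(&d, FW_CFG_KERNEL_DATA, 0x700000,
0x100000) would describe loading a 7M kernel blob to 1M in a single
request; the device clears the control field when the transfer is done.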

Basically, we need to replace the following line in xen_load_linux():
    fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
with:
    fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4,

But this step might require (at least) a few additional adjustments to
IOREQ_TYPE_COPY handling in xen-hvm.c. It looks like right now it uses
the same 'for every single data item' loop as buffered I/O processing.
However, unlike port I/O processing, this can be modified to feed
cpu_physical_memory_rw() with larger chunks of data, thus reducing the
number of emulator calls.
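
To illustrate the difference, here is a toy model, not the actual
xen-hvm.c code: toy_phys_mem_rw() is a hypothetical stand-in for
cpu_physical_memory_rw(), guest RAM is a plain array, and the 4096-byte
chunk size is an arbitrary choice. It assumes a forward copy between
non-overlapping ranges and only shows how the number of calls changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t guest_ram[1 << 16];  /* toy stand-in for guest memory */
static unsigned long rw_calls;      /* how many "emulator calls" were made */

/* Hypothetical stand-in for cpu_physical_memory_rw(). */
static void toy_phys_mem_rw(uint64_t addr, uint8_t *buf, size_t len,
                            int is_write)
{
    assert(addr + len <= sizeof(guest_ram));
    rw_calls++;
    if (is_write) {
        memcpy(&guest_ram[addr], buf, len);
    } else {
        memcpy(buf, &guest_ram[addr], len);
    }
}

/* Current behaviour as described above: one read and one write call
 * for every single data item of `size` bytes. */
static void copy_per_item(uint64_t src, uint64_t dst,
                          uint32_t size, uint32_t count)
{
    uint8_t tmp[8];

    assert(size <= sizeof(tmp));
    for (uint32_t i = 0; i < count; i++) {
        toy_phys_mem_rw(src + (uint64_t)i * size, tmp, size, 0);
        toy_phys_mem_rw(dst + (uint64_t)i * size, tmp, size, 1);
    }
}

#define CHUNK 4096u  /* arbitrary chunk size for this sketch */

/* Proposed behaviour: move up to CHUNK bytes per call. */
static void copy_chunked(uint64_t src, uint64_t dst,
                         uint32_t size, uint32_t count)
{
    static uint8_t buf[CHUNK];
    uint64_t left = (uint64_t)size * count;

    while (left > 0) {
        size_t n = left < CHUNK ? (size_t)left : CHUNK;

        toy_phys_mem_rw(src, buf, n, 0);
        toy_phys_mem_rw(dst, buf, n, 1);
        src += n;
        dst += n;
        left -= n;
    }
}
```

For 8192 one-byte items, copy_per_item() makes 16384 calls while
copy_chunked() makes 4. A real implementation would also have to handle
the backward-direction case and requests where only one side is a guest
memory pointer.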

>2018-02-16 3:08 GMT+01:00 Alexey G <x1917x@xxxxxxxxx>:
>> On Thu, 15 Feb 2018 17:02:35 +0100
>> Yessine Daoud <da.yessine@xxxxxxxxx> wrote:
>> > Hello,
>> >
>> >I tried to debug the issue and this is what I found:
>> >the HVM boot spends some time in the following section
>> >(qemu/pc-bios/optionrom/linuxboot.S)
>> >/* Load kernel and initrd */
>> >read_fw_blob_addr32_edi(FW_CFG_INITRD)  (ramdisk about 3M takes ~7 s)
>> >read_fw_blob_addr32(FW_CFG_KERNEL)      (vmlinuz about 7M takes ~15 s)
>> >read_fw_blob_addr32(FW_CFG_CMDLINE)
>> >
>> >#define read_fw_blob_addr32(var) \
>> >read_fw var ## _ADDR; \
>> >mov %eax, %edi; \
>> >read_fw_blob_pre(var); \
>> >/* old as(1) doesn't like this insn so emit the bytes instead: \
>> >addr32 rep insb (%dx), %es:(%edi); \
>> >*/ \
>> >.dc.b 0x67,0xf3,0x6c
>> >
>> >#define read_fw_blob_addr32_edi(var) \
>> >read_fw_blob_pre(var); \
>> >/* old as(1) doesn't like this insn so emit the bytes instead: \
>> >addr32 rep insb (%dx), %es:(%edi); \
>> >*/ \
>> >.dc.b 0x67,0xf3,0x6c
>> >
>> >Any idea how to speed up the I/O read?
>> >Thanks.
>> Hmm, looks like it does rep insb with every I/O iteration emulated
>> individually for some reason, hence it's so slow. Normally it should
>> be emulated on a buffer basis. There might be a bug somewhere which
>> causes string I/O to be handled one iteration at a time.
>> You may try to collect QEMU trace logs using
>> device_model_args = ["-trace", "events=<path to your events file>"]
>> Where the events file should contain lines like this:
>> xen_ioreq_server_create
>> xen_ioreq_server_destroy
>> xen_ioreq_server_state
>> xen_map_portio_range
>> xen_unmap_portio_range
>> cpu_ioreq_pio
>> cpu_ioreq_pio_read_reg
>> cpu_ioreq_pio_write_reg
>> handle_ioreq
>> handle_ioreq_read
>> handle_ioreq_write
>> The resulting log file in /var/log/xen might be large (it may even
>> require specifying XEN_QEMU_CONSOLE_LIMIT=0) but will show how the
>> string I/O with port 510h is processed. This should narrow down the
>> issue.
