[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Kvmtool] Some thoughts on using kvmtool Virtio for Xen
 
- To: Wei Chen <Wei.Chen@xxxxxxx>
 
- From: Oleksandr <olekstysh@xxxxxxxxx>
 
- Date: Tue, 6 Jul 2021 15:07:18 +0300
 
- Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, "will@xxxxxxxxxx" <will@xxxxxxxxxx>, "julien.thierry.kdev@xxxxxxxxx" <julien.thierry.kdev@xxxxxxxxx>, "kvm@xxxxxxxxxxxxxxx" <kvm@xxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, "jean-philippe@xxxxxxxxxx" <jean-philippe@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Andre Przywara <Andre.Przywara@xxxxxxx>, Marc Zyngier <maz@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>
 
- Delivery-date: Tue, 06 Jul 2021 12:07:25 +0000
 
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
 
 
 
Hello Wei,
Sorry for the late response.
And thanks for working in that direction and preparing the document.
On 05.07.21 13:02, Wei Chen wrote:
 
Hi Stefano,
Thanks for your comments.
 
-----Original Message-----
From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Sent: 2021年6月30日 8:43
To: will@xxxxxxxxxx; julien.thierry.kdev@xxxxxxxxx; Wei Chen
<Wei.Chen@xxxxxxx>
Cc: kvm@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxx; jean-philippe@xxxxxxxxxx;
Julien Grall <julien@xxxxxxx>; Andre Przywara <Andre.Przywara@xxxxxxx>;
Marc Zyngier <maz@xxxxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>
Subject: Re: [Kvmtool] Some thoughts on using kvmtool Virtio for Xen
Hi Wei,
Sorry for the late reply.
On Tue, 15 Jun 2021, Wei Chen wrote:
 
Hi,
I have some thoughts of using kvmtool Virtio implementation
for Xen. I copied my markdown file to this email. If you have
time, could you please help me review it?
Any feedback is welcome!
# Some thoughts on using kvmtool Virtio for Xen
## Background
Xen community is working on adding VIRTIO capability to Xen. And we're
 
 
working
 
on VIRTIO backend of Xen. But except QEMU can support virtio-net for
 
 
x86-xen,
 
there is not any VIRTIO backend can support Xen. Because of the
 
 
community's
 
strong voice of Out-of-QEMU, we want to find a light weight VIRTIO
 
 
backend to
 
support Xen.
 
 
 
 
 Yes, having something light weight to provide Virtio backends for the at 
least *main* devices (console, blk, net)
which we could run on Xen without an extra effort would be really nice.
 
We have an idea of utilizing the virtio implementaton of kvmtool for Xen.
 
 
And
 
We know there was some agreement that kvmtool won't try to be a full
 
 
QEMU
 
alternative. So we have written two proposals in following content for
communities to discuss in public:
## Proposals
### 1. Introduce a new "dm-only" command
1. Introduce a new "dm-only" command to provide a pure device model mode.
 
 
In
 
    this mode, kvmtool only handles IO request. VM creation and
 
initialization
 
    will be bypassed.
     * We will rework the interface between the virtio code and the rest
 
of
 
     kvmtool, to use just the minimal set of information. At the end,
 
there
 
     would be MMIO accesses and shared memory that control the device
 
model,
 
     so that could be abstracted to do away with any KVM specifics at all.
 
If
 
     this is workable, we will send the first set of patches to introduce
 
this
 
     interface, and adapt the existing kvmtool to it. Then later we will
 
can
 
     add Xen support on top of it.
     About Xen support, we will detect the presence of Xen libraries,
 
also
 
     allow people to ignore them, as kvmtoll do with optional features
 
like
 
     libz or libaio.
     Idealy, we want to move all code replying on Xen libraries to a set
 
of
 
     new files. In this case, thes files can only be compiled when Xen
     libraries are detected. But if we can't decouple this code
 
completely,
 
     we may introduce a bit of #ifdefs to protect this code.
     If kvm or other VMM do not need "dm-only" mode. Or "dm-only" can not
     work without Xen libraries. We will make "dm-only" command depends
 
on
 
     the presence of Xen libraries.
     So a normal compile (without the Xen libraries installed) would
 
create
 
     a binary as close as possible to the current code, and only the
 
people
 
     who having Xen libraries installed would ever generate a "dm-only"
     capable kvmtool.
### 2. Abstract kvmtool virtio implementation as a library
1. Add a kvmtool Makefile target to generate a virtio library. In this
    scenario, not just Xen, but any project else want to provide a
    userspace virtio backend service can link to this virtio libraris.
    These users would benefit from the VIRTIO implementation of kvmtool
    and will participate in improvements, upgrades, and maintenance of
    the VIRTIO libraries.
     * In this case, Xen part code will not upstream to kvmtool repo,
       it would then be natural parts of the xen repo, in xen/tools or
       maintained in other repo.
       We will have a completely separate VIRTIO backend for Xen, just
       linking to kvmtool's VIRTIO library.
     * The main changes of kvmtool would be:
         1. Still need to rework the interface between the virtio code
            and the rest of kvmtool, to abstract the whole virtio
            implementation into a library
         2. Modify current build system to add a new virtio library
 
target.
I don't really have a preference between the two.
 From my past experience with Xen enablement in QEMU, I can say that the
Xen part of receiving IO emulation requests is actually pretty minimal.
 
 
 
 In general, both proposals sound good to me, probably with a little 
preference for #1, but I am not sure that I can see all pitfalls here.
 
Yes, we have done some prototyping, and the code of Xen receive IOREQ
support can be implemented in a separate new file without invasion into
the existing kvmtool.
The point is that the device implementation calls the hypervisor interfaces
to handle these IOREQs, and is currently tightly coupled to Linux-KVM in the
implementation of each device. Without some abstract work, these adaptations
can lead to more intrusive modifications.
 
See as a reference
https://github.com/qemu/qemu/blob/13d5f87cc3b94bfccc501142df4a7b12fee3a6e7
/hw/i386/xen/xen-hvm.c#L1163.
The modifications to rework the internal interfaces that you listed
below are far more "interesting" than the code necessary to receive
emulation requests from Xen.
 
 
 
+1
 
 
 
I'm glad to hear that : )
 
So it looks like option-1 would be less efforts and fewer code changes
overall to kvmtools. Option-2 is more work. The library could be nice to
have but then we would have to be very careful about the API/ABI,
compatibility, etc.
Will Deacon and Julien Thierry might have an opinion.
 
 
Looking forward to Will and Julien's comments.
 
## Reworking the interface is the common work for above proposals
**In kvmtool, one virtual device can be separated into three layers:**
- A device type layer to provide an abstract
     - Provide interface to collect and store device configuration.
         Using block device as an example, kvmtool is using disk_image to
         -  collect and store disk parameters like:
             -  backend image format: raw, qcow or block device
             -  backend block device or file image path
             -  Readonly, direct and etc
     - Provide operations to interact with real backend devices or
 
services:
 
         - provide backend device operations:
             - block device operations
             - raw image operations
             - qcow image operations
- Hypervisor interfaces
     - Guest memory mapping and unmapping interfaces
     - Virtual device register interface
         - MMIO/PIO space register
         - IRQ register
     - Virtual IRQ inject interface
     - Hypervisor eventfd interface
 
The "hypervisor interfaces" are the ones that are most interesting as we
need an alternative implementation for Xen for each of them. This is
the part that was a bit more delicate when we added Xen support to QEMU.
Especially the memory mapping and unmapping. All doable but we need
proper abstractions.
 
 
Yes. Guest memory mapping and unmapping, if we use option#1, this will be a
a big change introduced in Kvmtool. Since Linux-KVM guest memory in kvmtool
is flat mapped in advance, it does not require dynamic Guest memory mapping
and unmapping. A proper abstract interface can bridge this gap.
 
 
 The layer separation scheme looks reasonable to me at first sight. 
Agree, "Hypervisor interfaces" worry the most, especially "Guest memory 
mapping and unmapping" which is something completely different on Xen in 
comparison with Kvm. If I am not mistaken, in the PoC the Virtio ring(s) 
are mapped at once during device initialization and unmapped during 
releasing it, while the payloads I/O buffers are mapped/unmapped at 
run-time ...
If only we could map all memory in advance and just calculate virt addr 
at run-time like it was done for Kvm case in guest_flat_to_host(). What 
we would just need is to re-map memory once the guest memory layout is 
changed
(fortunately, we have invalidate mapcache request to signal about that).
 FYI, I had a discussion with Julien on IRC regarding foreign memory 
mappings and possible improvements, the main problem today is that we 
need to steal page from the backend domain memory in order to map guest 
page into backend address space, so if we decide to map all memory in 
advance and need to serve guest(s) with a lot of memory we may run out 
of memory in the host very quickly (see XSA-300). So the idea is to try 
to map guest memory into some unused address space provided by the 
hypervisor and then hot-plugged without charging real domain pages 
(everything not mapped into P2M could be theoretically treated as 
unused). I have already started investigations, but unfortunately had to 
postpone them due to project related activities, definitely I have a 
plan to resume them again and create a PoC at least. This would simplify 
things, improve performance and eliminate the memory pressure in the host.
 
 
- An implementation layer to handle guest IO request.
     - Kvmtool provides virtual devices for guest. Some virtual devices
 
two
 
       kinds of implementations:
         - VIRTIO implementation
         - Real hardware emulation
For example, kvmtool console has virtio console and 8250 serial two
 
kinds
 
of implementations. These implementation depends on device type
 
 
parameters
 
to create devices, and depends on device type ops to forward data
 
 
from/to
 
real device. And the implementation will invoke hypervisor interfaces to
map/unmap resources and notify guest.
In the current kvmtool code, the boundaries between these three layers
 
 
are
 
relatively clear, but there are a few pieces of code that are somewhat
interleaved, for example:
- In virtio_blk__init(...) function, the code will use disk_image
 
 
directly.
 
   This data is kvmtool specified. If we want to make VIRTIO
 
 
implementation
 
   become hypervisor agnostic. Such kind of code should be moved to other
   place. Or we just keep code from virtio_blk__init_one(...) in virtio
 
 
block
 
   implementation, but keep virtio_blk__init(...) in kvmtool specified
 
 
part
 
   code.
However, in the current VIRTIO device creation and data handling process,
the device type and hypervisor API used are both exclusive to kvmtool
 
 
and
 
KVM. If we want to use current VIRTIO implementation for other device
models and hypervisors, it is unlikely to work properly.
So, the major work of reworking interface is decoupling VIRTIO
 
 
implementation
 
from kvmtool and KVM.
**Introduce some intermediate data structures to do decouple:**
1. Introduce intermedidate type data structures like `virtio_disk_type`,
    `virtio_net_type`, `virtio_console_type` and etc. These data
 
structures
 
    will be the standard device type interfaces between virtio device
    implementation and hypervisor.  Using virtio_disk_type as an example:
     ~~~~
     struct virtio_disk_type {
         /*
          * Essential configuration for virtio block device can be got
 
from
 
          * kvmtool disk_image. Other hypervisor device model also can
 
use
 
          * this data structure to pass necessary parameters for creating
          * a virtio block device.
          */
         struct virtio_blk_cfg vblk_cfg;
         /*
          * Virtio block device MMIO address and IRQ line. These two
 
members
 
          * are optional. If hypervisor provides allocate_mmio_space and
          * allocate_irq_line capability and device model doesn't set
 
these
 
          * two fields, virtio block implementation will use hypervisor
 
APIs
 
          * to allocate MMIO address and IRQ line. If these two fields
 
are
 
          * configured, virtio block implementation will use them.
          */
         paddr_t addr;
         uint32_t irq;
         /*
          * In kvmtool, this ops will connect to disk_image APIs. Other
          * hypervisor device model should provide similar APIs for this
          * ops to interact with real backend device.
          */
         struct disk_type_ops {
             .read
             .write
             .flush
             .wait
             ...
         } ops;
     };
     ~~~~
2. Introduce a intermediate hypervisor data structure. This data
 
structure
 
    provides a set of standard hypervisor API interfaces. In virtio
    implementation, the KVM specified APIs, like kvm_register_mmio, will
 
not
 
    be invoked directly. The virtio implementation will use these
 
interfaces
 
    to access hypervisor specified APIs. for example `struct vmm_impl`:
     ~~~~
     struct vmm_impl {
         /*
          * Pointer that link to real hypervisor handle like `struct kvm
 
*kvm`.
 
          * This pointer will be passed to the vmm ops;
          */
         void *vmm;
         allocate_irq_line_fn_t(void* vmm, ...);
         allocate_mmio_space_fn_t(void* vmm, ...);
         register_mmio_fn_t(void* vmm, ...);
         map_guest_page_fn_t(void* vmm, ...);
         unmap_guest_page_fn_t(void* vmm, ...);
         virtual_irq_inject_fn_t(void* vmm, ...);
     };
     ~~~~
 
Are the map_guest_page and unmap_guest_page functions already called at
the appropriate places for KVM?
 
 
As I had mentioned in above, KVM doesn't need map_guest_page and 
unmap_guest_page
dynamically while handling the IOREQ. These two interfaces can be pointed to 
NULL
or empty functions for KVM.
 
If not, the main issue is going to be adding the
map_guest_page/unmap_guest_page calls to the virtio device
implementations.
 
 
Yes, we can place them to virtio device implementations, and keep NOP
operation for KVM. Other VMMs can be implemented as the case may be
 
3. After decoupled with kvmtool, any hypervisor can use standard
 
 
`vmm_impl`
 
    and `virtio_xxxx_type` interfaces to invoke standard virtio
 
implementation
 
    interfaces to create virtio devices.
     ~~~~
     /* Prepare VMM interface */
     struct vmm_impl *vmm = ...;
     vmm->register_mmio_fn_t = kvm__register_mmio;
     /* kvm__map_guset_page is a wrapper guest_flat_to_host */
     vmm->map_guest_page_fn_t = kvm__map_guset_page;
     ...
     /* Prepare virtio_disk_type */
     struct virtio_disk_type *vdisk_type = ...;
     vdisk_type->vblk_cfg.capacity = disk_image->size / SECTOR_SIZE;
     ...
     vdisk_type->ops->read = disk_image__read;
     vdisk_type->ops->write = disk_image__write;
     ...
     /* Invoke VIRTIO implementation API to create a virtio block device
 
*/
 
     virtio_blk__init_one(vmm, vdisk_type);
     ~~~~
VIRTIO block device simple flow before reworking interface:
 
https://drive.google.com/file/d/1k0Grd4RSuCmhKUPktHj9FRamEYrPCFkX/view?usp
=sharing
 

 
VIRTIO block device simple flow after reworking interface:
 
 
https://drive.google.com/file/d/1rMXRvulwlRO39juWf08Wgk3G1NZtG2nL/view?usp
=sharing
 

 
 
 
Could you please provide an access for these documents if possible?
 
Thanks,
Wei Chen
IMPORTANT NOTICE: The contents of this email and any attachments are
 
 
confidential and may also be privileged. If you are not the intended
recipient, please notify the sender immediately and do not disclose the
contents to any other person, use it for any purpose, or store or copy the
information in any medium. Thank you.
 
 
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
 
 
--
Regards,
Oleksandr Tyshchenko
 
 
    
     |