
Re: Enabling hypervisor agnosticism for VirtIO backends



Hi Christopher,

Thank you for your feedback.

On Mon, Aug 30, 2021 at 12:53:00PM -0700, Christopher Clark wrote:
> [ resending message to ensure delivery to the CCd mailing lists
> post-subscription ]
> 
> Apologies for being late to this thread, but I hope to be able to
> contribute to this discussion in a meaningful way. I am grateful for the
> level of interest in this topic. I would like to draw your attention to
> Argo as a suitable technology for development of VirtIO's
> hypervisor-agnostic interfaces.
> 
> * Argo is an interdomain communication mechanism in Xen (on x86 and Arm)
>   that can send and receive hypervisor-mediated notifications and
>   messages between domains (VMs). [1] The hypervisor can enforce
>   Mandatory Access Control over all communication between domains. It is
>   derived from the earlier v4v, which has been deployed on millions of
>   machines with the HP/Bromium uXen hypervisor and with OpenXT.
> 
> * Argo has a simple interface with a small number of operations that was
>   designed for ease of integration into OS primitives on both Linux
>   (sockets) and Windows (ReadFile/WriteFile) [2].
>     - A unikernel example of using it has also been developed for XTF. [3]
> 
> * There has been recent discussion and support in the Xen community for
>   making revisions to the Argo interface to make it hypervisor-agnostic,
>   and support implementations of Argo on other hypervisors. This will
>   enable a single interface for an OS kernel binary to use for inter-VM
>   communication that will work on multiple hypervisors -- this applies
>   equally to both backends and frontend implementations. [4]

Regarding virtio-over-Argo, let me ask a few questions:
(I am referring to the figure "Virtual device buffer access:Virtio+Argo" in [4].)
1) How is the configuration managed?
   With both virtio-mmio and virtio-pci, some negotiation always takes
   place between the FE and BE through the "configuration" space.
   How is this done in virtio-over-Argo? (One possible encoding is
   guessed at in the sketch after these questions.)
2) Do virtio's available/used vrings and descriptors physically exist,
   or are they emulated virtually on top of Argo (rings)?
3) The payload in a request will be copied into the receiver's Argo ring.
   What does the address in a descriptor then mean?
   An address/offset within a ring buffer?
4) Do you have an estimate of performance or latency?
   It appears that, on the FE side, at least three hypervisor calls (plus
   data copies) need to be made for every request; is that right?
   (See the sketch below.)
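
To make questions 1) and 4) a bit more concrete, here is a rough sketch of
what I imagine the FE-side transport might have to do. Please note this is
purely my own guess for the sake of discussion: the message layout, the
constants and the argo_sendv()/argo_notify() wrappers (standing in for
whatever interface a driver would use around Argo's sendv/notify
operations) are invented here and are not taken from the VirtIO-Argo
design documents.

#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wrappers around Argo's sendv/notify operations (see Xen's
 * public argo.h for the real interface); these signatures are invented
 * purely for illustration.
 */
extern int argo_sendv(uint16_t dst_domid, uint32_t dst_port,
                      const void *buf, size_t len);
extern int argo_notify(uint16_t dst_domid, uint32_t dst_port);

/* Hypothetical addressing constants, not part of any specification. */
#define BE_DOMID  1U
#define REQ_PORT  0x100U

/* Question 1: a guessed encapsulation of a config-space access. */
struct virtio_argo_cfg_msg {
    uint32_t type;      /* e.g. CFG_READ / CFG_WRITE / FEATURES / STATUS */
    uint32_t offset;    /* offset into the virtio config structure */
    uint32_t len;       /* length of the access */
    uint8_t  data[];    /* value for writes; empty for a read request */
};

/* Question 4: a guessed per-request submission path on the FE side. */
static int virtio_argo_fe_submit(const void *desc_chain, size_t desc_len,
                                 const void *payload, size_t payload_len)
{
    int ret;

    /* 1st hypervisor call: copy the descriptor chain (or an equivalent
     * request header) into the backend's Argo receive ring. */
    ret = argo_sendv(BE_DOMID, REQ_PORT, desc_chain, desc_len);
    if (ret < 0)
        return ret;

    /* 2nd hypervisor call: copy the payload itself, since without shared
     * memory the backend cannot dereference guest-physical addresses. */
    ret = argo_sendv(BE_DOMID, REQ_PORT, payload, payload_len);
    if (ret < 0)
        return ret;

    /* 3rd hypervisor call: kick the backend. */
    return argo_notify(BE_DOMID, REQ_PORT);
}

If sendv already implies a notification to the receiver, or if the
descriptor chain and the payload can be coalesced into a single sendv,
the number of hypervisor calls per request goes down accordingly; that is
exactly the kind of detail I would like to confirm.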

Thanks,
-Takahiro Akashi


> * Here are the design documents for building VirtIO-over-Argo, to support a
>   hypervisor-agnostic frontend VirtIO transport driver using Argo.
> 
> The Development Plan to build VirtIO virtual device support over Argo
> transport:
> https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development+Phase+1
> 
> A design for using VirtIO over Argo, describing how VirtIO data structures
> and communication is handled over the Argo transport:
> https://openxt.atlassian.net/wiki/spaces/DC/pages/1348763698/VirtIO+Argo
> 
> Diagram (from the above document) showing how VirtIO rings are synchronized
> between domains without using shared memory:
> https://openxt.atlassian.net/46e1c93b-2b87-4cb2-951e-abd4377a1194#media-blob-url=true&id=01f7d0e1-7686-4f0b-88e1-457c1d30df40&collection=contentId-1348763698&contextId=1348763698&mimeType=image%2Fpng&name=device-buffer-access-virtio-argo.png&size=243175&width=1106&height=1241
> 
> Please note that the above design documents show that the existing VirtIO
> device drivers, and both vring and virtqueue data structures can be
> preserved while interdomain communication can be performed with no shared
> memory required for most drivers; (the exceptions where further design is
> required are those such as virtual framebuffer devices where shared memory
> regions are intentionally added to the communication structure beyond the
> vrings and virtqueues).
> 
> An analysis of VirtIO and Argo, informing the design:
> https://openxt.atlassian.net/wiki/spaces/DC/pages/1333428225/Analysis+of+Argo+as+a+transport+medium+for+VirtIO
> 
> * Argo can be used for a communication path for configuration between the
>   backend and the toolstack, avoiding the need for a dependency on
>   XenStore, which is an advantage for any hypervisor-agnostic design. It is
>   also amenable to a notification mechanism that is not based on Xen event
>   channels.
> 
> * Argo does not use or require shared memory between VMs and provides an
>   alternative to the use of foreign shared memory mappings. It avoids some
>   of the complexities involved with using grants (eg. XSA-300).
> 
> * Argo supports Mandatory Access Control by the hypervisor, satisfying a
>   common certification requirement.
> 
> * The Argo headers are BSD-licensed and the Xen hypervisor implementation
>   is GPLv2 but accessible via the hypercall interface. The licensing should
>   not present an obstacle to adoption of Argo in guest software or
>   implementation by other hypervisors.
> 
> * Since the interface that Argo presents to a guest VM is similar to DMA, a
>   VirtIO-Argo frontend transport driver should be able to operate with a
>   physical VirtIO-enabled smart-NIC if the toolstack and an Argo-aware
>   backend provide support.
> 
> The next Xen Community Call is next week and I would be happy to answer
> questions about Argo and on this topic. I will also be following this
> thread.
> 
> Christopher
> (Argo maintainer, Xen Community)
> 
> --------------------------------------------------------------------------------
> [1]
> An introduction to Argo:
> https://static.sched.com/hosted_files/xensummit19/92/Argo%20and%20HMX%20-%20OpenXT%20-%20Christopher%20Clark%20-%20Xen%20Summit%202019.pdf
> https://www.youtube.com/watch?v=cnC0Tg3jqJQ
> Xen Wiki page for Argo:
> https://wiki.xenproject.org/wiki/Argo:_Hypervisor-Mediated_Exchange_(HMX)_for_Xen
> 
> [2]
> OpenXT Linux Argo driver and userspace library:
> https://github.com/openxt/linux-xen-argo
> 
> Windows V4V at OpenXT wiki:
> https://openxt.atlassian.net/wiki/spaces/DC/pages/14844007/V4V
> Windows v4v driver source:
> https://github.com/OpenXT/xc-windows/tree/master/xenv4v
> 
> HP/Bromium uXen V4V driver:
> https://github.com/uxen-virt/uxen/tree/ascara/windows/uxenv4vlib
> 
> [3]
> v2 of the Argo test unikernel for XTF:
> https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg02234.html
> 
> [4]
> Argo HMX Transport for VirtIO meeting minutes:
> https://lists.xenproject.org/archives/html/xen-devel/2021-02/msg01422.html
> 
> VirtIO-Argo Development wiki page:
> https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development+Phase+1
> 
> 
> > On Thu, Aug 26, 2021 at 5:11 AM Wei Chen <Wei.Chen@xxxxxxx> wrote:
> >
> >> Hi Akashi,
> >>
> >> > -----Original Message-----
> >> > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> >> > Sent: 26 August 2021 17:41
> >> > To: Wei Chen <Wei.Chen@xxxxxxx>
> >> > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini
> >> > <sstabellini@xxxxxxxxxx>; Alex Bennée <alex.bennee@xxxxxxxxxx>; Kaly
> >> Xin
> >> > <Kaly.Xin@xxxxxxx>; Stratos Mailing List <
> >> stratos-dev@xxxxxxxxxxxxxxxxxxx>;
> >> > virtio-dev@xxxxxxxxxxxxxxxxxxxx; Arnd Bergmann <
> >> arnd.bergmann@xxxxxxxxxx>;
> >> > Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> >> > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> >> > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik <cvanscha@xxxxxxxxxxxxxxxx>;
> >> > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; Jean-
> >> > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> >> > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> >> > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> >> > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>;
> >> Julien
> >> > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul Durrant
> >> > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> >> > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> >> >
> >> > Hi Wei,
> >> >
> >> > On Fri, Aug 20, 2021 at 03:41:50PM +0900, AKASHI Takahiro wrote:
> >> > > On Wed, Aug 18, 2021 at 08:35:51AM +0000, Wei Chen wrote:
> >> > > > Hi Akashi,
> >> > > >
> >> > > > > -----Original Message-----
> >> > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> >> > > > > Sent: 18 August 2021 13:39
> >> > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> >> > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano
> >> Stabellini
> >> > > > > <sstabellini@xxxxxxxxxx>; Alex Bennée <alex.bennee@xxxxxxxxxx>;
> >> > Stratos
> >> > > > > Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-
> >> > dev@lists.oasis-
> >> > > > > open.org; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh Kumar
> >> > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> >> > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> >> > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik
> >> > <cvanscha@xxxxxxxxxxxxxxxx>;
> >> > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>;
> >> > Jean-
> >> > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> >> > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> >> > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> >> > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx
> >> >;
> >> > Julien
> >> > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul
> >> > Durrant
> >> > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> >> > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> >> > > > >
> >> > > > > On Tue, Aug 17, 2021 at 08:39:09AM +0000, Wei Chen wrote:
> >> > > > > > Hi Akashi,
> >> > > > > >
> >> > > > > > > -----Original Message-----
> >> > > > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> >> > > > > > > Sent: 17 August 2021 16:08
> >> > > > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> >> > > > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano
> >> > Stabellini
> >> > > > > > > <sstabellini@xxxxxxxxxx>; Alex Bennée <
> >> alex.bennee@xxxxxxxxxx>;
> >> > > > > Stratos
> >> > > > > > > Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-
> >> > > > > dev@lists.oasis-
> >> > > > > > > open.org; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh
> >> Kumar
> >> > > > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> >> > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan
> >> Kiszka
> >> > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik
> >> > <cvanscha@xxxxxxxxxxxxxxxx>;
> >> > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx
> >> >;
> >> > Jean-
> >> > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> >> > > > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> >> > > > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> >> > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev
> >> > <Artem_Mygaiev@xxxxxxxx>;
> >> > > > > Julien
> >> > > > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul
> >> > Durrant
> >> > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> >> > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO
> >> backends
> >> > > > > > >
> >> > > > > > > Hi Wei, Oleksandr,
> >> > > > > > >
> >> > > > > > > On Mon, Aug 16, 2021 at 10:04:03AM +0000, Wei Chen wrote:
> >> > > > > > > > Hi All,
> >> > > > > > > >
> >> > > > > > > > Thanks for Stefano to link my kvmtool for Xen proposal here.
> >> > > > > > > > This proposal is still being discussed in the Xen and KVM
> >> communities.
> >> > > > > > > > The main work is to decouple the kvmtool from KVM and make
> >> > > > > > > > other hypervisors can reuse the virtual device
> >> implementations.
> >> > > > > > > >
> >> > > > > > > > In this case, we need to introduce an intermediate
> >> hypervisor
> >> > > > > > > > layer for VMM abstraction, Which is, I think it's very close
> >> > > > > > > > to stratos' virtio hypervisor agnosticism work.
> >> > > > > > >
> >> > > > > > > # My proposal[1] comes from my own idea and doesn't always
> >> > represent
> >> > > > > > > # Linaro's view on this subject nor reflect Alex's concerns.
> >> > > > > Nevertheless,
> >> > > > > > >
> >> > > > > > > Your idea and my proposal seem to share the same background.
> >> > > > > > > Both have the similar goal and currently start with, at first,
> >> > Xen
> >> > > > > > > and are based on kvm-tool. (Actually, my work is derived from
> >> > > > > > > EPAM's virtio-disk, which is also based on kvm-tool.)
> >> > > > > > >
> >> > > > > > > In particular, the abstraction of hypervisor interfaces has a
> >> > same
> >> > > > > > > set of interfaces (for your "struct vmm_impl" and my "RPC
> >> > interfaces").
> >> > > > > > > This is not co-incident as we both share the same origin as I
> >> > said
> >> > > > > above.
> >> > > > > > > And so we will also share the same issues. One of them is a
> >> way
> >> > of
> >> > > > > > > "sharing/mapping FE's memory". There is some trade-off between
> >> > > > > > > the portability and the performance impact.
> >> > > > > > > So we can discuss the topic here in this ML, too.
> >> > > > > > > (See Alex's original email, too).
> >> > > > > > >
> >> > > > > > Yes, I agree.
> >> > > > > >
> >> > > > > > > On the other hand, my approach aims to create a
> >> "single-binary"
> >> > > > > solution
> >> > > > > > > in which the same binary of BE vm could run on any
> >> hypervisors.
> >> > > > > > > Somehow similar to your "proposal-#2" in [2], but in my
> >> solution,
> >> > all
> >> > > > > > > the hypervisor-specific code would be put into another entity
> >> > (VM),
> >> > > > > > > named "virtio-proxy" and the abstracted operations are served
> >> > via RPC.
> >> > > > > > > (In this sense, BE is hypervisor-agnostic but might have OS
> >> > > > > dependency.)
> >> > > > > > > But I know that we need discuss if this is a requirement even
> >> > > > > > > in Stratos project or not. (Maybe not)
> >> > > > > > >
> >> > > > > >
> >> > > > > > Sorry, I haven't had time to finish reading your virtio-proxy
> >> > completely
> >> > > > > > (I will do it ASAP). But from your description, it seems we
> >> need a
> >> > > > > > 3rd VM between FE and BE? My concern is that, if my assumption
> >> is
> >> > right,
> >> > > > > > will it increase the latency in data transport path? Even if
> >> we're
> >> > > > > > using some lightweight guest like RTOS or Unikernel,
> >> > > > >
> >> > > > > Yes, you're right. But I'm afraid that it is a matter of degree.
> >> > > > > As far as we execute 'mapping' operations at every fetch of
> >> payload,
> >> > > > > we will see latency issue (even in your case) and if we have some
> >> > solution
> >> > > > > for it, we won't see it neither in my proposal :)
> >> > > > >
> >> > > >
> >> > > > Oleksandr has sent a proposal to Xen mailing list to reduce this
> >> kind
> >> > > > of "mapping/unmapping" operations. So the latency caused by this
> >> > behavior
> >> > > > on Xen may eventually be eliminated, and Linux-KVM doesn't have that
> >> > problem.
> >> > >
> >> > > Obviously, I have not yet caught up there in the discussion.
> >> > > Which patch specifically?
> >> >
> >> > Can you give me the link to the discussion or patch, please?
> >> >
> >>
> >> It's a RFC discussion. We have tested this RFC patch internally.
> >> https://lists.xenproject.org/archives/html/xen-devel/2021-07/msg01532.html
> >>
> >> > Thanks,
> >> > -Takahiro Akashi
> >> >
> >> > > -Takahiro Akashi
> >> > >
> >> > > > > > > Specifically speaking about kvm-tool, I have a concern about
> >> its
> >> > > > > > > license term; Targeting different hypervisors and different
> >> OSs
> >> > > > > > > (which I assume includes RTOS's), the resultant library should
> >> > be
> >> > > > > > > license permissive and GPL for kvm-tool might be an issue.
> >> > > > > > > Any thoughts?
> >> > > > > > >
> >> > > > > >
> >> > > > > > Yes. If user want to implement a FreeBSD device model, but the
> >> > virtio
> >> > > > > > library is GPL. Then GPL would be a problem. If we have another
> >> > good
> >> > > > > > candidate, I am open to it.
> >> > > > >
> >> > > > > I have some candidates, particularly for vq/vring, in my mind:
> >> > > > > * Open-AMP, or
> >> > > > > * corresponding Free-BSD code
> >> > > > >
> >> > > >
> >> > > > Interesting, I will look into them : )
> >> > > >
> >> > > > Cheers,
> >> > > > Wei Chen
> >> > > >
> >> > > > > -Takahiro Akashi
> >> > > > >
> >> > > > >
> >> > > > > > > -Takahiro Akashi
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-
> >> > > > > > > August/000548.html
> >> > > > > > > [2] https://marc.info/?l=xen-devel&m=162373754705233&w=2
> >> > > > > > >
> >> > > > > > > >
> >> > > > > > > > > From: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>
> >> > > > > > > > > Sent: 14 August 2021 23:38
> >> > > > > > > > > To: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>; Stefano
> >> > > > > Stabellini
> >> > > > > > > <sstabellini@xxxxxxxxxx>
> >> > > > > > > > > Cc: Alex Bennée <alex.bennee@xxxxxxxxxx>; Stratos
> >> Mailing
> >> > List
> >> > > > > > > <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-dev@lists.oasis-
> >> > open.org;
> >> > > > > Arnd
> >> > > > > > > Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh Kumar
> >> > > > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> >> > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan
> >> Kiszka
> >> > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik
> >> > <cvanscha@xxxxxxxxxxxxxxxx>;
> >> > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx
> >> >;
> >> > Jean-
> >> > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> >> > > > > > > <mathieu.poirier@xxxxxxxxxx>; Wei Chen <Wei.Chen@xxxxxxx>;
> >> > Oleksandr
> >> > > > > > > Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> >> > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev
> >> > <Artem_Mygaiev@xxxxxxxx>;
> >> > > > > Julien
> >> > > > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul
> >> > Durrant
> >> > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> >> > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO
> >> > backends
> >> > > > > > > > >
> >> > > > > > > > > Hello, all.
> >> > > > > > > > >
> >> > > > > > > > > Please see some comments below. And sorry for the possible
> >> > format
> >> > > > > > > issues.
> >> > > > > > > > >
> >> > > > > > > > > > On Wed, Aug 11, 2021 at 9:27 AM AKASHI Takahiro
> >> > > > > > > <mailto:takahiro.akashi@xxxxxxxxxx> wrote:
> >> > > > > > > > > > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano
> >> > Stabellini
> >> > > > > wrote:
> >> > > > > > > > > > > CCing people working on Xen+VirtIO and IOREQs. Not
> >> > trimming
> >> > > > > the
> >> > > > > > > original
> >> > > > > > > > > > > email to let them read the full context.
> >> > > > > > > > > > >
> >> > > > > > > > > > > My comments below are related to a potential Xen
> >> > > > > implementation,
> >> > > > > > > not
> >> > > > > > > > > > > because it is the only implementation that matters,
> >> but
> >> > > > > because it
> >> > > > > > > is
> >> > > > > > > > > > > the one I know best.
> >> > > > > > > > > >
> >> > > > > > > > > > Please note that my proposal (and hence the working
> >> > prototype)[1]
> >> > > > > > > > > > is based on Xen's virtio implementation (i.e. IOREQ) and
> >> > > > > > > particularly
> >> > > > > > > > > > EPAM's virtio-disk application (backend server).
> >> > > > > > > > > > It has been, I believe, well generalized but is still a
> >> > bit
> >> > > > > biased
> >> > > > > > > > > > toward this original design.
> >> > > > > > > > > >
> >> > > > > > > > > > So I hope you like my approach :)
> >> > > > > > > > > >
> >> > > > > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-
> >> > dev/2021-
> >> > > > > > > August/000546.html
> >> > > > > > > > > >
> >> > > > > > > > > > Let me take this opportunity to explain a bit more about
> >> > my
> >> > > > > approach
> >> > > > > > > below.
> >> > > > > > > > > >
> >> > > > > > > > > > > Also, please see this relevant email thread:
> >> > > > > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Wed, 4 Aug 2021, Alex Bennée wrote:
> >> > > > > > > > > > > > Hi,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > One of the goals of Project Stratos is to enable
> >> > hypervisor
> >> > > > > > > agnostic
> >> > > > > > > > > > > > backends so we can enable as much re-use of code as
> >> > possible
> >> > > > > and
> >> > > > > > > avoid
> >> > > > > > > > > > > > repeating ourselves. This is the flip side of the
> >> > front end
> >> > > > > > > where
> >> > > > > > > > > > > > multiple front-end implementations are required -
> >> one
> >> > per OS,
> >> > > > > > > assuming
> >> > > > > > > > > > > > you don't just want Linux guests. The resultant
> >> guests
> >> > are
> >> > > > > > > trivially
> >> > > > > > > > > > > > movable between hypervisors modulo any abstracted
> >> > paravirt
> >> > > > > type
> >> > > > > > > > > > > > interfaces.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > In my original thumb nail sketch of a solution I
> >> > envisioned
> >> > > > > > > vhost-user
> >> > > > > > > > > > > > daemons running in a broadly POSIX like environment.
> >> > The
> >> > > > > > > interface to
> >> > > > > > > > > > > > the daemon is fairly simple requiring only some
> >> mapped
> >> > > > > memory
> >> > > > > > > and some
> >> > > > > > > > > > > > sort of signalling for events (on Linux this is
> >> > eventfd).
> >> > > > > The
> >> > > > > > > idea was a
> >> > > > > > > > > > > > stub binary would be responsible for any hypervisor
> >> > specific
> >> > > > > > > setup and
> >> > > > > > > > > > > > then launch a common binary to deal with the actual
> >> > > > > virtqueue
> >> > > > > > > requests
> >> > > > > > > > > > > > themselves.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Since that original sketch we've seen an expansion
> >> in
> >> > the
> >> > > > > sort
> >> > > > > > > of ways
> >> > > > > > > > > > > > backends could be created. There is interest in
> >> > > > > encapsulating
> >> > > > > > > backends
> >> > > > > > > > > > > > in RTOSes or unikernels for solutions like SCMI.
> >> There
> >> > > > > interest
> >> > > > > > > in Rust
> >> > > > > > > > > > > > has prompted ideas of using the trait interface to
> >> > abstract
> >> > > > > > > differences
> >> > > > > > > > > > > > away as well as the idea of bare-metal Rust
> >> backends.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > We have a card (STR-12) called "Hypercall
> >> > Standardisation"
> >> > > > > which
> >> > > > > > > > > > > > calls for a description of the APIs needed from the
> >> > > > > hypervisor
> >> > > > > > > side to
> >> > > > > > > > > > > > support VirtIO guests and their backends. However we
> >> > are
> >> > > > > some
> >> > > > > > > way off
> >> > > > > > > > > > > > from that at the moment as I think we need to at
> >> least
> >> > > > > > > demonstrate one
> >> > > > > > > > > > > > portable backend before we start codifying
> >> > requirements. To
> >> > > > > that
> >> > > > > > > end I
> >> > > > > > > > > > > > want to think about what we need for a backend to
> >> > function.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Configuration
> >> > > > > > > > > > > > =============
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > In the type-2 setup this is typically fairly simple
> >> > because
> >> > > > > the
> >> > > > > > > host
> >> > > > > > > > > > > > system can orchestrate the various modules that make
> >> > up the
> >> > > > > > > complete
> >> > > > > > > > > > > > system. In the type-1 case (or even type-2 with
> >> > delegated
> >> > > > > > > service VMs)
> >> > > > > > > > > > > > we need some sort of mechanism to inform the backend
> >> > VM
> >> > > > > about
> >> > > > > > > key
> >> > > > > > > > > > > > details about the system:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >   - where virt queue memory is in it's address space
> >> > > > > > > > > > > >   - how it's going to receive (interrupt) and
> >> trigger
> >> > (kick)
> >> > > > > > > events
> >> > > > > > > > > > > >   - what (if any) resources the backend needs to
> >> > connect to
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Obviously you can elide over configuration issues by
> >> > having
> >> > > > > > > static
> >> > > > > > > > > > > > configurations and baking the assumptions into your
> >> > guest
> >> > > > > images
> >> > > > > > > however
> >> > > > > > > > > > > > this isn't scalable in the long term. The obvious
> >> > solution
> >> > > > > seems
> >> > > > > > > to be
> >> > > > > > > > > > > > extending a subset of Device Tree data to user space
> >> > but
> >> > > > > perhaps
> >> > > > > > > there
> >> > > > > > > > > > > > are other approaches?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Before any virtio transactions can take place the
> >> > > > > appropriate
> >> > > > > > > memory
> >> > > > > > > > > > > > mappings need to be made between the FE guest and
> >> the
> >> > BE
> >> > > > > guest.
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Currently the whole of the FE guests address space
> >> > needs to
> >> > > > > be
> >> > > > > > > visible
> >> > > > > > > > > > > > to whatever is serving the virtio requests. I can
> >> > envision 3
> >> > > > > > > approaches:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >  * BE guest boots with memory already mapped
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >  This would entail the guest OS knowing where in
> >> it's
> >> > Guest
> >> > > > > > > Physical
> >> > > > > > > > > > > >  Address space is already taken up and avoiding
> >> > clashing. I
> >> > > > > > > would assume
> >> > > > > > > > > > > >  in this case you would want a standard interface to
> >> > > > > userspace
> >> > > > > > > to then
> >> > > > > > > > > > > >  make that address space visible to the backend
> >> daemon.
> >> > > > > > > > > >
> >> > > > > > > > > > Yet another way here is that we would have well known
> >> > "shared
> >> > > > > > > memory" between
> >> > > > > > > > > > VMs. I think that Jailhouse's ivshmem gives us good
> >> > insights on
> >> > > > > this
> >> > > > > > > matter
> >> > > > > > > > > > and that it can even be an alternative for hypervisor-
> >> > agnostic
> >> > > > > > > solution.
> >> > > > > > > > > >
> >> > > > > > > > > > (Please note memory regions in ivshmem appear as a PCI
> >> > device
> >> > > > > and
> >> > > > > > > can be
> >> > > > > > > > > > mapped locally.)
> >> > > > > > > > > >
> >> > > > > > > > > > I want to add this shared memory aspect to my
> >> virtio-proxy,
> >> > but
> >> > > > > > > > > > the resultant solution would eventually look similar to
> >> > ivshmem.
> >> > > > > > > > > >
> >> > > > > > > > > > > >  * BE guests boots with a hypervisor handle to
> >> memory
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >  The BE guest is then free to map the FE's memory to
> >> > where
> >> > > > > it
> >> > > > > > > wants in
> >> > > > > > > > > > > >  the BE's guest physical address space.
> >> > > > > > > > > > >
> >> > > > > > > > > > > I cannot see how this could work for Xen. There is no
> >> > "handle"
> >> > > > > to
> >> > > > > > > give
> >> > > > > > > > > > > to the backend if the backend is not running in dom0.
> >> So
> >> > for
> >> > > > > Xen I
> >> > > > > > > think
> >> > > > > > > > > > > the memory has to be already mapped
> >> > > > > > > > > >
> >> > > > > > > > > > In Xen's IOREQ solution (virtio-blk), the following
> >> > information
> >> > > > > is
> >> > > > > > > expected
> >> > > > > > > > > > to be exposed to BE via Xenstore:
> >> > > > > > > > > > (I know that this is a tentative approach though.)
> >> > > > > > > > > >    - the start address of configuration space
> >> > > > > > > > > >    - interrupt number
> >> > > > > > > > > >    - file path for backing storage
> >> > > > > > > > > >    - read-only flag
> >> > > > > > > > > > And the BE server has to call a particular hypervisor
> >> > interface
> >> > > > > to
> >> > > > > > > > > > map the configuration space.
> >> > > > > > > > >
> >> > > > > > > > > Yes, Xenstore was chosen as a simple way to pass
> >> > configuration
> >> > > > > info to
> >> > > > > > > the backend running in a non-toolstack domain.
> >> > > > > > > > > I remember, there was a wish to avoid using Xenstore in
> >> > Virtio
> >> > > > > backend
> >> > > > > > > itself if possible, so for non-toolstack domain, this could
> >> done
> >> > with
> >> > > > > > > adjusting devd (daemon that listens for devices and launches
> >> > backends)
> >> > > > > > > > > to read backend configuration from the Xenstore anyway and
> >> > pass it
> >> > > > > to
> >> > > > > > > the backend via command line arguments.
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Yes, in current PoC code we're using xenstore to pass device
> >> > > > > > > configuration.
> >> > > > > > > > We also designed a static device configuration parse method
> >> > for
> >> > > > > Dom0less
> >> > > > > > > or
> >> > > > > > > > other scenarios don't have xentool. yes, it's from device
> >> > model
> >> > > > > command
> >> > > > > > > line
> >> > > > > > > > or a config file.
> >> > > > > > > >
> >> > > > > > > > > But, if ...
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > In my approach (virtio-proxy), all those Xen (or
> >> > hypervisor)-
> >> > > > > > > specific
> >> > > > > > > > > > stuffs are contained in virtio-proxy, yet another VM, to
> >> > hide
> >> > > > > all
> >> > > > > > > details.
> >> > > > > > > > >
> >> > > > > > > > > ... the solution how to overcome that is already found and
> >> > proven
> >> > > > > to
> >> > > > > > > work then even better.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > > # My point is that a "handle" is not mandatory for
> >> > executing
> >> > > > > mapping.
> >> > > > > > > > > >
> >> > > > > > > > > > > and the mapping probably done by the
> >> > > > > > > > > > > toolstack (also see below.) Or we would have to
> >> invent a
> >> > new
> >> > > > > Xen
> >> > > > > > > > > > > hypervisor interface and Xen virtual machine
> >> privileges
> >> > to
> >> > > > > allow
> >> > > > > > > this
> >> > > > > > > > > > > kind of mapping.
> >> > > > > > > > > >
> >> > > > > > > > > > > If we run the backend in Dom0 that we have no problems
> >> > of
> >> > > > > course.
> >> > > > > > > > > >
> >> > > > > > > > > > One of difficulties on Xen that I found in my approach
> >> is
> >> > that
> >> > > > > > > calling
> >> > > > > > > > > > such hypervisor interfaces (registering IOREQ, mapping
> >> > memory) is
> >> > > > > > > only
> >> > > > > > > > > > allowed on BE servers themselves and so we will have to
> >> > extend
> >> > > > > > > those
> >> > > > > > > > > > interfaces.
> >> > > > > > > > > > This, however, will raise some concern on security and
> >> > privilege
> >> > > > > > > distribution
> >> > > > > > > > > > as Stefan suggested.
> >> > > > > > > > >
> >> > > > > > > > > We also faced policy related issues with Virtio backend
> >> > running in
> >> > > > > > > other than Dom0 domain in a "dummy" xsm mode. In our target
> >> > system we
> >> > > > > run
> >> > > > > > > the backend in a driver
> >> > > > > > > > > domain (we call it DomD) where the underlying H/W resides.
> >> > We
> >> > > > > trust it,
> >> > > > > > > so we wrote policy rules (to be used in "flask" xsm mode) to
> >> > provide
> >> > > > > it
> >> > > > > > > with a little bit more privileges than a simple DomU had.
> >> > > > > > > > > Now it is permitted to issue device-model, resource and
> >> > memory
> >> > > > > > > mappings, etc calls.
> >> > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > > To activate the mapping will
> >> > > > > > > > > > > >  require some sort of hypercall to the hypervisor. I
> >> > can see
> >> > > > > two
> >> > > > > > > options
> >> > > > > > > > > > > >  at this point:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >   - expose the handle to userspace for daemon/helper
> >> > to
> >> > > > > trigger
> >> > > > > > > the
> >> > > > > > > > > > > >     mapping via existing hypercall interfaces. If
> >> > using a
> >> > > > > helper
> >> > > > > > > you
> >> > > > > > > > > > > >     would have a hypervisor specific one to avoid
> >> the
> >> > daemon
> >> > > > > > > having to
> >> > > > > > > > > > > >     care too much about the details or push that
> >> > complexity
> >> > > > > into
> >> > > > > > > a
> >> > > > > > > > > > > >     compile time option for the daemon which would
> >> > result in
> >> > > > > > > different
> >> > > > > > > > > > > >     binaries although a common source base.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >   - expose a new kernel ABI to abstract the
> >> hypercall
> >> > > > > > > differences away
> >> > > > > > > > > > > >     in the guest kernel. In this case the userspace
> >> > would
> >> > > > > > > essentially
> >> > > > > > > > > > > >     ask for an abstract "map guest N memory to
> >> > userspace
> >> > > > > ptr"
> >> > > > > > > and let
> >> > > > > > > > > > > >     the kernel deal with the different hypercall
> >> > interfaces.
> >> > > > > > > This of
> >> > > > > > > > > > > >     course assumes the majority of BE guests would
> >> be
> >> > Linux
> >> > > > > > > kernels and
> >> > > > > > > > > > > >     leaves the bare-metal/unikernel approaches to
> >> > their own
> >> > > > > > > devices.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Operation
> >> > > > > > > > > > > > =========
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > The core of the operation of VirtIO is fairly
> >> simple.
> >> > Once
> >> > > > > the
> >> > > > > > > > > > > > vhost-user feature negotiation is done it's a case
> >> of
> >> > > > > receiving
> >> > > > > > > update
> >> > > > > > > > > > > > events and parsing the resultant virt queue for
> >> data.
> >> > The
> >> > > > > vhost-
> >> > > > > > > user
> >> > > > > > > > > > > > specification handles a bunch of setup before that
> >> > point,
> >> > > > > mostly
> >> > > > > > > to
> >> > > > > > > > > > > > detail where the virt queues are set up FD's for
> >> > memory and
> >> > > > > > > event
> >> > > > > > > > > > > > communication. This is where the envisioned stub
> >> > process
> >> > > > > would
> >> > > > > > > be
> >> > > > > > > > > > > > responsible for getting the daemon up and ready to
> >> run.
> >> > This
> >> > > > > is
> >> > > > > > > > > > > > currently done inside a big VMM like QEMU but I
> >> > suspect a
> >> > > > > modern
> >> > > > > > > > > > > > approach would be to use the rust-vmm vhost crate.
> >> It
> >> > would
> >> > > > > then
> >> > > > > > > either
> >> > > > > > > > > > > > communicate with the kernel's abstracted ABI or be
> >> re-
> >> > > > > targeted
> >> > > > > > > as a
> >> > > > > > > > > > > > build option for the various hypervisors.
> >> > > > > > > > > > >
> >> > > > > > > > > > > One thing I mentioned before to Alex is that Xen
> >> doesn't
> >> > have
> >> > > > > VMMs
> >> > > > > > > the
> >> > > > > > > > > > > way they are typically envisioned and described in
> >> other
> >> > > > > > > environments.
> >> > > > > > > > > > > Instead, Xen has IOREQ servers. Each of them connects
> >> > > > > > > independently to
> >> > > > > > > > > > > Xen via the IOREQ interface. E.g. today multiple QEMUs
> >> > could
> >> > > > > be
> >> > > > > > > used as
> >> > > > > > > > > > > emulators for a single Xen VM, each of them connecting
> >> > to Xen
> >> > > > > > > > > > > independently via the IOREQ interface.
> >> > > > > > > > > > >
> >> > > > > > > > > > > The component responsible for starting a daemon and/or
> >> > setting
> >> > > > > up
> >> > > > > > > shared
> >> > > > > > > > > > > interfaces is the toolstack: the xl command and the
> >> > > > > libxl/libxc
> >> > > > > > > > > > > libraries.
> >> > > > > > > > > >
> >> > > > > > > > > > I think that VM configuration management (or
> >> orchestration
> >> > in
> >> > > > > > > Stratos
> >> > > > > > > > > > jargon?) is a subject to debate in parallel.
> >> > > > > > > > > > Otherwise, is there any good assumption to avoid it
> >> right
> >> > now?
> >> > > > > > > > > >
> >> > > > > > > > > > > Oleksandr and others I CCed have been working on ways
> >> > for the
> >> > > > > > > toolstack
> >> > > > > > > > > > > to create virtio backends and setup memory mappings.
> >> > They
> >> > > > > might be
> >> > > > > > > able
> >> > > > > > > > > > > to provide more info on the subject. I do think we
> >> miss
> >> > a way
> >> > > > > to
> >> > > > > > > provide
> >> > > > > > > > > > > the configuration to the backend and anything else
> >> that
> >> > the
> >> > > > > > > backend
> >> > > > > > > > > > > might require to start doing its job.
> >> > > > > > > > >
> >> > > > > > > > > Yes, some work has been done for the toolstack to handle
> >> > Virtio
> >> > > > > MMIO
> >> > > > > > > devices in
> >> > > > > > > > > general and Virtio block devices in particular. However,
> >> it
> >> > has
> >> > > > > not
> >> > > > > > > been upstreamed yet.
> >> > > > > > > > > Updated patches on review now:
> >> > > > > > > > > https://lore.kernel.org/xen-devel/1621626361-29076-1-git-
> >> > send-
> >> > > > > email-
> >> > > > > > > olekstysh@xxxxxxxxx/
> >> > > > > > > > >
> >> > > > > > > > > There is an additional (also important) activity to
> >> > improve/fix
> >> > > > > > > foreign memory mapping on Arm which I am also involved in.
> >> > > > > > > > > The foreign memory mapping is proposed to be used for
> >> Virtio
> >> > > > > backends
> >> > > > > > > (device emulators) if there is a need to run guest OS
> >> completely
> >> > > > > > > unmodified.
> >> > > > > > > > > Of course, the more secure way would be to use grant
> >> memory
> >> > > > > mapping.
> >> > > > > > > Briefly, the main difference between them is that with foreign
> >> > mapping
> >> > > > > the
> >> > > > > > > backend
> >> > > > > > > > > can map any guest memory it wants to map, but with grant
> >> > mapping
> >> > > > > it is
> >> > > > > > > allowed to map only what was previously granted by the
> >> frontend.
> >> > > > > > > > >
> >> > > > > > > > > So, there might be a problem if we want to pre-map some
> >> > guest
> >> > > > > memory
> >> > > > > > > in advance or to cache mappings in the backend in order to
> >> > improve
> >> > > > > > > performance (because the mapping/unmapping guest pages every
> >> > request
> >> > > > > > > requires a lot of back and forth to Xen + P2M updates). In a
> >> > nutshell,
> >> > > > > > > currently, in order to map a guest page into the backend
> >> address
> >> > space
> >> > > > > we
> >> > > > > > > need to steal a real physical page from the backend domain.
> >> So,
> >> > with
> >> > > > > the
> >> > > > > > > said optimizations we might end up with no free memory in the
> >> > backend
> >> > > > > > > domain (see XSA-300). And what we try to achieve is to not
> >> waste
> >> > a
> >> > > > > real
> >> > > > > > > domain memory at all by providing safe non-allocated-yet (so
> >> > unused)
> >> > > > > > > address space for the foreign (and grant) pages to be mapped
> >> > into,
> >> > > > > this
> >> > > > > > > enabling work implies Xen and Linux (and likely DTB bindings)
> >> > changes.
> >> > > > > > > However, as it turned out, for this to work in a proper and
> >> safe
> >> > way
> >> > > > > some
> >> > > > > > > prereq work needs to be done.
> >> > > > > > > > > You can find the related Xen discussion at:
> >> > > > > > > > > https://lore.kernel.org/xen-devel/1627489110-25633-1-git-
> >> > send-
> >> > > > > email-
> >> > > > > > > olekstysh@xxxxxxxxx/
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > > One question is how to best handle notification and
> >> > kicks.
> >> > > > > The
> >> > > > > > > existing
> >> > > > > > > > > > > > vhost-user framework uses eventfd to signal the
> >> daemon
> >> > > > > (although
> >> > > > > > > QEMU
> >> > > > > > > > > > > > is quite capable of simulating them when you use
> >> TCG).
> >> > Xen
> >> > > > > has
> >> > > > > > > it's own
> >> > > > > > > > > > > > IOREQ mechanism. However latency is an important
> >> > factor and
> >> > > > > > > having
> >> > > > > > > > > > > > events go through the stub would add quite a lot.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Yeah I think, regardless of anything else, we want the
> >> > > > > backends to
> >> > > > > > > > > > > connect directly to the Xen hypervisor.
> >> > > > > > > > > >
> >> > > > > > > > > > In my approach,
> >> > > > > > > > > >  a) BE -> FE: interrupts triggered by BE calling a
> >> > hypervisor
> >> > > > > > > interface
> >> > > > > > > > > >               via virtio-proxy
> >> > > > > > > > > >  b) FE -> BE: MMIO to config raises events (in event
> >> > channels),
> >> > > > > > > which is
> >> > > > > > > > > >               converted to a callback to BE via virtio-
> >> > proxy
> >> > > > > > > > > >               (Xen's event channel is internally
> >> > implemented by
> >> > > > > > > interrupts.)
> >> > > > > > > > > >
> >> > > > > > > > > > I don't know what "connect directly" means here, but
> >> > sending
> >> > > > > > > interrupts
> >> > > > > > > > > > to the opposite side would be best efficient.
> >> > > > > > > > > > Ivshmem, I suppose, takes this approach by utilizing
> >> PCI's
> >> > msi-x
> >> > > > > > > mechanism.
> >> > > > > > > > >
> >> > > > > > > > > Agree that MSI would be more efficient than SPI...
> >> > > > > > > > > At the moment, in order to notify the frontend, the
> >> backend
> >> > issues
> >> > > > > a
> >> > > > > > > specific device-model call to query Xen to inject a
> >> > corresponding SPI
> >> > > > > to
> >> > > > > > > the guest.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Could we consider the kernel internally converting
> >> > IOREQ
> >> > > > > > > messages from
> >> > > > > > > > > > > > the Xen hypervisor to eventfd events? Would this
> >> scale
> >> > with
> >> > > > > > > other kernel
> >> > > > > > > > > > > > hypercall interfaces?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > So any thoughts on what directions are worth
> >> > experimenting
> >> > > > > with?
> >> > > > > > > > > > >
> >> > > > > > > > > > > One option we should consider is for each backend to
> >> > connect
> >> > > > > to
> >> > > > > > > Xen via
> >> > > > > > > > > > > the IOREQ interface. We could generalize the IOREQ
> >> > interface
> >> > > > > and
> >> > > > > > > make it
> >> > > > > > > > > > > hypervisor agnostic. The interface is really trivial
> >> and
> >> > easy
> >> > > > > to
> >> > > > > > > add.
> >> > > > > > > > > >
> >> > > > > > > > > > As I said above, my proposal does the same thing that
> >> you
> >> > > > > mentioned
> >> > > > > > > here :)
> >> > > > > > > > > > The difference is that I do call hypervisor interfaces
> >> via
> >> > > > > virtio-
> >> > > > > > > proxy.
> >> > > > > > > > > >
> >> > > > > > > > > > > The only Xen-specific part is the notification
> >> mechanism,
> >> > > > > which is
> >> > > > > > > an
> >> > > > > > > > > > > event channel. If we replaced the event channel with
> >> > something
> >> > > > > > > else the
> >> > > > > > > > > > > interface would be generic. See:
> >> > > > > > > > > > > https://gitlab.com/xen-project/xen/-
> >> > > > > > > /blob/staging/xen/include/public/hvm/ioreq.h#L52
> >> > > > > > > > > > >
> >> > > > > > > > > > > I don't think that translating IOREQs to eventfd in
> >> the
> >> > kernel
> >> > > > > is
> >> > > > > > > a
> >> > > > > > > > > > > good idea: if feels like it would be extra complexity
> >> > and that
> >> > > > > the
> >> > > > > > > > > > > kernel shouldn't be involved as this is a backend-
> >> > hypervisor
> >> > > > > > > interface.
> >> > > > > > > > > >
> >> > > > > > > > > > Given that we may want to implement BE as a bare-metal
> >> > > > > application
> >> > > > > > > > > > as I did on Zephyr, I don't think that the translation
> >> > would not
> >> > > > > be
> >> > > > > > > > > > a big issue, especially on RTOS's.
> >> > > > > > > > > > It will be some kind of abstraction layer of interrupt
> >> > handling
> >> > > > > > > > > > (or nothing but a callback mechanism).
> >> > > > > > > > > >
> >> > > > > > > > > > > Also, eventfd is very Linux-centric and we are trying
> >> to
> >> > > > > design an
> >> > > > > > > > > > > interface that could work well for RTOSes too. If we
> >> > want to
> >> > > > > do
> >> > > > > > > > > > > something different, both OS-agnostic and hypervisor-
> >> > agnostic,
> >> > > > > > > perhaps
> >> > > > > > > > > > > we could design a new interface. One that could be
> >> > > > > implementable
> >> > > > > > > in the
> >> > > > > > > > > > > Xen hypervisor itself (like IOREQ) and of course any
> >> > other
> >> > > > > > > hypervisor
> >> > > > > > > > > > > too.
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > There is also another problem. IOREQ is probably not
> >> be
> >> > the
> >> > > > > only
> >> > > > > > > > > > > interface needed. Have a look at
> >> > > > > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2.
> >> > Don't we
> >> > > > > > > also need
> >> > > > > > > > > > > an interface for the backend to inject interrupts into
> >> > the
> >> > > > > > > frontend? And
> >> > > > > > > > > > > if the backend requires dynamic memory mappings of
> >> > frontend
> >> > > > > pages,
> >> > > > > > > then
> >> > > > > > > > > > > we would also need an interface to map/unmap domU
> >> pages.
> >> > > > > > > > > >
> >> > > > > > > > > > My proposal document might help here; All the interfaces
> >> > > > > required
> >> > > > > > > for
> >> > > > > > > > > > virtio-proxy (or hypervisor-related interfaces) are
> >> listed
> >> > as
> >> > > > > > > > > > RPC protocols :)
> >> > > > > > > > > >
> >> > > > > > > > > > > These interfaces are a lot more problematic than
> >> IOREQ:
> >> > IOREQ
> >> > > > > is
> >> > > > > > > tiny
> >> > > > > > > > > > > and self-contained. It is easy to add anywhere. A new
> >> > > > > interface to
> >> > > > > > > > > > > inject interrupts or map pages is more difficult to
> >> > manage
> >> > > > > because
> >> > > > > > > it
> >> > > > > > > > > > > would require changes scattered across the various
> >> > emulators.
> >> > > > > > > > > >
> >> > > > > > > > > > Exactly. I am not yet confident that my approach will
> >> > also
> >> > > > > apply
> >> > > > > > > > > > to other hypervisors than Xen.
> >> > > > > > > > > > Technically, yes, but whether people can accept it or
> >> not
> >> > is a
> >> > > > > > > different
> >> > > > > > > > > > matter.
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks,
> >> > > > > > > > > > -Takahiro Akashi
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > --
> >> > > > > > > > > Regards,
> >> > > > > > > > >
> >> > > > > > > > > Oleksandr Tyshchenko
> >>
> >



 

