
RE: Enabling hypervisor agnosticism for VirtIO backends



Hi Akashi,

> -----Original Message-----
> From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> Sent: September 1, 2021 20:29
> To: Wei Chen <Wei.Chen@xxxxxxx>
> Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini
> <sstabellini@xxxxxxxxxx>; Alex Bennée <alex.bennee@xxxxxxxxxx>; Kaly Xin
> <Kaly.Xin@xxxxxxx>; Stratos Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>;
> virtio-dev@xxxxxxxxxxxxxxxxxxxx; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>;
> Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik <cvanscha@xxxxxxxxxxxxxxxx>;
> pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; Jean-
> Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>; Julien
> Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul Durrant
> <paul@xxxxxxx>; nd <nd@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> 
> Hi Wei,
> 
> On Wed, Sep 01, 2021 at 11:12:58AM +0000, Wei Chen wrote:
> > Hi Akashi,
> >
> > > -----Original Message-----
> > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > Sent: August 31, 2021 14:18
> > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini
> > > <sstabellini@xxxxxxxxxx>; Alex Bennée <alex.bennee@xxxxxxxxxx>; Kaly
> Xin
> > > <Kaly.Xin@xxxxxxx>; Stratos Mailing List <stratos-dev@op-
> lists.linaro.org>;
> > > virtio-dev@xxxxxxxxxxxxxxxxxxxx; Arnd Bergmann
> <arnd.bergmann@xxxxxxxxxx>;
> > > Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik <cvanscha@xxxxxxxxxxxxxxxx>;
> > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; Jean-
> > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>;
> Julien
> > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul Durrant
> > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > >
> > > Wei,
> > >
> > > On Thu, Aug 26, 2021 at 12:10:19PM +0000, Wei Chen wrote:
> > > > Hi Akashi,
> > > >
> > > > > -----Original Message-----
> > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > > > Sent: August 26, 2021 17:41
> > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini
> > > > > <sstabellini@xxxxxxxxxx>; Alex Bennée <alex.bennee@xxxxxxxxxx>;
> Kaly
> > > Xin
> > > > > <Kaly.Xin@xxxxxxx>; Stratos Mailing List <stratos-dev@op-
> > > lists.linaro.org>;
> > > > > virtio-dev@xxxxxxxxxxxxxxxxxxxx; Arnd Bergmann
> > > <arnd.bergmann@xxxxxxxxxx>;
> > > > > Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik
> <cvanscha@xxxxxxxxxxxxxxxx>;
> > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>;
> Jean-
> > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>;
> > > Julien
> > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul
> Durrant
> > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > > >
> > > > > Hi Wei,
> > > > >
> > > > > On Fri, Aug 20, 2021 at 03:41:50PM +0900, AKASHI Takahiro wrote:
> > > > > > On Wed, Aug 18, 2021 at 08:35:51AM +0000, Wei Chen wrote:
> > > > > > > Hi Akashi,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > > > > > > Sent: August 18, 2021 13:39
> > > > > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano
> > > Stabellini
> > > > > > > > <sstabellini@xxxxxxxxxx>; Alex Bennée
> <alex.bennee@xxxxxxxxxx>;
> > > > > Stratos
> > > > > > > > Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-
> > > > > dev@lists.oasis-
> > > > > > > > open.org; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh
> Kumar
> > > > > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan
> Kiszka
> > > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik
> > > > > <cvanscha@xxxxxxxxxxxxxxxx>;
> > > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri
> <vatsa@xxxxxxxxxxxxxx>;
> > > > > Jean-
> > > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> > > > > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> > > > > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev
> > > <Artem_Mygaiev@xxxxxxxx>;
> > > > > Julien
> > > > > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>;
> Paul
> > > > > Durrant
> > > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO
> backends
> > > > > > > >
> > > > > > > > On Tue, Aug 17, 2021 at 08:39:09AM +0000, Wei Chen wrote:
> > > > > > > > > Hi Akashi,
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > > > > > > > > Sent: August 17, 2021 16:08
> > > > > > > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > > > > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano
> > > > > Stabellini
> > > > > > > > > > <sstabellini@xxxxxxxxxx>; Alex Bennée
> > > <alex.bennee@xxxxxxxxxx>;
> > > > > > > > Stratos
> > > > > > > > > > Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-
> > > > > > > > dev@lists.oasis-
> > > > > > > > > > open.org; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>;
> Viresh
> > > Kumar
> > > > > > > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > > > > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx;
> Jan
> > > Kiszka
> > > > > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik
> > > > > <cvanscha@xxxxxxxxxxxxxxxx>;
> > > > > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri
> > > <vatsa@xxxxxxxxxxxxxx>;
> > > > > Jean-
> > > > > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu
> Poirier
> > > > > > > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> > > > > > > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > > > > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev
> > > > > <Artem_Mygaiev@xxxxxxxx>;
> > > > > > > > Julien
> > > > > > > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>;
> > > Paul
> > > > > Durrant
> > > > > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO
> > > backends
> > > > > > > > > >
> > > > > > > > > > Hi Wei, Oleksandr,
> > > > > > > > > >
> > > > > > > > > > On Mon, Aug 16, 2021 at 10:04:03AM +0000, Wei Chen wrote:
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > Thanks to Stefano for linking my kvmtool for Xen proposal here.
> > > > > > > > > > > This proposal is still being discussed in the Xen and KVM
> > > > > > > > > > > communities. The main work is to decouple kvmtool from KVM so
> > > > > > > > > > > that other hypervisors can reuse the virtual device
> > > > > > > > > > > implementations.
> > > > > > > > > > >
> > > > > > > > > > > In this case, we need to introduce an intermediate hypervisor
> > > > > > > > > > > layer for VMM abstraction, which, I think, is very close
> > > > > > > > > > > to Stratos' virtio hypervisor agnosticism work.
> > > > > > > > > >
> > > > > > > > > > # My proposal[1] comes from my own idea and doesn't always represent
> > > > > > > > > > # Linaro's view on this subject nor reflect Alex's concerns. Nevertheless,
> > > > > > > > > >
> > > > > > > > > > Your idea and my proposal seem to share the same background.
> > > > > > > > > > Both have a similar goal, both currently start with Xen, and both
> > > > > > > > > > are based on kvm-tool. (Actually, my work is derived from
> > > > > > > > > > EPAM's virtio-disk, which is also based on kvm-tool.)
> > > > > > > > > >
> > > > > > > > > > In particular, the abstraction of hypervisor interfaces has the same
> > > > > > > > > > set of interfaces (for your "struct vmm_impl" and my "RPC interfaces").
> > > > > > > > > > This is not a coincidence, as we both share the same origin, as I said
> > > > > > > > > > above. And so we will also share the same issues. One of them is a way
> > > > > > > > > > of "sharing/mapping FE's memory". There is some trade-off between
> > > > > > > > > > portability and performance impact.
> > > > > > > > > > So we can discuss the topic here in this ML, too.
> > > > > > > > > > (See Alex's original email, too).
> > > > > > > > > >
> > > > > > > > > Yes, I agree.
> > > > > > > > >
> > > > > > > > > > On the other hand, my approach aims to create a "single-binary" solution
> > > > > > > > > > in which the same binary of the BE VM could run on any hypervisor.
> > > > > > > > > > It is somewhat similar to your "proposal-#2" in [2], but in my solution,
> > > > > > > > > > all the hypervisor-specific code would be put into another entity (VM),
> > > > > > > > > > named "virtio-proxy", and the abstracted operations are served via RPC.
> > > > > > > > > > (In this sense, BE is hypervisor-agnostic but might have OS dependency.)
> > > > > > > > > > But I know that we need to discuss whether this is a requirement
> > > > > > > > > > even in the Stratos project or not. (Maybe not)
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Sorry, I haven't had time to finish reading your virtio-proxy completely
> > > > > > > > > (I will do it ASAP). But from your description, it seems we need a
> > > > > > > > > 3rd VM between FE and BE? My concern is that, if my assumption is right,
> > > > > > > > > will it increase the latency in the data transport path? Even if we're
> > > > > > > > > using some lightweight guest like an RTOS or unikernel,
> > > > > > > >
> > > > > > > > Yes, you're right. But I'm afraid that it is a matter of degree.
> > > > > > > > As long as we execute 'mapping' operations at every fetch of payload,
> > > > > > > > we will see a latency issue (even in your case), and if we have some
> > > > > > > > solution for it, we won't see it in my proposal either :)
> > > > > > > >
> > > > > > >
> > > > > > > Oleksandr has sent a proposal to the Xen mailing list to reduce this kind
> > > > > > > of "mapping/unmapping" operation. So the latency caused by this behavior
> > > > > > > on Xen may eventually be eliminated, and Linux-KVM doesn't have that problem.
> > > > > >
> > > > > > Obviously, I have not yet caught up there in the discussion.
> > > > > > Which patch specifically?
> > > > >
> > > > > Can you give me the link to the discussion or patch, please?
> > > > >
> > > >
> > > > It's an RFC discussion. We have tested this RFC patch internally.
> > > > https://lists.xenproject.org/archives/html/xen-devel/2021-07/msg01532.html
> > >
> > > I'm afraid that I am missing something here, but I don't see
> > > why this proposed API would eliminate the 'mmap' done when accessing
> > > the queued payload at every request.
> > >
> >
> > This API gives the Xen device model (QEMU or kvmtool) the ability to map
> > the whole guest RAM into the device model's address space. In this case,
> > the device model doesn't need dynamic hypercalls to map/unmap payload
> > memory. It can use a flat offset to access payload memory in its address
> > space directly, just like the KVM device model does now.
> 
> Thank you. Let me quickly confirm one thing:
> This API itself doesn't do any mapping operations, right?
> So I suppose that the virtio BE guest is responsible for
> 1) fetching the information about all the memory regions in FE,
> 2) calling this API to allocate a big chunk of unused space in BE,
> 3) creating grant/foreign mappings for FE onto this region(s)
> in the initialization/configuration of emulated virtio devices.
> 
> Is this the way this API is expected to be used?
> Does Xen already have an interface for (1)?
> 

That thread is still discussing a proper way to do this.
Because this API is common, both x86 and Arm need to be considered.
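
To make the "flat offset" access from my earlier mail (quoted above) concrete,
here is a minimal sketch in C. It assumes the BE has already mapped each FE RAM
region once, at device initialization. The struct fe_mem_region type and the
fe_gpa_to_ptr() helper are hypothetical names used only for illustration; they
are not part of Xen, QEMU or kvmtool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one FE guest RAM region that the BE has
 * already mapped once, at device initialization time. */
struct fe_mem_region {
    uint64_t gpa_base;   /* guest-physical start address in the FE */
    uint64_t size;       /* region length in bytes */
    uint8_t *hva_base;   /* where the region is mapped in the BE process */
};

/* Translate an FE guest-physical range into a BE pointer using a flat
 * offset. Returns NULL if [gpa, gpa + len) is not fully inside one region.
 * No hypercall is issued here; only pointer arithmetic is involved. */
static void *fe_gpa_to_ptr(const struct fe_mem_region *regions,
                           size_t nregions, uint64_t gpa, uint64_t len)
{
    for (size_t i = 0; i < nregions; i++) {
        const struct fe_mem_region *r = &regions[i];
        uint64_t off;

        if (gpa < r->gpa_base)
            continue;
        off = gpa - r->gpa_base;
        /* Overflow-safe form of the check "off + len <= size". */
        if (off <= r->size && len <= r->size - off)
            return r->hva_base + off;
    }
    return NULL;
}
```

With such a table in place, the BE would resolve every descriptor address taken
from a virtqueue through this lookup instead of issuing a map/unmap hypercall
per request. How the region table itself gets populated is exactly the open
question in that thread.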

> -Takahiro Akashi
> 
> > Before this API, a device model that mapped the whole guest memory would
> > severely consume the physical pages of Dom-0/Dom-D.
> >
> > > -Takahiro Akashi
> > >
> > >
> > > > > Thanks,
> > > > > -Takahiro Akashi
> > > > >
> > > > > > -Takahiro Akashi
> > > > > >
> > > > > > > > > > Specifically speaking about kvm-tool, I have a concern about its
> > > > > > > > > > license terms. Targeting different hypervisors and different OSs
> > > > > > > > > > (which I assume includes RTOSes), the resultant library should be
> > > > > > > > > > permissively licensed, and the GPL for kvm-tool might be an issue.
> > > > > > > > > > Any thoughts?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes. If a user wants to implement a FreeBSD device model but the
> > > > > > > > > virtio library is GPL, then the GPL would be a problem. If we have
> > > > > > > > > another good candidate, I am open to it.
> > > > > > > >
> > > > > > > > I have some candidates, particularly for vq/vring, in mind:
> > > > > > > > * OpenAMP, or
> > > > > > > > * corresponding FreeBSD code
> > > > > > > >
> > > > > > >
> > > > > > > Interesting, I will look into them : )
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Wei Chen
> > > > > > >
> > > > > > > > -Takahiro Akashi
> > > > > > > >
> > > > > > > >
> > > > > > > > > > -Takahiro Akashi
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-August/000548.html
> > > > > > > > > > [2] https://marc.info/?l=xen-devel&m=162373754705233&w=2
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > From: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>
> > > > > > > > > > > > Sent: August 14, 2021 23:38
> > > > > > > > > > > > To: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>;
> > > Stefano
> > > > > > > > Stabellini
> > > > > > > > > > <sstabellini@xxxxxxxxxx>
> > > > > > > > > > > > Cc: Alex Bennée <alex.bennee@xxxxxxxxxx>; Stratos
> > > Mailing
> > > > > List
> > > > > > > > > > <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-
> dev@lists.oasis-
> > > > > open.org;
> > > > > > > > Arnd
> > > > > > > > > > Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh Kumar
> > > > > > > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > > > > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx;
> Jan
> > > Kiszka
> > > > > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik
> > > > > <cvanscha@xxxxxxxxxxxxxxxx>;
> > > > > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri
> > > <vatsa@xxxxxxxxxxxxxx>;
> > > > > Jean-
> > > > > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu
> Poirier
> > > > > > > > > > <mathieu.poirier@xxxxxxxxxx>; Wei Chen
> <Wei.Chen@xxxxxxx>;
> > > > > Oleksandr
> > > > > > > > > > Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand
> Marquis
> > > > > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev
> > > > > <Artem_Mygaiev@xxxxxxxx>;
> > > > > > > > Julien
> > > > > > > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>;
> > > Paul
> > > > > Durrant
> > > > > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > > > > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for
> VirtIO
> > > > > backends
> > > > > > > > > > > >
> > > > > > > > > > > > Hello, all.
> > > > > > > > > > > >
> > > > > > > > > > > > Please see some comments below. And sorry for the
> > > possible
> > > > > format
> > > > > > > > > > issues.
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Aug 11, 2021 at 9:27 AM AKASHI Takahiro
> > > > > > > > > > <mailto:takahiro.akashi@xxxxxxxxxx> wrote:
> > > > > > > > > > > > > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano
> > > > > Stabellini
> > > > > > > > wrote:
> > > > > > > > > > > > > > CCing people working on Xen+VirtIO and IOREQs. Not trimming the original
> > > > > > > > > > > > > > email to let them read the full context.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > My comments below are related to a potential Xen implementation, not
> > > > > > > > > > > > > > because it is the only implementation that matters, but because it is
> > > > > > > > > > > > > > the one I know best.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please note that my proposal (and hence the working prototype)[1]
> > > > > > > > > > > > > is based on Xen's virtio implementation (i.e. IOREQ) and particularly
> > > > > > > > > > > > > EPAM's virtio-disk application (backend server).
> > > > > > > > > > > > > It has been, I believe, well generalized but is still a bit biased
> > > > > > > > > > > > > toward this original design.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So I hope you like my approach :)
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-August/000546.html
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let me take this opportunity to explain a bit more about my approach
> > > > > > > > > > > > > below.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Also, please see this relevant email thread:
> > > > > > > > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, 4 Aug 2021, Alex Bennée wrote:
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > One of the goals of Project Stratos is to enable hypervisor-agnostic
> > > > > > > > > > > > > > > backends so we can enable as much re-use of code as possible and avoid
> > > > > > > > > > > > > > > repeating ourselves. This is the flip side of the front end, where
> > > > > > > > > > > > > > > multiple front-end implementations are required - one per OS, assuming
> > > > > > > > > > > > > > > you don't just want Linux guests. The resultant guests are trivially
> > > > > > > > > > > > > > > movable between hypervisors modulo any abstracted paravirt type
> > > > > > > > > > > > > > > interfaces.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In my original thumbnail sketch of a solution I envisioned vhost-user
> > > > > > > > > > > > > > > daemons running in a broadly POSIX-like environment. The interface to
> > > > > > > > > > > > > > > the daemon is fairly simple, requiring only some mapped memory and some
> > > > > > > > > > > > > > > sort of signalling for events (on Linux this is eventfd). The idea was a
> > > > > > > > > > > > > > > stub binary would be responsible for any hypervisor-specific setup and
> > > > > > > > > > > > > > > then launch a common binary to deal with the actual virtqueue requests
> > > > > > > > > > > > > > > themselves.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Since that original sketch we've seen an expansion in the sort of ways
> > > > > > > > > > > > > > > backends could be created. There is interest in encapsulating backends
> > > > > > > > > > > > > > > in RTOSes or unikernels for solutions like SCMI. The interest in Rust
> > > > > > > > > > > > > > > has prompted ideas of using the trait interface to abstract differences
> > > > > > > > > > > > > > > away as well as the idea of bare-metal Rust backends.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We have a card (STR-12) called "Hypercall Standardisation" which
> > > > > > > > > > > > > > > calls for a description of the APIs needed from the hypervisor side to
> > > > > > > > > > > > > > > support VirtIO guests and their backends. However we are some way off
> > > > > > > > > > > > > > > from that at the moment as I think we need to at least demonstrate one
> > > > > > > > > > > > > > > portable backend before we start codifying requirements. To that end I
> > > > > > > > > > > > > > > want to think about what we need for a backend to function.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Configuration
> > > > > > > > > > > > > > > =============
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In the type-2 setup this is typically fairly simple because the host
> > > > > > > > > > > > > > > system can orchestrate the various modules that make up the complete
> > > > > > > > > > > > > > > system. In the type-1 case (or even type-2 with delegated service VMs)
> > > > > > > > > > > > > > > we need some sort of mechanism to inform the backend VM about key
> > > > > > > > > > > > > > > details about the system:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   - where virt queue memory is in its address space
> > > > > > > > > > > > > > >   - how it's going to receive (interrupt) and trigger (kick) events
> > > > > > > > > > > > > > >   - what (if any) resources the backend needs to connect to
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Obviously you can elide over configuration issues by having static
> > > > > > > > > > > > > > > configurations and baking the assumptions into your guest images however
> > > > > > > > > > > > > > > this isn't scalable in the long term. The obvious solution seems to be
> > > > > > > > > > > > > > > extending a subset of Device Tree data to user space but perhaps there
> > > > > > > > > > > > > > > are other approaches?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Before any virtio transactions can take place the appropriate memory
> > > > > > > > > > > > > > > mappings need to be made between the FE guest and the BE guest.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Currently the whole of the FE guest's address space needs to be visible
> > > > > > > > > > > > > > > to whatever is serving the virtio requests. I can envision 3 approaches:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  * BE guest boots with memory already mapped
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  This would entail the guest OS knowing where in its Guest Physical
> > > > > > > > > > > > > > >  Address space is already taken up and avoiding clashing. I would assume
> > > > > > > > > > > > > > >  in this case you would want a standard interface to userspace to then
> > > > > > > > > > > > > > >  make that address space visible to the backend daemon.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yet another way here is that we would have well-known "shared memory"
> > > > > > > > > > > > > between VMs. I think that Jailhouse's ivshmem gives us good insights on
> > > > > > > > > > > > > this matter and that it can even be an alternative for a
> > > > > > > > > > > > > hypervisor-agnostic solution.
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Please note memory regions in ivshmem appear as a PCI device and can be
> > > > > > > > > > > > > mapped locally.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > I want to add this shared memory aspect to my virtio-proxy, but
> > > > > > > > > > > > > the resultant solution would eventually look similar to ivshmem.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >  * BE guest boots with a hypervisor handle to memory
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  The BE guest is then free to map the FE's memory to where it wants in
> > > > > > > > > > > > > > >  the BE's guest physical address space.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I cannot see how this could work for Xen. There is no "handle" to give
> > > > > > > > > > > > > > to the backend if the backend is not running in dom0. So for Xen I think
> > > > > > > > > > > > > > the memory has to be already mapped
> > > > > > > > > > > > >
> > > > > > > > > > > > > In Xen's IOREQ solution (virtio-blk), the following information is
> > > > > > > > > > > > > expected to be exposed to BE via Xenstore:
> > > > > > > > > > > > > (I know that this is a tentative approach though.)
> > > > > > > > > > > > >    - the start address of configuration space
> > > > > > > > > > > > >    - interrupt number
> > > > > > > > > > > > >    - file path for backing storage
> > > > > > > > > > > > >    - read-only flag
> > > > > > > > > > > > > And the BE server has to call a particular hypervisor interface to
> > > > > > > > > > > > > map the configuration space.
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, Xenstore was chosen as a simple way to pass configuration info to
> > > > > > > > > > > > the backend running in a non-toolstack domain.
> > > > > > > > > > > > I remember there was a wish to avoid using Xenstore in the Virtio backend
> > > > > > > > > > > > itself if possible, so for a non-toolstack domain, this could be done by
> > > > > > > > > > > > adjusting devd (the daemon that listens for devices and launches backends)
> > > > > > > > > > > > to read the backend configuration from Xenstore anyway and pass it to
> > > > > > > > > > > > the backend via command line arguments.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Yes, in the current PoC code we're using xenstore to pass device
> > > > > > > > > > > configuration. We also designed a static device configuration parsing
> > > > > > > > > > > method for Dom0less or other scenarios that don't have xentool. Yes, it
> > > > > > > > > > > comes from the device model command line or a config file.
> > > > > > > > > > >
> > > > > > > > > > > > But, if ...
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > In my approach (virtio-proxy), all that Xen (or hypervisor)-specific
> > > > > > > > > > > > > stuff is contained in virtio-proxy, yet another VM, to hide all details.
> > > > > > > > > > > >
> > > > > > > > > > > > ... the solution for how to overcome that is already found and proven
> > > > > > > > > > > > to work, then even better.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > # My point is that a "handle" is not mandatory for executing mapping.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > and the mapping probably done by the
> > > > > > > > > > > > > > toolstack (also see below.) Or we would have to invent a new Xen
> > > > > > > > > > > > > > hypervisor interface and Xen virtual machine privileges to allow this
> > > > > > > > > > > > > > kind of mapping.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > If we run the backend in Dom0 then we have no problems, of course.
> > > > > > > > > > > > >
> > > > > > > > > > > > > One of the difficulties on Xen that I found in my approach is that
> > > > > > > > > > > > > calling such hypervisor interfaces (registering IOREQ, mapping memory)
> > > > > > > > > > > > > is only allowed on BE servers themselves, and so we will have to extend
> > > > > > > > > > > > > those interfaces.
> > > > > > > > > > > > > This, however, will raise some concerns about security and privilege
> > > > > > > > > > > > > distribution, as Stefan suggested.
> > > > > > > > > > > >
> > > > > > > > > > > > We also faced policy-related issues with the Virtio backend running in
> > > > > > > > > > > > a domain other than Dom0 in "dummy" xsm mode. In our target system we
> > > > > > > > > > > > run the backend in a driver domain (we call it DomD) where the
> > > > > > > > > > > > underlying H/W resides. We trust it, so we wrote policy rules (to be
> > > > > > > > > > > > used in "flask" xsm mode) to provide it with a few more privileges
> > > > > > > > > > > > than a simple DomU has.
> > > > > > > > > > > > Now it is permitted to issue device-model, resource and memory
> > > > > > > > > > > > mapping calls, etc.
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > To activate the mapping will
> > > > > > > > > > > > > > >  require some sort of hypercall to the
> hypervisor.
> > > I
> > > > > can see
> > > > > > > > two
> > > > > > > > > > options
> > > > > > > > > > > > > > >  at this point:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   - expose the handle to userspace for
> > > daemon/helper
> > > > > to
> > > > > > > > trigger
> > > > > > > > > > the
> > > > > > > > > > > > > > >     mapping via existing hypercall interfaces.
> If
> > > > > using a
> > > > > > > > helper
> > > > > > > > > > you
> > > > > > > > > > > > > > >     would have a hypervisor specific one to
> avoid
> > > the
> > > > > daemon
> > > > > > > > > > having to
> > > > > > > > > > > > > > >     care too much about the details or push
> that
> > > > > complexity
> > > > > > > > into
> > > > > > > > > > a
> > > > > > > > > > > > > > >     compile time option for the daemon which
> would
> > > > > result in
> > > > > > > > > > different
> > > > > > > > > > > > > > >     binaries although a common source base.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   - expose a new kernel ABI to abstract the
> > > hypercall
> > > > > > > > > > differences away
> > > > > > > > > > > > > > >     in the guest kernel. In this case the
> > > userspace
> > > > > would
> > > > > > > > > > essentially
> > > > > > > > > > > > > > >     ask for an abstract "map guest N memory to
> > > > > userspace
> > > > > > > > ptr"
> > > > > > > > > > and let
> > > > > > > > > > > > > > >     the kernel deal with the different
> hypercall
> > > > > interfaces.
> > > > > > > > > > This of
> > > > > > > > > > > > > > >     course assumes the majority of BE guests
> would
> > > be
> > > > > Linux
> > > > > > > > > > kernels and
> > > > > > > > > > > > > > >     leaves the bare-metal/unikernel approaches
> to
> > > > > their own
> > > > > > > > > > devices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Operation
> > > > > > > > > > > > > > > =========
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The core of the operation of VirtIO is fairly
> > > simple.
> > > > > Once
> > > > > > > > the
> > > > > > > > > > > > > > > vhost-user feature negotiation is done it's a
> case
> > > of
> > > > > > > > receiving
> > > > > > > > > > update
> > > > > > > > > > > > > > > events and parsing the resultant virt queue
> for
> > > data.
> > > > > The
> > > > > > > > vhost-
> > > > > > > > > > user
> > > > > > > > > > > > > > > specification handles a bunch of setup before
> that
> > > > > point,
> > > > > > > > mostly
> > > > > > > > > > to
> > > > > > > > > > > > > > > detail where the virt queues are set up FD's
> for
> > > > > memory and
> > > > > > > > > > event
> > > > > > > > > > > > > > > communication. This is where the envisioned
> stub
> > > > > process
> > > > > > > > would
> > > > > > > > > > be
> > > > > > > > > > > > > > > responsible for getting the daemon up and
> ready to
> > > run.
> > > > > This
> > > > > > > > is
> > > > > > > > > > > > > > > currently done inside a big VMM like QEMU but
> I
> > > > > suspect a
> > > > > > > > modern
> > > > > > > > > > > > > > > approach would be to use the rust-vmm vhost
> crate.
> > > It
> > > > > would
> > > > > > > > then
> > > > > > > > > > either
> > > > > > > > > > > > > > > communicate with the kernel's abstracted ABI
> or be
> > > re-
> > > > > > > > targeted
> > > > > > > > > > as a
> > > > > > > > > > > > > > > build option for the various hypervisors.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > One thing I mentioned before to Alex is that Xen
> > > doesn't
> > > > > have
> > > > > > > > VMMs
> > > > > > > > > > the
> > > > > > > > > > > > > > way they are typically envisioned and described
> in
> > > other
> > > > > > > > > > environments.
> > > > > > > > > > > > > > Instead, Xen has IOREQ servers. Each of them
> > > connects
> > > > > > > > > > independently to
> > > > > > > > > > > > > > Xen via the IOREQ interface. E.g. today multiple
> > > QEMUs
> > > > > could
> > > > > > > > be
> > > > > > > > > > used as
> > > > > > > > > > > > > > emulators for a single Xen VM, each of them
> > > connecting
> > > > > to Xen
> > > > > > > > > > > > > > independently via the IOREQ interface.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The component responsible for starting a daemon
> > > and/or
> > > > > setting
> > > > > > > > up
> > > > > > > > > > shared
> > > > > > > > > > > > > > interfaces is the toolstack: the xl command and
> the
> > > > > > > > libxl/libxc
> > > > > > > > > > > > > > libraries.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think that VM configuration management (or
> > > orchestration
> > > > > in
> > > > > > > > > > Stratos
> > > > > > > > > > > > > jargon?) is a subject to debate in parallel.
> > > > > > > > > > > > > Otherwise, is there any good assumption to avoid
> it
> > > right
> > > > > now?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Oleksandr and others I CCed have been working on
> > > ways
> > > > > for the
> > > > > > > > > > toolstack
> > > > > > > > > > > > > > to create virtio backends and setup memory
> mappings.
> > > > > They
> > > > > > > > might be
> > > > > > > > > > able
> > > > > > > > > > > > > > to provide more info on the subject. I do think
> we
> > > miss
> > > > > a way
> > > > > > > > to
> > > > > > > > > > provide
> > > > > > > > > > > > > > the configuration to the backend and anything
> else
> > > that
> > > > > the
> > > > > > > > > > backend
> > > > > > > > > > > > > > might require to start doing its job.
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, some work has been done for the toolstack to
> handle
> > > > > Virtio
> > > > > > > > MMIO
> > > > > > > > > > devices in
> > > > > > > > > > > > general and Virtio block devices in particular.
> However,
> > > it
> > > > > has
> > > > > > > > not
> > > > > > > > > > been upstreamed yet.
> > > > > > > > > > > > Updated patches on review now:
> > > > > > > > > > > > https://lore.kernel.org/xen-devel/1621626361-29076-
> 1-
> > > git-
> > > > > send-
> > > > > > > > email-
> > > > > > > > > > olekstysh@xxxxxxxxx/
> > > > > > > > > > > >
> > > > > > > > > > > > There is an additional (also important) activity to
> > > > > improve/fix
> > > > > > > > > > foreign memory mapping on Arm which I am also involved
> in.
> > > > > > > > > > > > The foreign memory mapping is proposed to be used
> for
> > > Virtio
> > > > > > > > backends
> > > > > > > > > > (device emulators) if there is a need to run guest OS
> > > completely
> > > > > > > > > > unmodified.
> > > > > > > > > > > > Of course, the more secure way would be to use grant
> > > memory
> > > > > > > > mapping.
> > > > > > > > > > Briefly, the main difference between them is that with
> > > foreign
> > > > > mapping
> > > > > > > > the
> > > > > > > > > > backend
> > > > > > > > > > > > can map any guest memory it wants to map, but with
> grant
> > > > > mapping
> > > > > > > > it is
> > > > > > > > > > allowed to map only what was previously granted by the
> > > frontend.
> > > > > > > > > > > >
> > > > > > > > > > > > So, there might be a problem if we want to pre-map
> some
> > > > > guest
> > > > > > > > memory
> > > > > > > > > > in advance or to cache mappings in the backend in order
> to
> > > > > improve
> > > > > > > > > > performance (because the mapping/unmapping guest pages
> every
> > > > > request
> > > > > > > > > > requires a lot of back and forth to Xen + P2M updates).
> In a
> > > > > nutshell,
> > > > > > > > > > currently, in order to map a guest page into the backend
> > > address
> > > > > space
> > > > > > > > we
> > > > > > > > > > need to steal a real physical page from the backend
> domain.
> > > So,
> > > > > with
> > > > > > > > the
> > > > > > > > > > said optimizations we might end up with no free memory
> in
> > > the
> > > > > backend
> > > > > > > > > > domain (see XSA-300). And what we try to achieve is to
> not
> > > waste
> > > > > a
> > > > > > > > real
> > > > > > > > > > domain memory at all by providing safe non-allocated-yet
> (so
> > > > > unused)
> > > > > > > > > > address space for the foreign (and grant) pages to be
> mapped
> > > > > into,
> > > > > > > > this
> > > > > > > > > > enabling work implies Xen and Linux (and likely DTB
> bindings)
> > > > > changes.
> > > > > > > > > > However, as it turned out, for this to work in a proper
> and
> > > safe
> > > > > way
> > > > > > > > some
> > > > > > > > > > prereq work needs to be done.
> > > > > > > > > > > > You can find the related Xen discussion at:
> > > > > > > > > > > > https://lore.kernel.org/xen-devel/1627489110-25633-
> 1-
> > > git-
> > > > > send-
> > > > > > > > email-
> > > > > > > > > > olekstysh@xxxxxxxxx/
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > One question is how to best handle
> notification
> > > and
> > > > > kicks.
> > > > > > > > The
> > > > > > > > > > existing
> > > > > > > > > > > > > > > vhost-user framework uses eventfd to signal
> the
> > > daemon
> > > > > > > > (although
> > > > > > > > > > QEMU
> > > > > > > > > > > > > > > is quite capable of simulating them when you
> use
> > > TCG).
> > > > > Xen
> > > > > > > > has
> > > > > > > > > > its own
> > > > > > > > > > > > > > > IOREQ mechanism. However latency is an
> important
> > > > > factor and
> > > > > > > > > > having
> > > > > > > > > > > > > > > events go through the stub would add quite a
> lot.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yeah I think, regardless of anything else, we
> want
> > > the
> > > > > > > > backends to
> > > > > > > > > > > > > > connect directly to the Xen hypervisor.
> > > > > > > > > > > > >
> > > > > > > > > > > > > In my approach,
> > > > > > > > > > > > >  a) BE -> FE: interrupts triggered by BE calling a
> > > > > hypervisor
> > > > > > > > > > interface
> > > > > > > > > > > > >               via virtio-proxy
> > > > > > > > > > > > >  b) FE -> BE: MMIO to config raises events (in
> event
> > > > > channels),
> > > > > > > > > > which is
> > > > > > > > > > > > >               converted to a callback to BE via
> > > virtio-
> > > > > proxy
> > > > > > > > > > > > >               (Xen's event channel is internnally
> > > > > implemented by
> > > > > > > > > > interrupts.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't know what "connect directly" means here,
> but
> > > > > sending
> > > > > > > > > > interrupts
> > > > > > > > > > > > > > to the opposite side would be most efficient.
> > > > > > > > > > > > > Ivshmem, I suppose, takes this approach by
> utilizing
> > > PCI's
> > > > > msi-x
> > > > > > > > > > mechanism.
> > > > > > > > > > > >
> > > > > > > > > > > > Agree that MSI would be more efficient than SPI...
> > > > > > > > > > > > At the moment, in order to notify the frontend, the
> > > backend
> > > > > issues
> > > > > > > > a
> > > > > > > > > > specific device-model call to query Xen to inject a
> > > > > corresponding SPI
> > > > > > > > to
> > > > > > > > > > the guest.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Could we consider the kernel internally
> converting
> > > > > IOREQ
> > > > > > > > > > messages from
> > > > > > > > > > > > > > > the Xen hypervisor to eventfd events? Would
> this
> > > scale
> > > > > with
> > > > > > > > > > other kernel
> > > > > > > > > > > > > > > hypercall interfaces?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So any thoughts on what directions are worth
> > > > > experimenting
> > > > > > > > with?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > One option we should consider is for each
> backend to
> > > > > connect
> > > > > > > > to
> > > > > > > > > > Xen via
> > > > > > > > > > > > > > the IOREQ interface. We could generalize the
> IOREQ
> > > > > interface
> > > > > > > > and
> > > > > > > > > > make it
> > > > > > > > > > > > > > hypervisor agnostic. The interface is really
> trivial
> > > and
> > > > > easy
> > > > > > > > to
> > > > > > > > > > add.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As I said above, my proposal does the same thing
> that
> > > you
> > > > > > > > mentioned
> > > > > > > > > > here :)
> > > > > > > > > > > > > The difference is that I do call hypervisor
> interfaces
> > > via
> > > > > > > > virtio-
> > > > > > > > > > proxy.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > The only Xen-specific part is the notification
> > > mechanism,
> > > > > > > > which is
> > > > > > > > > > an
> > > > > > > > > > > > > > event channel. If we replaced the event channel
> with
> > > > > something
> > > > > > > > > > else the
> > > > > > > > > > > > > > interface would be generic. See:
> > > > > > > > > > > > > > https://gitlab.com/xen-project/xen/-
> > > > > > > > > > /blob/staging/xen/include/public/hvm/ioreq.h#L52
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't think that translating IOREQs to eventfd
> in
> > > the
> > > > > kernel
> > > > > > > > is
> > > > > > > > > > a
> > > > > > > > > > > > > > > good idea: it feels like it would be extra
> > > complexity
> > > > > and that
> > > > > > > > the
> > > > > > > > > > > > > > kernel shouldn't be involved as this is a
> backend-
> > > > > hypervisor
> > > > > > > > > > interface.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Given that we may want to implement BE as a bare-
> metal
> > > > > > > > application
> > > > > > > > > > > > > as I did on Zephyr, I don't think that the
> translation
> > > > > would not
> > > > > > > > be
> > > > > > > > > > > > > a big issue, especially on RTOS's.
> > > > > > > > > > > > > It will be some kind of abstraction layer of
> interrupt
> > > > > handling
> > > > > > > > > > > > > (or nothing but a callback mechanism).
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Also, eventfd is very Linux-centric and we are
> > > trying to
> > > > > > > > design an
> > > > > > > > > > > > > > interface that could work well for RTOSes too.
> If we
> > > > > want to
> > > > > > > > do
> > > > > > > > > > > > > > something different, both OS-agnostic and
> > > hypervisor-
> > > > > agnostic,
> > > > > > > > > > perhaps
> > > > > > > > > > > > > > we could design a new interface. One that could
> be
> > > > > > > > implementable
> > > > > > > > > > in the
> > > > > > > > > > > > > > Xen hypervisor itself (like IOREQ) and of course
> any
> > > > > other
> > > > > > > > > > hypervisor
> > > > > > > > > > > > > > too.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > There is also another problem. IOREQ will probably
> not
> > > be
> > > > > the
> > > > > > > > only
> > > > > > > > > > > > > > interface needed. Have a look at
> > > > > > > > > > > > > > https://marc.info/?l=xen-
> devel&m=162373754705233&w=2.
> > > > > Don't we
> > > > > > > > > > also need
> > > > > > > > > > > > > > an interface for the backend to inject
> interrupts
> > > into
> > > > > the
> > > > > > > > > > frontend? And
> > > > > > > > > > > > > > if the backend requires dynamic memory mappings
> of
> > > > > frontend
> > > > > > > > pages,
> > > > > > > > > > then
> > > > > > > > > > > > > > we would also need an interface to map/unmap
> domU
> > > pages.
> > > > > > > > > > > > >
> > > > > > > > > > > > > My proposal document might help here; All the
> > > interfaces
> > > > > > > > required
> > > > > > > > > > for
> > > > > > > > > > > > > virtio-proxy (or hypervisor-related interfaces)
> are
> > > listed
> > > > > as
> > > > > > > > > > > > > RPC protocols :)
> > > > > > > > > > > > >
> > > > > > > > > > > > > > These interfaces are a lot more problematic than
> > > IOREQ:
> > > > > IOREQ
> > > > > > > > is
> > > > > > > > > > tiny
> > > > > > > > > > > > > > and self-contained. It is easy to add anywhere.
> A
> > > new
> > > > > > > > interface to
> > > > > > > > > > > > > > inject interrupts or map pages is more difficult
> to
> > > > > manage
> > > > > > > > because
> > > > > > > > > > it
> > > > > > > > > > > > > > would require changes scattered across the
> various
> > > > > emulators.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Exactly. I am not confident yet that my approach
> will
> > > > > also
> > > > > > > > apply
> > > > > > > > > > > > > to other hypervisors than Xen.
> > > > > > > > > > > > > Technically, yes, but whether people can accept it
> or
> > > not
> > > > > is a
> > > > > > > > > > different
> > > > > > > > > > > > > matter.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > -Takahiro Akashi
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Regards,
> > > > > > > > > > > >
> > > > > > > > > > > > Oleksandr Tyshchenko


 

