
Re: [Stratos-dev] Enabling hypervisor agnosticism for VirtIO backends



On Mon, Sep 06, 2021 at 07:41:48PM -0700, Christopher Clark wrote:
> On Sun, Sep 5, 2021 at 7:24 PM AKASHI Takahiro via Stratos-dev <
> stratos-dev@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> > Alex,
> >
> > > On Fri, Sep 03, 2021 at 10:28:06AM +0100, Alex Bennée wrote:
> > >
> > > AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx> writes:
> > >
> > > > Alex,
> > > >
> > > > On Wed, Sep 01, 2021 at 01:53:34PM +0100, Alex Bennée wrote:
> > > >>
> > > >> Stefan Hajnoczi <stefanha@xxxxxxxxxx> writes:
> > > >>
> > > >> > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> > > >> >> > Could we consider the kernel internally converting IOREQ messages from
> > > >> >> > the Xen hypervisor to eventfd events? Would this scale with other kernel
> > > >> >> > hypercall interfaces?
> > > >> >> >
> > > >> >> > So any thoughts on what directions are worth experimenting with?
> > > >> >>
> > > >> >> One option we should consider is for each backend to connect to Xen via
> > > >> >> the IOREQ interface. We could generalize the IOREQ interface and make it
> > > >> >> hypervisor agnostic. The interface is really trivial and easy to add.
> > > >> >> The only Xen-specific part is the notification mechanism, which is an
> > > >> >> event channel. If we replaced the event channel with something else the
> > > >> >> interface would be generic. See:
> > > >> >> https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
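
To make this concrete, here is a rough, illustrative sketch of what a
generalized ioreq descriptor with a pluggable notification hook could look
like. It is loosely modelled on the struct ioreq in the header linked above,
but the layout and names here are not the actual Xen definition:

#include <stdint.h>

/* A generic IO request record, one per outstanding guest access. */
struct generic_ioreq {
    uint64_t addr;      /* guest physical address being accessed */
    uint64_t data;      /* data written, or to be filled in for a read */
    uint32_t size;      /* access size in bytes */
    uint8_t  dir;       /* 0 = write to the device, 1 = read from it */
    uint8_t  state;     /* pending -> in service -> completed */
    uint16_t type;      /* MMIO, PIO, ... */
};

/* The only hypervisor-specific piece is how "request ready" and
 * "response ready" are signalled, so make that pluggable. */
struct ioreq_notify_ops {
    int (*wait)(void *opaque);    /* block until a request is pending */
    int (*notify)(void *opaque);  /* tell the hypervisor we completed it */
};
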
> > > >> >
> > > >> > There have been experiments with something kind of similar in KVM
> > > >> > recently (see struct ioregionfd_cmd):
> > > >> > https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828727.git.eafanasova@xxxxxxxxx/
> > > >>
> > > >> Reading the cover letter was very useful in showing how this provides a
> > > >> separate channel for signalling IO events to userspace instead of using
> > > >> the normal type-2 vmexit type event. I wonder how deeply tied the
> > > >> userspace-facing side of this is to KVM? Could it provide a common FD
> > > >> type interface to IOREQ?
> > > >
> > > > Why do you stick to a "FD" type interface?
> > >
> > > I mean most user space interfaces on POSIX start with a file descriptor
> > > and the usual read/write semantics or a series of ioctls.
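
As a minimal sketch of what such an fd-based interface could look like from
userspace (the device node name and record layout below are invented purely
for illustration): the backend read()s request records and write()s the
completions back.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct io_event {               /* hypothetical record format */
    uint64_t addr;              /* guest physical address */
    uint64_t data;              /* write payload, or read result */
    uint32_t size;              /* access width in bytes */
    uint8_t  is_read;
    uint8_t  completed;
    uint16_t pad;
};

int main(void)
{
    int fd = open("/dev/hyper-ioreq", O_RDWR);  /* hypothetical node */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct io_event ev;
    while (read(fd, &ev, sizeof(ev)) == sizeof(ev)) {
        if (ev.is_read)
            ev.data = 0;        /* value emulated by the backend */
        ev.completed = 1;
        write(fd, &ev, sizeof(ev));             /* post the completion */
    }
    close(fd);
    return 0;
}
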
> >
> > Who do you assume is responsible for implementing this kind of
> > fd semantics, the OS on the BE side or the hypervisor itself?
> >
> > I think such interfaces can only be easily implemented on type-2
> > hypervisors.
> >
> > # In this sense, I don't think rust-vmm, as it is, can be
> > # a general solution.
> >
> > > >> As I understand IOREQ this is currently a direct communication between
> > > >> userspace and the hypervisor using the existing Xen message bus. My
> > > >
> > > > With an IOREQ server, IO event occurrences are notified to the BE via Xen's
> > > > event channel, while the actual contents of the IO events (see struct ioreq
> > > > in ioreq.h) are put in a queue on a single shared memory page which is to be
> > > > assigned beforehand with the xenforeignmemory_map_resource hypervisor call.
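
For reference, a rough sketch of that BE-side flow on Xen, assuming the
ioreq server has already been created and its id and event channel port
obtained (e.g. via libxendevicemodel, elided here). The libxenforeignmemory
and libxenevtchn calls are written from memory with simplified signatures
and no error handling, so check the real headers before relying on them:

#include <stdint.h>
#include <sys/mman.h>
#include <xenforeignmemory.h>
#include <xenevtchn.h>
#include <xen/hvm/ioreq.h>      /* struct ioreq, struct shared_iopage */
#include <xen/memory.h>         /* XENMEM_resource_ioreq_server */

static void be_loop(uint32_t guest, uint32_t srv_id, uint32_t remote_port)
{
    xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
    xenevtchn_handle *xce = xenevtchn_open(NULL, 0);
    void *page = NULL;

    /* Map the single shared page holding the ioreq slots. */
    xenforeignmemory_map_resource(fmem, guest,
                                  XENMEM_resource_ioreq_server, srv_id,
                                  0 /* frame */, 1 /* nr_frames */,
                                  &page, PROT_READ | PROT_WRITE, 0);
    struct shared_iopage *iopage = page;

    /* Bind the event channel used for notifications. */
    int port = xenevtchn_bind_interdomain(xce, guest, remote_port);

    for (;;) {
        xenevtchn_pending(xce);              /* wait for a notification */
        xenevtchn_unmask(xce, port);

        struct ioreq *req = &iopage->vcpu_ioreq[0];   /* vCPU 0 only here */
        if (req->state == STATE_IOREQ_READY) {
            /* ... emulate the MMIO/PIO access described by *req ... */
            req->state = STATE_IORESP_READY;
            xenevtchn_notify(xce, port);     /* tell Xen we are done */
        }
    }
}
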
> > >
> > > If we abstracted the IOREQ via the kernel interface you would probably
> > > just want to put the ioreq structure on a queue rather than expose the
> > > shared page to userspace.
> >
> > Where is that queue?
> >
> > > >> worry would be that by adding knowledge of what the underlying
> > > >> hypervisor is we'd end up with excess complexity in the kernel. For one
> > > >> thing we certainly wouldn't want an API version dependency on the kernel
> > > >> to understand which version of the Xen hypervisor it was running on.
> > > >
> > > > That's exactly what virtio-proxy in my proposal[1] does; all the
> > > > hypervisor-specific details of IO event handling are contained in
> > > > virtio-proxy, and the virtio BE will communicate with virtio-proxy through
> > > > a virtqueue (yes, virtio-proxy is seen as yet another virtio device on the
> > > > BE) and will get IO event-related *RPC* callbacks, either MMIO read or
> > > > write, from virtio-proxy.
> > > >
> > > > See page 8 (protocol flow) and 10 (interfaces) in [1].
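
Purely to illustrate the shape of that idea (none of the names below come
from the actual proposal in [1]), the RPC messages carried over the
virtio-proxy virtqueue might look something like this, with the BE seeing
guest MMIO accesses as callbacks delivered by another virtio device rather
than via any hypervisor-specific interface:

#include <stdint.h>

enum vproxy_op {
    VPROXY_OP_MMIO_READ  = 1,   /* FE read from a device register       */
    VPROXY_OP_MMIO_WRITE = 2,   /* FE write to a device register        */
    VPROXY_OP_NOTIFY     = 3,   /* deliver an interrupt/doorbell to FE  */
};

/* Request placed on the virtqueue by virtio-proxy, answered by the BE. */
struct vproxy_rpc {
    uint32_t op;        /* enum vproxy_op */
    uint32_t size;      /* access width in bytes */
    uint64_t offset;    /* offset into the emulated MMIO region */
    uint64_t value;     /* write payload, or read result on completion */
    int32_t  status;    /* 0 on success, negative errno otherwise */
};
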
> > >
> > > There are two areas of concern with the proxy approach at the moment.
> > > The first is how the bootstrap of the virtio-proxy channel happens and
> >
> > As I said, from the BE's point of view, virtio-proxy would be seen
> > as yet another virtio device by which the BE could talk to the
> > "virtio-proxy" VM or whatever else.
> >
> > This way we guarantee the BE's hypervisor-agnosticism instead of having
> > "common" hypervisor interfaces. That is the basis of my idea.
> >
> > > the second is how many context switches are involved in a transaction.
> > > Of course with all things there is a trade off. Things involving the
> > > very tightest latency would probably opt for a bare metal backend which
> > > I think would imply hypervisor knowledge in the backend binary.
> >
> > In the configuration phase of a virtio device, latency won't be a big issue.
> > In device operations (i.e. read/write to block devices), if we can
> > resolve the 'mmap' issue, as Oleksandr is proposing right now, the only issue
> > is how efficiently we can deliver notifications to the opposite side. Right?
> > And this is a very common problem whatever approach we take.
> >
> > Anyhow, if we do care about latency in my approach, most of the virtio-proxy-
> > related code can be re-implemented just as a stub (or shim?) library
> > since the protocols are defined as RPCs.
> > In this case, however, we would lose the benefit of providing a "single
> > binary" BE.
> > (I know this is an arguable requirement, though.)
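
To illustrate that last point (names invented for this sketch, building on
the hypothetical struct vproxy_rpc above): because the BE only ever issues
RPCs, the same calls could be backed either by the virtio-proxy virtqueue
transport or by a thin hypervisor-specific shim, without changing BE code;
only the "single binary" property is lost in the latter case.

struct be_transport_ops {
    int (*recv_rpc)(void *opaque, struct vproxy_rpc *msg);
    int (*send_resp)(void *opaque, const struct vproxy_rpc *msg);
};

/* Hypervisor-agnostic single binary: */
extern const struct be_transport_ops vproxy_virtqueue_ops;

/* Or re-linked against a shim for the lowest latency on one hypervisor: */
extern const struct be_transport_ops xen_ioreq_shim_ops;
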
> >
> > # Should we perhaps discuss what "hypervisor-agnosticism" means?
> >
> Is there a call that you could recommend that we join to discuss this and
> the topics of this thread?

Stratos call?
Alex should have more to say.

-Takahiro Akashi


> There is definitely interest in pursuing a new interface for Argo that can
> be implemented in other hypervisors and enable guest binary portability
> between them, at least on the same hardware architecture, with VirtIO
> transport as a primary use case.
> 
> The notes from the Xen Summit Design Session on VirtIO Cross-Project BoF
> for Xen and Guest OS, which include context about the several separate
> approaches to VirtIO on Xen, have now been posted here:
> https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg00472.html
> 
> Christopher
> 
> 
> 
> > -Takahiro Akashi
> >
> >
> >



 

