Re: SR-IOV: do we need to virtualize in Xen or rely on Dom0?

  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Thu, 10 Jun 2021 15:33:22 +0000
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>
  • Delivery-date: Thu, 10 Jun 2021 15:33:44 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 10.06.21 17:10, Roger Pau Monné wrote:
> On Thu, Jun 10, 2021 at 10:01:16AM +0000, Oleksandr Andrushchenko wrote:
>> Hello, Roger!
>> On 10.06.21 10:54, Roger Pau Monné wrote:
>>> On Fri, Jun 04, 2021 at 06:37:27AM +0000, Oleksandr Andrushchenko wrote:
>>>> Hi, all!
>>>> While working on PCI SR-IOV support for ARM I started porting [1] on top
>>>> of current PCI on ARM support [2]. The question I have for this series
>>>> is whether we really need the SR-IOV emulation code in Xen.
>>>> I have implemented a PoC for SR-IOV on ARM [3] (please see the top 2
>>>> patches)
>>>> and it "works for me": MSI support is still WIP, but I was able to see that
>>>> VFs are properly seen in the guest and BARs are properly programmed in p2m.
>>>> What I can't fully understand is if we can live with this approach or there
>>>> are use-cases I can't see.
>>>> Previously I've been told that this approach might not work on FreeBSD
>>>> running as Domain-0, but it seems that "PCI Passthrough is not supported
>>>> (Xen/FreeBSD)" anyways [4].
>>> PCI passthrough is not supported on FreeBSD dom0 because PCI
>>> passthrough is not supported by Xen itself when using a PVH dom0, and
>>> that's the only mode FreeBSD dom0 can use.
>> So, it is still not clear to me: how and if PCI passthrough is supported
>> on FreeBSD, what are the scenarios and requirements for that?
>>> PHYSDEVOP_pci_device_add can be added to FreeBSD, so it could be made
>>> to work. I however think this is not the proper way to implement
>>> SR-IOV support.
>> I was not able to find any support for PHYSDEVOP_XXX in FreeBSD code,
>> could you please point me to where these are used?
> Those are not used on FreeBSD, because x86 PVHv2 dom0 doesn't
> implement them anymore. They are implemented on Linux for x86 PV dom0,
> AFAIK Arm doesn't use them either.

Well, ARM didn't until we started implementing PCI passthrough [1].

It was previously discussed in [2] (see "# Discovering PCI devices:"), where
it was proposed to use PHYSDEVOP_pci_device_add.

Long story short, it is not easy for Xen to enumerate PCI devices on ARM, as
there is no unified way of doing so: different platforms implement different
PCI host bridges, which require complex initialization including clocks,
power domains etc.

It was also discussed that PCI on ARM would want to support dom0less (DomB),
so we should have some bootloader which will enumerate PCI devices for Xen,
and Xen will only support ECAM-based host bridges.

Anyways, as the above does not exist yet, we use PHYSDEVOP_pci_device_add on
ARM. And we rely on Dom0 to initialize the PCI host bridge, so Xen can also
access PCI.
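
To illustrate why an ECAM-only design would be attractive for Xen: with ECAM,
a config space access is just an offset calculation into a memory-mapped
window, with no platform-specific accessor involved. A minimal sketch in
Python (the function name and the window base address are made up for this
example and are not taken from Xen or from any real platform):

```python
def ecam_offset(bus: int, dev: int, fn: int, reg: int) -> int:
    """Byte offset of a config register inside an ECAM window.

    Standard PCIe ECAM layout: bus in bits 27:20, device in bits 19:15,
    function in bits 14:12, register in bits 11:0 (4 KiB per function).
    """
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8 and 0 <= reg < 4096):
        raise ValueError("bus/device/function/register out of range")
    return (bus << 20) | (dev << 15) | (fn << 12) | reg


# Hypothetical ECAM window base, for illustration only.
ECAM_BASE = 0x3000_0000

# Address of BAR0 (config offset 0x10) of device 0001:00.2 style BDF 01:00.2.
bar0_addr = ECAM_BASE + ecam_offset(bus=1, dev=0, fn=2, reg=0x10)
```

What this does not cover is exactly the problem described above: before such
accesses work, the host bridge itself has to be brought up (clocks, power
domains etc.), which today is done by Dom0.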

>> If they are not, then how does Xen under FreeBSD know about PCI devices?
> Xen scans the PCI bus itself, see scan_pci_devices.
See above, this is not yet available on ARM.
>> I am trying to extrapolate my knowledge of how Linux does that
>> (during PCI enumeration in Domain-0 we use hypercalls)
>>>> I also see that the ACRN hypervisor [5] implements SR-IOV inside it, which
>>>> makes me think I am missing some important use-case on x86 though.
>>>> I would like to ask for any advice regarding SR-IOV in the hypervisor, and
>>>> for any pointers to documentation or any other source which might be handy
>>>> in deciding if we do need the SR-IOV complexity in Xen.
>>>> And it does bring complexity (if you compare [1] and [3])...
>>>> A bit of technical detail on the approach implemented in [3]:
>>>> 1. We rely on PHYSDEVOP_pci_device_add
>>>> 2. We rely on Domain-0 SR-IOV drivers to instantiate VFs
>>>> 3. BARs are programmed in p2m implementing the guest view of those (we have
>>>> extended vPCI code for that and this path is used for both "normal"
>>>> devices and VFs the same way)
>>>> 4. No need to trap PCI_SRIOV_CTRL
>>>> 5. No need to wait 100ms in Xen before attempting to access VF registers
>>>> when enabling virtual functions on the PF - this is handled by Domain-0
>>>> itself.
>>> I think the SR-IOV capability should be handled like any other PCI
>>> capability, ie: like we currently handle MSI or MSI-X in vPCI.
>>> It's likely that using some kind of hypercall in order to deal with
>>> SR-IOV could make this easier to implement in Xen, but that just adds
>>> more code to all OSes that want to run as the hardware domain.
>> I didn't introduce any new hypercall: PHYSDEVOP_pci_device_add was enough.
> Well, that would be 'new' on x86 PVH or Arm, as they don't implement
> any PHYSDEVOP at the moment.
Agreed for x86 PVH.
> Long term we might need a hypercall to report dynamic MCFG regions,
> but I haven't got around to it yet (and haven't found any system that
> reports extra MCFG regions from ACPI AML).
Which means we'll need to modify the guest OS.
>> The rest I did in Xen itself wrt SR-IOV.
>>> OTOH if we properly trap accesses to the SR-IOV capability (like it
>>> was proposed in [1] from your references) we won't have to modify OSes
>>> that want to run as hardware domains in order to handle SR-IOV devices.
>> Out of curiosity, could you please name a few? I do understand that
>> we do want to support unmodified OSes and this is indeed important.
>> But, still what are the other OSes which do support Xen + PCI passthrough?
> NetBSD PV dom0 does support PCI passthrough, but I'm not sure that's
> relevant.

That was just for me to understand where to look for the PCI passthrough
implementations, so as not to break something which I don't see.

> We shouldn't focus on current users to come up with an interface,
> but rather think how we want that interface to be.
> As I said on the previous email my opinion is that unless not
> technically possible we should just trap accesses to the SR-IOV
> capability like we do for MSI(-X) and handle it transparently from a
> guest PoV.

Ok, I understand. It seems that Jan also supports your idea. So, I am not
against that, just trying to see the whole picture which is a bit bigger than

>>> IMO going for the hypercall option seems easier now, but adds a burden
>>> to all OSes that want to manage SR-IOV devices that will hurt us long
>>> term.
>> Again, I was able to make it somewhat work with PHYSDEVOP_pci_device_add 
>> only.
> Sure, that's how it works on x86 PV hardware domain, so it's certainly
> possible. My comments about avoiding that route are not because it's not
> technically feasible, but because I don't like the approach.

Unless we have some unified way of accessing PCI on ARM, I am not sure
we can live without the PHYSDEVOP_pci_device_add hypercall.
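
For reference, a rough sketch of what reporting VFs through
PHYSDEVOP_pci_device_add amounts to on the Dom0 side. This is a Python model
for illustration only, not the PoC code. The VF routing ID formula is the one
from the SR-IOV specification. The flag values and the argument layout are
written down as I recall them from Xen's public physdev.h and should be
checked against the header; the function names, the little-endian packing and
the example PF and SR-IOV capability values are assumptions for this example.

```python
import struct

# Flag values as recalled from Xen's public physdev.h (please verify).
XEN_PCI_DEV_EXTFN = 0x1
XEN_PCI_DEV_VIRTFN = 0x2


def vf_routing_id(pf_bus, pf_devfn, first_vf_offset, vf_stride, vf_index):
    """Return (bus, devfn) of the VF with 0-based index vf_index.

    SR-IOV spec: RID(VF) = RID(PF) + First VF Offset + index * VF Stride,
    computed in 16-bit arithmetic, so a VF may land on the next bus.
    """
    rid = ((pf_bus << 8) | pf_devfn) + first_vf_offset + vf_index * vf_stride
    rid &= 0xFFFF
    return rid >> 8, rid & 0xFF


def pack_device_add(seg, bus, devfn, flags=0, physfn=(0, 0)):
    """Pack the fixed part of the hypercall argument (assumed layout).

    seg:u16, bus:u8, devfn:u8, flags:u32, physfn.bus:u8, physfn.devfn:u8,
    then 2 padding bytes; the trailing optional array is left out.
    """
    return struct.pack("<HBBIBB2x", seg, bus, devfn, flags, physfn[0], physfn[1])


# Hypothetical PF at 0000:03:00.0 with First VF Offset 0x80 and VF Stride 2.
pf = (3, 0x00)
vfs = [vf_routing_id(pf[0], pf[1], 0x80, 2, i) for i in range(4)]
args = [pack_device_add(0, bus, devfn, XEN_PCI_DEV_VIRTFN, pf) for bus, devfn in vfs]
```

The point is that Xen only gets told "this SBDF is a VF of that PF"; the
SR-IOV capability itself is programmed by the Domain-0 driver and is not
trapped.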

> So far we have avoided PVH from having to implement any PHYSDEVOP
> hypercall, and that's a design decision, not a coincidence. I'm in
> favor of using the existing hardware interfaces for guests instead of
> introducing custom Xen ones when technically feasible.

Unfortunately, on ARM (and I believe it may also happen on other non-x86
platforms) there are new obstacles to this design. And if we want Xen + PCI
to be supported on platforms other than x86, we have to re-think the
existing approach to include others in the game.

> Thanks, Roger.

Thank you,



[2] https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxxxxxxxxx/msg77422.html


