
Re: [PATCH v3 5/7] vpci: add SR-IOV support for PVH Dom0


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • From: Mykyta Poturai <Mykyta_Poturai@xxxxxxxx>
  • Date: Tue, 12 May 2026 07:32:20 +0000
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
  • Delivery-date: Tue, 12 May 2026 07:32:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH v3 5/7] vpci: add SR-IOV support for PVH Dom0


On 5/12/26 09:20, Jan Beulich wrote:
> On 11.05.2026 16:10, Volodymyr Babchuk wrote:
>> Hi Jan,
>>
>> Jan Beulich <jbeulich@xxxxxxxx> writes:
>>
>>> On 07.05.2026 22:40, Volodymyr Babchuk wrote:
>>>> Jan Beulich <jbeulich@xxxxxxxx> writes:
>>>>> On 06.05.2026 11:39, Mykyta Poturai wrote:
>>>>>> On 5/4/26 08:37, Jan Beulich wrote:
>>>>>>> On 23.04.2026 12:12, Mykyta Poturai wrote:
>>>>>>>> On 4/21/26 17:43, Jan Beulich wrote:
>>>>>>>>> On 09.04.2026 16:01, Mykyta Poturai wrote:
>>>>>>>>>> From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
>>>>>>>>>>
>>>>>>>>>> This code is expected to only be used by privileged domains;
>>>>>>>>>> unprivileged domains should not get access to the SR-IOV capability.
>>>>>>>>>>
>>>>>>>>>> Implement RW handlers for PCI_SRIOV_CTRL register to dynamically
>>>>>>>>>> map/unmap VF BARs. Recalculate BAR sizes before mapping VFs to
>>>>>>>>>> account for possible changes in the system page size register.
>>>>>>>>>> Also force VFs to always use emulated reads for the command
>>>>>>>>>> register; this is needed to prevent some drivers from accidentally
>>>>>>>>>> unmapping BARs.
>>>>>>>>>
>>>>>>>>> This apparently refers to the change to vpci_init_header(). Writes
>>>>>>>>> are already intercepted. How would a read lead to an accidental BAR
>>>>>>>>> unmap? Even for writes I don't see how a VF driver could
>>>>>>>>> accidentally unmap BARs, as the memory decode bit there is
>>>>>>>>> hardwired to 0.
>>>>>>>>>
>>>>>>>>>> Discovery of VFs is
>>>>>>>>>> done by Dom0, which must register them with Xen.
>>>>>>>>>
>>>>>>>>> If we intercept control register writes, why would we still require
>>>>>>>>> Dom0 to report the VFs that appear?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Sorry, I don't understand this question. You specifically requested
>>>>>>>> this to be done this way in V2. Quoting your reply from V2 below.
>>>>>>>>
>>>>>>>>    > Aren't you effectively busy-waiting for these 100ms, by simply
>>>>>>>>    > returning "true" from vpci_process_pending() until the time has
>>>>>>>>    > passed? This imo is a no-go. You want to set a timer and put the
>>>>>>>>    > vCPU to sleep, to wake it up again when the timer has expired.
>>>>>>>>    > That'll then eliminate the need for the not-so-nice patch 4.
>>>>>>>>
>>>>>>>>    > Question is whether we need to actually go this far (right
>>>>>>>>    > away). I expect you don't mean to hand PFs to DomU-s. As long
>>>>>>>>    > as we keep them in the hardware domain, can't we trust it to
>>>>>>>>    > set things up correctly, just like we trust it in a number of
>>>>>>>>    > other aspects?
>>>>>>>
>>>>>>> How's any of this related to the question I raised here, or your reply
>>>>>>> thereto? If we intercept PCI_SRIOV_CTRL, we know when VFs are created.
>>>>>>> Why still demand Dom0 to report them then?
>>>>>>>
>>>>>>
>>>>>> The spec states that VFs can take up to 100ms after the VF_ENABLE bit
>>>>>> is set to become alive. We discussed in V2 that it is not acceptable
>>>>>> to do the required 100ms wait in Xen while blocking a domain. And not
>>>>>> doing that blocking would require some mechanism to only allow a
>>>>>> domain to run for precisely 99 (or more?) ms. You yourself suggested
>>>>>> that we can trust the hardware domain with registering VFs if we
>>>>>> already trust it with other PCI-related matters. Did you change your
>>>>>> mind, or am I completely misunderstanding this question?
>>>>>
>>>>> No, I still think that we can trust hwdom enough. Nevertheless we should
>>>>> aim at being independent of it where possible. And I seem to recall that
>>>>> I had also outlined an approach how to avoid spin-waiting for 100ms in
>>>>> the hypervisor.
>>>>
>>>> I want to clarify: you are saying that Xen should not wait for hwdom to
>>>> report VFs and should instead create them by itself. Is this correct?
>>>
>>> If that's technically possible, yes.
>>
>> Okay, so let's clear this up. If I remember correctly, you discussed this
>> with Mykyta in the previous version and suggested putting the vCPU to
>> sleep for 100ms.
> 
> I don't think I did (except perhaps from a very abstract perspective),
> precisely because of ...
> 
>> I don't think that this is a good idea, because the guest
>> kernel will not be happy about that.
> 
> ... this. Instead iirc I suggested to refuse (short-circuit) handling
> VF register accesses for the next 100ms.
> 
> Jan

Do you have any suggestions on how to ensure that we accurately catch 
the window where 100ms have already passed, but guests haven’t tried to 
read anything yet, to flip this back? As I mentioned in the previous 
version, Linux, for example, doesn’t attempt to re-read anything if the 
first read failed after 100ms. So it appears to me that this approach 
would be prone to racing with the guest for getting to the VF first. One 
approach I can think of is to somehow swap the register handlers back 
in-flight during the first read by the guest if 100ms have already 
passed. However, this would still depend on Dom0 for registering VFs, 
but in a more convoluted way. We also can’t add the VFs before 100ms 
have passed and add timing checks to all register handlers, because 
pci_add_device and everything below it expects the device to be 
functional at the moment of addition.
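To make the timing question concrete, here is a minimal sketch of the short-circuit idea as I understand it (all names here are hypothetical, not actual Xen interfaces): record the time at which VF Enable is written in PCI_SRIOV_CTRL, and refuse VF config accesses until the SR-IOV-mandated 100ms have elapsed.

```c
/*
 * Hypothetical sketch of the "short-circuit VF accesses for 100ms"
 * approach; none of these names are real Xen/vPCI interfaces.  The PF
 * state records when VF Enable was last set; VF config space accesses
 * are refused until the 100ms VF settling time from the SR-IOV spec
 * has elapsed.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VF_SETTLE_NS (100ull * 1000 * 1000) /* 100ms, per the SR-IOV spec */

struct sriov_state {
    bool vf_enable;          /* last VF Enable value written to PCI_SRIOV_CTRL */
    uint64_t enable_time_ns; /* monotonic timestamp of that write */
};

/* Called from the PCI_SRIOV_CTRL write handler. */
static void sriov_ctrl_write(struct sriov_state *s, bool vfe, uint64_t now_ns)
{
    /* Restart the settle window only on a 0 -> 1 transition of VF Enable. */
    if ( vfe && !s->vf_enable )
        s->enable_time_ns = now_ns;
    s->vf_enable = vfe;
}

/* VF config accesses are short-circuited until this returns true. */
static bool vf_access_allowed(const struct sriov_state *s, uint64_t now_ns)
{
    return s->vf_enable && now_ns - s->enable_time_ns >= VF_SETTLE_NS;
}
```

This still leaves exactly the race described above, though: the guest's first read may land before the window closes and never be retried, and nothing here tells Xen when the VFs have actually been registered.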



Maybe you see some other way to avoid these problems that I am missing?

-- 
Mykyta

 

