
Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Thu, 18 Nov 2021 15:46:28 +0000
  • Accept-language: en-US
  • Cc: "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Delivery-date: Thu, 18 Nov 2021 15:46:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


On 18.11.21 17:41, Jan Beulich wrote:
> On 18.11.2021 16:21, Oleksandr Andrushchenko wrote:
>> On 18.11.21 17:16, Jan Beulich wrote:
>>> On 18.11.2021 16:11, Oleksandr Andrushchenko wrote:
>>>> On 18.11.21 16:35, Jan Beulich wrote:
>>>>> On 18.11.2021 15:14, Oleksandr Andrushchenko wrote:
>>>>>> On 18.11.21 16:04, Roger Pau Monné wrote:
>>>>>>> Indeed. In the physdevop failure case this comes from a hypercall
>>>>>>> context, so maybe you could do the mapping in place using hypercall
>>>>>>> continuations if required. Not sure how complex that would be,
>>>>>>> compared to just deferring to guest entry point and then dealing with
>>>>>>> the possible cleanup on failure.
>>>>>> This will solve one part of the equation:
>>>>>>
>>>>>> pci_physdev_op
>>>>>>            pci_add_device
>>>>>>            init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
>>>>>>            iommu_add_device <- FAILS
>>>>>>            vpci_remove_device -> xfree(pdev->vpci)
>>>>>>
>>>>>> But what about the other one, e.g. vpci_process_pending being triggered
>>>>>> in parallel with PCI device de-assign?
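To spell out the second race (the exact entry points are only for illustration,
and I'm assuming de-assign ends up calling vpci_remove_device, as it does with
this series):

 vCPU0: SCHEDULE_SOFTIRQ -> vpci_process_pending -> dereferences v->vpci.pdev->vpci
 vCPU1: XEN_DOMCTL_deassign_device -> ... -> vpci_remove_device -> xfree(pdev->vpci)

i.e. vCPU0 may still touch pdev->vpci after vCPU1 has already freed it.
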
>>>>> Well, that's again in hypercall context, so using hypercall continuations
>>>>> may again be an option. Of course at the point a de-assign is initiated,
>>>>> you "only" need to drain requests (for that device, but that's unlikely
>>>>> to be worthwhile optimizing for), while ensuring no new requests can be
>>>>> issued. Again, for the device in question, but here this is relevant -
>>>>> a flag may want setting to refuse all further requests. Or maybe the
>>>>> register handling hooks may want tearing down before draining pending
>>>>> BAR mapping requests; without the hooks in place no new such requests
>>>>> can possibly appear.
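Just to check that I understand the "flag to refuse all further requests" part:
would it be something along these lines? Untested sketch; the "closing" flag is
invented for illustration, the rest is roughly today's defer_map:

static void defer_map(struct domain *d, struct pci_dev *pdev,
                      struct rangeset *mem, uint16_t cmd, bool rom_only)
{
    struct vcpu *curr = current;

    /* De-assign/removal in progress: refuse to queue new requests. */
    if ( pdev->vpci->closing )
        return;

    curr->vpci.pdev = pdev;
    curr->vpci.mem = mem;
    curr->vpci.cmd = cmd;
    curr->vpci.rom_only = rom_only;
    curr->vpci.map_pending = true;
    raise_softirq(SCHEDULE_SOFTIRQ);
}

with the flag being set on de-assign/remove (or the register handlers simply
torn down, as you say) before draining.
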
>>>> This can probably be solved even more easily, as we were talking about
>>>> pausing all vCPUs:
>>> I have to admit I'm not sure. It might be easier, but it may also be
>>> less desirable.
>>>
>>>> void vpci_cancel_pending(const struct pci_dev *pdev)
>>>> {
>>>>        struct domain *d = pdev->domain;
>>>>        struct vcpu *v;
>>>>        int rc;
>>>>
>>>>        while ( (rc = domain_pause_except_self(d)) == -ERESTART )
>>>>            cpu_relax();
>>>>
>>>>        if ( rc )
>>>>            printk(XENLOG_G_ERR
>>>>                   "Failed to pause vCPUs while canceling vPCI map/unmap for %pp %pd: %d\n",
>>>>                   &pdev->sbdf, pdev->domain, rc);
>>>>
>>>>        for_each_vcpu ( d, v )
>>>>        {
>>>>            if ( v->vpci.map_pending && (v->vpci.pdev == pdev) )
>>>>
>>>> This will prevent all vCPUs but the current one from running, thus making
>>>> it impossible for vpci_process_pending to run in parallel with any hypercall.
>>>> So, even without locking in vpci_process_pending, the above should
>>>> be enough.
>>>> The only concern here is that domain_pause_except_self may return
>>>> an error code which we somehow need to handle...
>>> Not just this. The -ERESTART handling isn't appropriate this way
>>> either.
>> Are you talking about cpu_relax()?
> I'm talking about that spin-waiting loop as a whole.
>
>>>    For the moment I can't help thinking that draining would
>>> be preferable over canceling.
>> Given that cancellation is going to happen on an error path or
>> on device de-assign/remove, I think this can be acceptable.
>> Any reason why not?
> It would seem to me that the correctness of a draining approach is
> going to be easier to prove than that of a canceling one, where I
> expect races to be a bigger risk. Especially for something that gets
> executed infrequently, if ever (error paths in particular), knowing that
> things are well from testing isn't typically possible.
Could you please then give me a hint on how to do that:
1. We have a SOFTIRQ scheduled on vCPU0 and it is about to touch pdev->vpci
2. We have de-assign/remove running on vCPU1

How do we drain that? Do you mean some atomic variable to be
used in vpci_process_pending to flag that it is running, with de-assign/remove
having to wait, spinning while checking that flag? Something like the sketch below?
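
Just to have something concrete, an untested sketch of how I could imagine the
draining, reusing map_pending from this series as the "work in flight"
indicator (proper barriers/atomics and the handler teardown are glossed over):

void vpci_remove_device(struct pci_dev *pdev)
{
    struct vcpu *v;

    /*
     * Tear down the register handlers first, so that no new deferred
     * map/unmap requests can be queued for this device.
     */

    /*
     * Drain: wait until every vCPU that still has a deferred request
     * for this device has finished it; vpci_process_pending clears
     * map_pending once the deferred work is done.
     */
    for_each_vcpu ( pdev->domain, v )
        while ( v->vpci.pdev == pdev && v->vpci.map_pending )
            cpu_relax();

    xfree(pdev->vpci);
    pdev->vpci = NULL;
}

Would that be what you have in mind, or do we need a separate
"vpci_process_pending is running right now" flag on top of that?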
>
> Jan
>
>
Thank you,
Oleksandr

 

