
Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 19 Nov 2021 14:25:32 +0100
  • Cc: Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • Delivery-date: Fri, 19 Nov 2021 13:25:53 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 19.11.2021 14:16, Oleksandr Andrushchenko wrote:
> On 19.11.21 15:00, Jan Beulich wrote:
>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote:
>>> Possible locking and other work needed:
>>> =======================================
>>>
>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host
>>> 2. pdev->vpci->lock cannot be used as vpci is freed by vpci_remove_device
>>> 3. We may want a dedicated per-domain rw lock to be implemented:
>>>
>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>> index 28146ee404e6..ebf071893b21 100644
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -444,6 +444,8 @@ struct domain
>>>
>>>    #ifdef CONFIG_HAS_PCI
>>>        struct list_head pdev_list;
>>> +    rwlock_t vpci_rwlock;
>>> +    bool vpci_terminating; <- atomic?
>>>    #endif
>>> Then vpci_remove_device is a writer (cold path) and vpci_process_pending and
>>> vpci_mmio_{read|write} are readers (hot path).
>> Right - you need such a lock for other purposes anyway, as per the
>> discussion with Julien.
> What about bool vpci_terminating? Do you see it as an atomic type or just 
> bool?

Having seen only ...

>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need
>>> hypercall_create_continuation to be implemented, so that removal can be
>>> re-started if need be:
>>>
>>> vpci_remove_device()
>>> {
>>>     d->vpci_terminating = true;

... this use so far, I can't tell yet. But at a first glance a boolean
looks to be what you need.
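
Just as an aside, the continuation side of this would presumably follow
the usual do_physdev_op() pattern, i.e. roughly (entirely untested, and
the exact placement is of course up to the eventual patch):

    case PHYSDEVOP_pci_device_remove:
        ...
        ret = pci_remove_device(dev.seg, dev.bus, dev.devfn);
        if ( ret == -ERESTART )
            ret = hypercall_create_continuation(__HYPERVISOR_physdev_op,
                                                "ih", cmd, arg);
        break;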

>>>     remove vPCI register handlers <- this will cut off PCI_COMMAND 
>>> emulation among others
>>>     if ( !write_trylock(d->vpci_rwlock) )
>>>       return -ERESTART;
>>>     xfree(pdev->vpci);
>>>     pdev->vpci = NULL;
>>> }
>>>
>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for
>>> other operations which may require it, e.g. virtual bus topology can
>>> use it when assigning vSBDF etc.
>>>
>>> 4. The vpci_remove_device call needs to be removed from vpci_process_pending,
>>> which should instead do nothing for Dom0 and crash the DomU otherwise:
>> Why is this? I'm not outright opposed, but I don't immediately see why
>> trying to remove the problematic device wouldn't be a reasonable course
>> of action anymore. vpci_remove_device() may need to become more careful
>> so as not to crash,
> vpci_remove_device does not crash, vpci_process_pending does
>>   though.
> Assume we are in an error state in vpci_process_pending *on one of the vCPUs*
> and we call vpci_remove_device. vpci_remove_device tries to acquire the
> lock and it can't, simply because some other vPCI code is running on
> another vCPU.
> Then what do we do here? We are in SoftIRQ context now and we can't spin
> trying to acquire d->vpci_rwlock forever. Nor can we blindly free the vpci
> structure, because it is seen by all vCPUs and freeing it may crash them.
> 
> If vpci_remove_device is in hypercall context it just returns -ERESTART, and
> hypercall continuation helps here. But not in SoftIRQ context.

Maybe then you want to invoke this cleanup from RCU context (whether
vpci_remove_device() itself or a suitable clone thereof is TBD)? (I
will admit though that I didn't check whether that would satisfy all
constraints.)
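
Roughly what I have in mind, assuming struct vpci grew an rcu_head field
(names invented, entirely untested):

    static void vpci_free_rcu(struct rcu_head *rcu)
    {
        struct vpci *vpci = container_of(rcu, struct vpci, rcu);

        xfree(vpci);
    }

    /* In vpci_remove_device(), instead of freeing synchronously: */
    struct vpci *vpci = pdev->vpci;

    pdev->vpci = NULL;
    call_rcu(&vpci->rcu, vpci_free_rcu);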

Then again it also hasn't become clear to me why you use write_trylock()
there. The lock contention you describe doesn't, on the surface, look
any different from situations elsewhere.
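
I.e. I would have expected the plain blocking variant to be fine, matching
what the hot-path readers do (again only a sketch):

    /* Readers (hot path), e.g. vpci_process_pending() or vpci_mmio_{read,write}(): */
    read_lock(&d->vpci_rwlock);
    /* ... access pdev->vpci ... */
    read_unlock(&d->vpci_rwlock);

    /* Writer (cold path), vpci_remove_device(): */
    write_lock(&d->vpci_rwlock);
    xfree(pdev->vpci);
    pdev->vpci = NULL;
    write_unlock(&d->vpci_rwlock);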

Jan
