Re: [PATCH v5 07/14] vpci/header: handle p2m range sets per BAR
Hi, Roger!
On 12.01.22 17:15, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:44PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko<oleksandr_andrushchenko@xxxxxxxx>
>>
>> Instead of handling a single range set that contains all the memory
>> regions of all the BARs and ROM, have one per BAR.
>> As the range sets are now created when a PCI device is added and destroyed
>> when it is removed, make them named and accounted.
>>
>> Note that rangesets were chosen here despite there being only up to
>> 3 separate ranges in each set (typically just 1): a rangeset per BAR
>> was chosen for ease of implementation and re-use of existing code.
>>
>> This is in preparation of making non-identity mappings in p2m for the
>> MMIOs/ROM.
> I think we don't want to support ROM for guests (at least initially),
> so no need to mention it here.
Will add
>> Signed-off-by: Oleksandr Andrushchenko<oleksandr_andrushchenko@xxxxxxxx>
>>
>> ---
>> Since v4:
>> - use named range sets for BARs (Jan)
>> - changes required by the new locking scheme
>> - updated commit message (Jan)
>> Since v3:
>> - re-work vpci_cancel_pending accordingly to the per-BAR handling
>> - s/num_mem_ranges/map_pending and s/uint8_t/bool
>> - ASSERT(bar->mem) in modify_bars
>> - create and destroy the rangesets on add/remove
>> ---
>> xen/drivers/vpci/header.c | 190 +++++++++++++++++++++++++++-----------
>> xen/drivers/vpci/vpci.c | 30 +++++-
>> xen/include/xen/vpci.h | 3 +-
>> 3 files changed, 166 insertions(+), 57 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 8880d34ebf8e..cc49aa68886f 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -137,45 +137,86 @@ bool vpci_process_pending(struct vcpu *v)
>> return false;
>>
>> spin_lock(&pdev->vpci_lock);
>> - if ( !pdev->vpci_cancel_pending && v->vpci.mem )
>> + if ( !pdev->vpci )
>> + {
>> + spin_unlock(&pdev->vpci_lock);
>> + return false;
>> + }
>> +
>> + if ( !pdev->vpci_cancel_pending && v->vpci.map_pending )
>> {
>> struct map_data data = {
>> .d = v->domain,
>> .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>> };
>> - int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
>> + struct vpci_header *header = &pdev->vpci->header;
>> + unsigned int i;
>>
>> - if ( rc == -ERESTART )
>> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> {
>> - spin_unlock(&pdev->vpci_lock);
>> - return true;
>> - }
>> + struct vpci_bar *bar = &header->bars[i];
>> + int rc;
>> +
> You should check bar->mem != NULL here, there's no need to allocate a
> rangeset for non-mappable BARs.
Answered by Jan already: no need, as rangeset_is_empty already handles a
NULL pointer
>> + if ( rangeset_is_empty(bar->mem) )
>> + continue;
>> +
>> + rc = rangeset_consume_ranges(bar->mem, map_range, &data);
>> +
>> + if ( rc == -ERESTART )
>> + {
>> + spin_unlock(&pdev->vpci_lock);
>> + return true;
>> + }
>>
>> - if ( pdev->vpci )
>> /* Disable memory decoding unconditionally on failure. */
>> -        modify_decoding(pdev,
>> -                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>> +            modify_decoding(pdev, rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> The above seems to be an unrelated change, and also exceeds the max
> line length.
Sure, will try to fit
>> !rc && v->vpci.rom_only);
>>
>> - if ( rc )
>> - {
>> - /*
>> - * FIXME: in case of failure remove the device from the domain.
>> -         * Note that there might still be leftover mappings. While this is
>> -         * safe for Dom0, for DomUs the domain needs to be killed in order
>> - * to avoid leaking stale p2m mappings on failure.
>> - */
>> - if ( is_hardware_domain(v->domain) )
>> - vpci_remove_device_locked(pdev);
>> - else
>> - domain_crash(v->domain);
>> + if ( rc )
>> + {
>> + /*
>> +             * FIXME: in case of failure remove the device from the domain.
>> +             * Note that there might still be leftover mappings. While this is
>> +             * safe for Dom0, for DomUs the domain needs to be killed in order
>> + * to avoid leaking stale p2m mappings on failure.
>> + */
>> + if ( is_hardware_domain(v->domain) )
>> + vpci_remove_device_locked(pdev);
>> + else
>> + domain_crash(v->domain);
>> +
>> + break;
>> + }
>> }
>> +
>> + v->vpci.map_pending = false;
>> }
>> spin_unlock(&pdev->vpci_lock);
>>
>> return false;
>> }
>>
>> +static void vpci_bar_remove_ranges(const struct pci_dev *pdev)
>> +{
>> + struct vpci_header *header = &pdev->vpci->header;
>> + unsigned int i;
>> + int rc;
>> +
>> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> + {
>> + struct vpci_bar *bar = &header->bars[i];
>> +
>> + if ( rangeset_is_empty(bar->mem) )
>> + continue;
>> +
>> + rc = rangeset_remove_range(bar->mem, 0, ~0ULL);
> Might be interesting to introduce a rangeset_reset function that
> removes all ranges. That would never fail, and thus there would be no
> need to check for rc.
Well, there is a single user of that as of now, so I am not sure it is
worth it yet. And if we re-allocate pdev->vpci then there might be no
need for this at all.
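If it does become useful later, I would picture it as a thin wrapper only,
something like the below (hypothetical, not in the tree, just to illustrate
how I read the idea):

    void rangeset_reset(struct rangeset *r)
    {
        if ( rangeset_is_empty(r) )
            return;

        /* Removing everything needs no allocations, so this cannot fail. */
        if ( rangeset_remove_range(r, 0, ~0ULL) )
            ASSERT_UNREACHABLE();
    }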
> Also I think the current rangeset_remove_range should never fail when
> removing all ranges, as there's nothing to allocate.
Agree
> Hence you can add
> an ASSERT_UNREACHABLE below.
>> +        if ( rc )
>> + printk(XENLOG_ERR
>> + "%pd %pp failed to remove range set for BAR: %d\n",
>> + pdev->domain, &pdev->sbdf, rc);
>> + }
>> +}
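So, to make sure I read the suggestions right, the helper above would then
become something like this (sketch on top of this patch, untested):

    static void vpci_bar_remove_ranges(const struct pci_dev *pdev)
    {
        struct vpci_header *header = &pdev->vpci->header;
        unsigned int i;

        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
        {
            struct vpci_bar *bar = &header->bars[i];

            if ( rangeset_is_empty(bar->mem) )
                continue;

            /* Removing all ranges only frees memory, so this should not fail. */
            if ( rangeset_remove_range(bar->mem, 0, ~0ULL) )
                ASSERT_UNREACHABLE();
        }
    }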
>> +
>> void vpci_cancel_pending_locked(struct pci_dev *pdev)
>> {
>> struct vcpu *v;
>> @@ -185,23 +226,33 @@ void vpci_cancel_pending_locked(struct pci_dev *pdev)
>> /* Cancel any pending work now on all vCPUs. */
>> for_each_vcpu( pdev->domain, v )
>> {
>> - if ( v->vpci.mem && (v->vpci.pdev == pdev) )
>> + if ( v->vpci.map_pending && (v->vpci.pdev == pdev) )
>> {
>> - rangeset_destroy(v->vpci.mem);
>> - v->vpci.mem = NULL;
>> + vpci_bar_remove_ranges(pdev);
>> + v->vpci.map_pending = false;
>> }
>> }
>> }
>>
>> static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>> - struct rangeset *mem, uint16_t cmd)
>> + uint16_t cmd)
>> {
>> struct map_data data = { .d = d, .map = true };
>> - int rc;
>> + struct vpci_header *header = &pdev->vpci->header;
>> + int rc = 0;
>> + unsigned int i;
>> +
>> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> + {
>> + struct vpci_bar *bar = &header->bars[i];
>>
>> -    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>> - process_pending_softirqs();
>> - rangeset_destroy(mem);
>> + if ( rangeset_is_empty(bar->mem) )
>> + continue;
>> +
>> + while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
>> + &data)) == -ERESTART )
>> + process_pending_softirqs();
>> + }
>> if ( !rc )
>> modify_decoding(pdev, cmd, false);
>>
>> @@ -209,7 +260,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>> }
>>
>> static void defer_map(struct domain *d, struct pci_dev *pdev,
>> - struct rangeset *mem, uint16_t cmd, bool rom_only)
>> + uint16_t cmd, bool rom_only)
>> {
>> struct vcpu *curr = current;
>>
>> @@ -220,7 +271,7 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>> * started for the same device if the domain is not well-behaved.
>> */
>> curr->vpci.pdev = pdev;
>> - curr->vpci.mem = mem;
>> + curr->vpci.map_pending = true;
>> curr->vpci.cmd = cmd;
>> curr->vpci.rom_only = rom_only;
>> /*
>> @@ -234,42 +285,40 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>>  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>> {
>> struct vpci_header *header = &pdev->vpci->header;
>> - struct rangeset *mem = rangeset_new(NULL, NULL, 0);
>> struct pci_dev *tmp, *dev = NULL;
>> const struct vpci_msix *msix = pdev->vpci->msix;
>> - unsigned int i;
>> + unsigned int i, j;
>> int rc;
>> -
>> - if ( !mem )
>> - return -ENOMEM;
>> + bool map_pending;
>>
>> /*
>> -     * Create a rangeset that represents the current device BARs memory region
>> +     * Create a rangeset per BAR that represents the current device memory region
>>       * and compare it against all the currently active BAR memory regions. If
>>       * an overlap is found, subtract it from the region to be mapped/unmapped.
>> *
>> -     * First fill the rangeset with all the BARs of this device or with the ROM
>> +     * First fill the rangesets with all the BARs of this device or with the ROM
> ^ 'all' doesn't apply anymore.
Will fix
>>       * BAR only, depending on whether the guest is toggling the memory decode
>>       * bit of the command register, or the enable bit of the ROM BAR register.
>> */
>> for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> {
>> - const struct vpci_bar *bar = &header->bars[i];
>> + struct vpci_bar *bar = &header->bars[i];
>> unsigned long start = PFN_DOWN(bar->addr);
>> unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>>
>> + ASSERT(bar->mem);
>> +
>>          if ( !MAPPABLE_BAR(bar) ||
>>               (rom_only ? bar->type != VPCI_BAR_ROM
>>                         : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
>> continue;
>>
>> - rc = rangeset_add_range(mem, start, end);
>> + rc = rangeset_add_range(bar->mem, start, end);
>> if ( rc )
>> {
>> printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
>> start, end, rc);
>> - rangeset_destroy(mem);
>> - return rc;
>> + goto fail;
>> }
> I think you also need to check that BARs from the same device don't
> overlap themselves. This wasn't needed before because all BARs shared
> the same rangeset. It's not uncommon for BARs of the same device to
> share a page.
>
> So you would need something like the following added to the loop:
>
> /* Check for overlap with the already setup BAR ranges. */
> for ( j = 0; j < i; j++ )
> rangeset_remove_range(header->bars[j].mem, start, end);
Good point
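So something along these lines inside the first loop then, right after the
successful rangeset_add_range() (sketch of how I read the suggestion, the
"prev" naming is mine, error handling matching the rest of the function):

        /* Check for overlap with the already setup BAR ranges. */
        for ( j = 0; j < i; j++ )
        {
            struct vpci_bar *prev = &header->bars[j];

            if ( rangeset_is_empty(prev->mem) )
                continue;

            rc = rangeset_remove_range(prev->mem, start, end);
            if ( rc )
            {
                printk(XENLOG_G_WARNING
                       "Failed to remove overlapping range [%lx, %lx]: %d\n",
                       start, end, rc);
                goto fail;
            }
        }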
>> }
>>
>> @@ -280,14 +329,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>> unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
>> vmsix_table_size(pdev->vpci, i) - 1);
>>
>> - rc = rangeset_remove_range(mem, start, end);
>> - if ( rc )
>> + for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
>> {
>> - printk(XENLOG_G_WARNING
>> - "Failed to remove MSIX table [%lx, %lx]: %d\n",
>> - start, end, rc);
>> - rangeset_destroy(mem);
>> - return rc;
>> + const struct vpci_bar *bar = &header->bars[j];
>> +
>> + if ( rangeset_is_empty(bar->mem) )
>> + continue;
>> +
>> + rc = rangeset_remove_range(bar->mem, start, end);
>> + if ( rc )
>> + {
>> + printk(XENLOG_G_WARNING
>> + "Failed to remove MSIX table [%lx, %lx]: %d\n",
>> + start, end, rc);
>> + goto fail;
>> + }
>> }
>> }
>>
>> @@ -325,7 +381,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>> unsigned long start = PFN_DOWN(bar->addr);
>> unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>>
>> -            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
>> + if ( !bar->enabled ||
>> + !rangeset_overlaps_range(bar->mem, start, end) ||
>> /*
>>               * If only the ROM enable bit is toggled check against other
>>               * BARs in the same device for overlaps, but not against the
>> @@ -334,14 +391,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>> (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
>> continue;
>>
>> - rc = rangeset_remove_range(mem, start, end);
>> + rc = rangeset_remove_range(bar->mem, start, end);
>> if ( rc )
>> {
>> spin_unlock(&tmp->vpci_lock);
>>                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>                         start, end, rc);
>> - rangeset_destroy(mem);
>> - return rc;
>> + goto fail;
>> }
>> }
>> spin_unlock(&tmp->vpci_lock);
>> @@ -360,12 +416,36 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>> * will always be to establish mappings and process all the BARs.
>> */
>> ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
>> - return apply_map(pdev->domain, pdev, mem, cmd);
>> + return apply_map(pdev->domain, pdev, cmd);
>> }
>>
>> - defer_map(dev->domain, dev, mem, cmd, rom_only);
>> +    /* Find out whether any memory ranges are left after MSI and overlaps. */
>> + map_pending = false;
>> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> + if ( !rangeset_is_empty(header->bars[i].mem) )
>> + {
>> + map_pending = true;
>> + break;
>> + }
>> +
>> + /*
>> +     * There are cases when a PCI device, a root port for example, has
>> +     * neither memory space nor IO. In this case the PCI command register
>> +     * write would be missed, leaving the underlying PCI device
>> +     * non-functional, so:
>> +     * - if there are no regions, write the command register now
>> +     * - if there are regions, defer the work and write later on
> I would just say:
>
> /* If there's no mapping work write the command register now. */
Ok
>> + */
>> + if ( !map_pending )
>> + pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>> + else
>> + defer_map(dev->domain, dev, cmd, rom_only);
>>
>> return 0;
>> +
>> +fail:
>> + /* Destroy all the ranges we may have added. */
>> + vpci_bar_remove_ranges(pdev);
>> + return rc;
>> }
>>
>> static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index a9e9e8ec438c..98b12a61be6f 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -52,11 +52,16 @@ static void vpci_remove_device_handlers_locked(struct pci_dev *pdev)
>>
>> void vpci_remove_device_locked(struct pci_dev *pdev)
>> {
>> + struct vpci_header *header = &pdev->vpci->header;
>> + unsigned int i;
>> +
>> ASSERT(spin_is_locked(&pdev->vpci_lock));
>>
>> pdev->vpci_cancel_pending = true;
>> vpci_remove_device_handlers_locked(pdev);
>> vpci_cancel_pending_locked(pdev);
>> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> + rangeset_destroy(header->bars[i].mem);
>> xfree(pdev->vpci->msix);
>> xfree(pdev->vpci->msi);
>> xfree(pdev->vpci);
>> @@ -92,6 +97,8 @@ static int run_vpci_init(struct pci_dev *pdev)
>> int vpci_add_handlers(struct pci_dev *pdev)
>> {
>> struct vpci *vpci;
>> + struct vpci_header *header;
>> + unsigned int i;
>> int rc;
>>
>> if ( !has_vpci(pdev->domain) )
>> @@ -108,11 +115,32 @@ int vpci_add_handlers(struct pci_dev *pdev)
>> pdev->vpci = vpci;
>> INIT_LIST_HEAD(&pdev->vpci->handlers);
>>
>> + header = &pdev->vpci->header;
>> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> + {
>> + struct vpci_bar *bar = &header->bars[i];
>> + char str[32];
>> +
>> + snprintf(str, sizeof(str), "%pp:BAR%d", &pdev->sbdf, i);
>> + bar->mem = rangeset_new(pdev->domain, str, RANGESETF_no_print);
>> + if ( !bar->mem )
>> + {
>> + rc = -ENOMEM;
>> + goto fail;
>> + }
>> + }
> You just need the ranges for the VPCI_BAR_MEM32, VPCI_BAR_MEM64_LO and
> VPCI_BAR_ROM BAR types (see the MAPPABLE_BAR macro). Would it be
> possible to only allocate the rangeset for those BAR types?
I guess so
> Also this should be done in init_bars rather than here, as you would
> know the BAR types.
So, if we allocate these in init_bars, where are they destroyed then?
I think this should be vpci_remove_device, and from this POV it would
be good to keep the alloc/free code close to each other, e.g.
vpci_add_handlers/vpci_remove_device in the same file.
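For reference, if this does end up in init_bars where the BAR types are
already known, I would expect the allocation to become something like the
below (sketch, untested; error handling to follow whatever the surrounding
function does):

    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
    {
        struct vpci_bar *bar = &header->bars[i];
        char str[32];

        /* Only mappable BARs (MEM32, MEM64_LO, ROM) need a rangeset. */
        if ( bar->type != VPCI_BAR_MEM32 && bar->type != VPCI_BAR_MEM64_LO &&
             bar->type != VPCI_BAR_ROM )
            continue;

        snprintf(str, sizeof(str), "%pp:BAR%d", &pdev->sbdf, i);
        bar->mem = rangeset_new(pdev->domain, str, RANGESETF_no_print);
        if ( !bar->mem )
            return -ENOMEM;
    }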
> Thanks, Roger.
Thank you,
Oleksandr