[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Fri, 11 Feb 2022 07:27:39 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=RH2YM0lxfPWVymnBpZdpVYUeUf/4XqOhapA7DcU6Hvc=; b=VP7fnlf6hKJYogx2EUWHEiDLgr/sGXCzrCro4HO34Tkj3YtmGJvuFMOedBW0KB+A93b5X5BQZlpljSylorzigQma+5dahelv8a0F9YHaKw2qN6/q3hfrmpzPxEKEyNzso884/b2XiH+oVU+UAVPlhy+P809pDJtS1KZ+36U4MzAh1iDZoPR4slEeT7f8xi67Th0HDqonA+SIUjTlwWp+3nLYhJiu1SA96bmvskBvX/D1YnmUB1BhqLXav2zzrEH2cRcT+koKL7skY6YooG4hrBpnm73lRGbeqLhFiR1RumFQE3KQDembUAacH0GruIkzsU1hRmAyWPPQqtc6qYVnZA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=D74AKi+Ndk9NuworDKYwWYTCpMAjWBZ25ZG8rFJdncPI+JbY3tivRFrk9gLmuwfc9ThS3svFE0bYXkeJ41qTczpqw8IB+2cAsx7Y3bURpbzS1aMA4aDeq62RMIAjB85lXJFt1RHpiPoIBRDLPoILQayQa9swRP+DwTq+A+QBWNuePWwx9C4nPMITrcdUMu4e5dYjnOr5LkpN06FDszd27rAUWA+XcveoZdcp9gePj8BHNw2AdiaU5SyI5s8sB8k8a5xUeAvYZthuN+yVMHX8+Arz1/t67dsOsuA6dbKd2wIMcujRiNjj6cwbXtq40JZC9Du9AAPE3uM91d7G5YQinA==
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "jbeulich@xxxxxxxx" <jbeulich@xxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Delivery-date: Fri, 11 Feb 2022 07:28:00 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHYHboQT3cBWYI1/EGunE7hOwop6ayM95MAgAD+qQA=
  • Thread-topic: [PATCH] vpci: introduce per-domain lock to protect vpci structure

Hi, Roger!

On 10.02.22 18:16, Roger Pau Monné wrote:
> On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>
>> Introduce a per-domain read/write lock to check whether vpci is present,
>> so we are sure there are no accesses to the contents of the vpci struct
>> if not. This lock can be used (and in a few cases is used right away)
>> so that vpci removal can be performed while holding the lock in write
>> mode. Previously such removal could race with vpci_read for example.
> Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
> pci_remove_device, and likely when vPCI gets also used in
> {de}assign_device I think.
Yes, this is indeed an issue, but I was not trying to solve it in
context of vPCI locking yet. I think we should discuss how do
we approach pdev locking, so I can create a patch for that.
that being said, I would like not to solve pdev in  this patch yet

...I do understand we do want to avoid that, but at the moment
a single reliable way for making sure pdev is alive seems to
be pcidevs_lock....
>
>> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
>> from being removed.
>>
>> 2. Writing the command register and ROM BAR register may trigger
>> modify_bars to run, which in turn may access multiple pdevs while
>> checking for the existing BAR's overlap. The overlapping check, if done
>> under the read lock, requires vpci->lock to be acquired on both devices
>> being compared, which may produce a deadlock. It is not possible to
>> upgrade read lock to write lock in such a case. So, in order to prevent
>> the deadlock, check which registers are going to be written and acquire
>> the lock in the appropriate mode from the beginning.
>>
>> All other code, which doesn't lead to pdev->vpci destruction and does not
>> access multiple pdevs at the same time, can still use a combination of the
>> read lock and pdev->vpci->lock.
>>
>> 3. Optimize if ROM BAR write lock required detection by caching offset
>> of the ROM BAR register in vpci->header->rom_reg which depends on
>> header's type.
>>
>> 4. Reduce locked region in vpci_remove_device as it is now possible
>> to set pdev->vpci to NULL early right after the write lock is acquired.
>>
>> 5. Reduce locked region in vpci_add_handlers as it is possible to
>> initialize many more fields of the struct vpci before assigning it to
>> pdev->vpci.
>>
>> 6. vpci_{add|remove}_register are required to be called with the write lock
>> held, but it is not feasible to add an assert there as it requires
>> struct domain to be passed for that. So, add a comment about this requirement
>> to these and other functions with the equivalent constraints.
>>
>> 7. Drop const qualifier where the new rwlock is used and this is appropriate.
>>
>> 8. This is based on the discussion at [1].
>>
>> [1] 
>> https://urldefense.com/v3/__https://lore.kernel.org/all/20220204063459.680961-4-andr2000@xxxxxxxxx/__;!!GF_29dbcQIUBPA!gObSySzN7s6zSKrcpSEi6vw18fRPls157cuRoqq4KDd7Ic_Nvh_cFlyVXPRpEWBkI38pgsvvfg$
>>  [lore[.]kernel[.]org]
>>
>> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>
>> ---
>> This was checked on x86: with and without PVH Dom0.
>> ---
>>   xen/arch/x86/hvm/vmsi.c   |   2 +
>>   xen/common/domain.c       |   3 +
>>   xen/drivers/vpci/header.c |   8 +++
>>   xen/drivers/vpci/msi.c    |   8 ++-
>>   xen/drivers/vpci/msix.c   |  40 +++++++++++--
>>   xen/drivers/vpci/vpci.c   | 114 ++++++++++++++++++++++++++++----------
>>   xen/include/xen/sched.h   |   3 +
>>   xen/include/xen/vpci.h    |   2 +
>>   8 files changed, 146 insertions(+), 34 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index 13e2a190b439..351cb968a423 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -893,6 +893,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>   {
>>       unsigned int i;
>>   
>> +    ASSERT(!!rw_is_locked(&msix->pdev->domain->vpci_rwlock));
>                ^ no need for the double negation.
Ok, will update all asserts which use !!
>
> Also this asserts that the lock is taken, but could be by a different
> pCPU.  I guess it's better than nothing.
Fair enough. Do you still want the asserts or should I remove them?
>
>> +
>>       for ( i = 0; i < msix->max_entries; i++ )
>>       {
>>           const struct vpci_msix_entry *entry = &msix->entries[i];
> Since this function is now called with the per-domain rwlock read
> locked it's likely not appropriate to call process_pending_softirqs
> while holding such lock (check below).
You are right, as it is possible that:

process_pending_softirqs -> vpci_process_pending -> read_lock

Even more, vpci_process_pending may also

read_unlock -> vpci_remove_device -> write_lock

in its error path. So, any invocation of process_pending_softirqs
must not hold d->vpci_rwlock at least.

And also we need to check that pdev->vpci was not removed
in between or *re-created*
>
> We will likely need to re-iterate over the list of pdevs assigned to
> the domain and assert that the pdev is still assigned to the same
> domain.
So, do you mean a pattern like the below should be used at all
places where we need to call process_pending_softirqs?

read_unlock
process_pending_softirqs
read_lock
pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
<continue processing>

>
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 2048ebad86ff..10558c22285d 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -616,6 +616,9 @@ struct domain *domain_create(domid_t domid,
>>   
>>   #ifdef CONFIG_HAS_PCI
>>       INIT_LIST_HEAD(&d->pdev_list);
>> +#ifdef CONFIG_HAS_VPCI
>> +    rwlock_init(&d->vpci_rwlock);
>> +#endif
>>   #endif
>>   
>>       /* All error paths can depend on the above setup. */
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 40ff79c33f8f..9e2aeb2055c9 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -142,12 +142,14 @@ bool vpci_process_pending(struct vcpu *v)
>>           if ( rc == -ERESTART )
>>               return true;
>>   
>> +        read_lock(&v->domain->vpci_rwlock);
>>           spin_lock(&v->vpci.pdev->vpci->lock);
>>           /* Disable memory decoding unconditionally on failure. */
>>           modify_decoding(v->vpci.pdev,
>>                           rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : 
>> v->vpci.cmd,
>>                           !rc && v->vpci.rom_only);
>>           spin_unlock(&v->vpci.pdev->vpci->lock);
>> +        read_unlock(&v->domain->vpci_rwlock);
>>   
>>           rangeset_destroy(v->vpci.mem);
>>           v->vpci.mem = NULL;
>> @@ -203,6 +205,7 @@ static void defer_map(struct domain *d, struct pci_dev 
>> *pdev,
>>       raise_softirq(SCHEDULE_SOFTIRQ);
>>   }
>>   
>> +/* This must hold domain's vpci_rwlock in write mode. */
>>   static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool 
>> rom_only)
>>   {
>>       struct vpci_header *header = &pdev->vpci->header;
>> @@ -454,6 +457,8 @@ static int init_bars(struct pci_dev *pdev)
>>       struct vpci_bar *bars = header->bars;
>>       int rc;
>>   
>> +    ASSERT(!!rw_is_write_locked(&pdev->domain->vpci_rwlock));
>> +
>>       switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>>       {
>>       case PCI_HEADER_TYPE_NORMAL:
>> @@ -548,6 +553,7 @@ static int init_bars(struct pci_dev *pdev)
>>       {
>>           struct vpci_bar *rom = &header->bars[num_bars];
>>   
>> +        header->rom_reg = rom_reg;
>>           rom->type = VPCI_BAR_ROM;
>>           rom->size = size;
>>           rom->addr = addr;
>> @@ -559,6 +565,8 @@ static int init_bars(struct pci_dev *pdev)
>>           if ( rc )
>>               rom->type = VPCI_BAR_EMPTY;
> You can also set 'rom_reg = ~0' here. Or move the setting of rom_reg
> after the handler has been successfully installed.
>
> I think it would be easier to just signal no ROM BAR with rom_reg ==
> 0.  There's no header where the ROM BAR is at offset 0.  That way you
> will only have to set rom_reg on the successful path, but you don't
> need to care about the case where there's no ROM BAR.
Yes, it is possible
>
>>       }
>> +    else
>> +        header->rom_reg = ~(unsigned int)0;
> No need for the cast.
Ok
>
>>   
>>       return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
>>   }
>> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
>> index 5757a7aed20f..5df3dfa8243c 100644
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -190,6 +190,8 @@ static int init_msi(struct pci_dev *pdev)
>>       uint16_t control;
>>       int ret;
>>   
>> +    ASSERT(!!rw_is_write_locked(&pdev->domain->vpci_rwlock));
>> +
>>       if ( !pos )
>>           return 0;
>>   
>> @@ -265,7 +267,7 @@ REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
>>   
>>   void vpci_dump_msi(void)
>>   {
>> -    const struct domain *d;
>> +    struct domain *d;
>>   
>>       rcu_read_lock(&domlist_read_lock);
>>       for_each_domain ( d )
>> @@ -277,6 +279,9 @@ void vpci_dump_msi(void)
>>   
>>           printk("vPCI MSI/MSI-X d%d\n", d->domain_id);
>>   
>> +        if ( !read_trylock(&d->vpci_rwlock) )
>> +            continue;
>> +
>>           for_each_pdev ( d, pdev )
>>           {
>>               const struct vpci_msi *msi;
>> @@ -326,6 +331,7 @@ void vpci_dump_msi(void)
>>               spin_unlock(&pdev->vpci->lock);
>>               process_pending_softirqs();
>>           }
>> +        read_unlock(&d->vpci_rwlock);
> Same here, you are calling process_pending_softirqs while holding
> vpci_rwlock.
Same as above
>
>>       }
>>       rcu_read_unlock(&domlist_read_lock);
>>   }
>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>> index 846f1b8d7038..5296d6025d8e 100644
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -138,6 +138,7 @@ static void control_write(const struct pci_dev *pdev, 
>> unsigned int reg,
>>           pci_conf_write16(pdev->sbdf, reg, val);
>>   }
>>   
>> +/* This must hold domain's vpci_rwlock in write mode. */
>>   static struct vpci_msix *msix_find(const struct domain *d, unsigned long 
>> addr)
>>   {
>>       struct vpci_msix *msix;
>> @@ -158,7 +159,12 @@ static struct vpci_msix *msix_find(const struct domain 
>> *d, unsigned long addr)
>>   
>>   static int msix_accept(struct vcpu *v, unsigned long addr)
>>   {
>> -    return !!msix_find(v->domain, addr);
>> +    int rc;
>> +
>> +    read_lock(&v->domain->vpci_rwlock);
>> +    rc = !!msix_find(v->domain, addr);
>> +    read_unlock(&v->domain->vpci_rwlock);
> Newline before return.
Ok
>
>> +    return rc;
>>   }
>>   
>>   static bool access_allowed(const struct pci_dev *pdev, unsigned long addr,
>> index fb0947179b79..16bb3b832e6a 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -89,22 +104,28 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>   }
>>   #endif /* __XEN__ */
>>   
>> -static int vpci_register_cmp(const struct vpci_register *r1,
>> -                             const struct vpci_register *r2)
>> +static int vpci_offset_cmp(unsigned int r1_offset, unsigned int r1_size,
>> +                           unsigned int r2_offset, unsigned int r2_size)
>>   {
>>       /* Return 0 if registers overlap. */
>> -    if ( r1->offset < r2->offset + r2->size &&
>> -         r2->offset < r1->offset + r1->size )
>> +    if ( r1_offset < r2_offset + r2_size &&
>> +         r2_offset < r1_offset + r1_size )
>>           return 0;
>> -    if ( r1->offset < r2->offset )
>> +    if ( r1_offset < r2_offset )
>>           return -1;
>> -    if ( r1->offset > r2->offset )
>> +    if ( r1_offset > r2_offset )
>>           return 1;
>>   
>>       ASSERT_UNREACHABLE();
>>       return 0;
>>   }
>>   
>> +static int vpci_register_cmp(const struct vpci_register *r1,
>> +                             const struct vpci_register *r2)
>> +{
>> +    return vpci_offset_cmp(r1->offset, r1->size, r2->offset, r2->size);
>> +}
> Seeing how this gets used below I'm not sure it's very helpful to
> reuse vpci_register_cmp, see my comment below.
Yes, the only thing which is used is the first check for the overlap
>
>> +
>>   /* Dummy hooks, writes are ignored, reads return 1's */
>>   static uint32_t vpci_ignored_read(const struct pci_dev *pdev, unsigned int 
>> reg,
>>                                     void *data)
>> @@ -129,6 +150,7 @@ uint32_t vpci_hw_read32(const struct pci_dev *pdev, 
>> unsigned int reg,
>>       return pci_conf_read32(pdev->sbdf, reg);
>>   }
>>   
>> +/* This must hold domain's vpci_rwlock in write mode. */
>>   int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>                         vpci_write_t *write_handler, unsigned int offset,
>>                         unsigned int size, void *data)
>> @@ -152,8 +174,6 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t 
>> *read_handler,
>>       r->offset = offset;
>>       r->private = data;
>>   
>> -    spin_lock(&vpci->lock);
>> -
>>       /* The list of handlers must be kept sorted at all times. */
>>       list_for_each ( prev, &vpci->handlers )
>>       {
>> @@ -165,25 +185,23 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t 
>> *read_handler,
>>               break;
>>           if ( cmp == 0 )
>>           {
>> -            spin_unlock(&vpci->lock);
>>               xfree(r);
>>               return -EEXIST;
>>           }
>>       }
>>   
>>       list_add_tail(&r->node, prev);
>> -    spin_unlock(&vpci->lock);
>>   
>>       return 0;
>>   }
>>   
>> +/* This must hold domain's vpci_rwlock in write mode. */
>>   int vpci_remove_register(struct vpci *vpci, unsigned int offset,
>>                            unsigned int size)
>>   {
>>       const struct vpci_register r = { .offset = offset, .size = size };
>>       struct vpci_register *rm;
>>   
>> -    spin_lock(&vpci->lock);
>>       list_for_each_entry ( rm, &vpci->handlers, node )
>>       {
>>           int cmp = vpci_register_cmp(&r, rm);
>> @@ -195,14 +213,12 @@ int vpci_remove_register(struct vpci *vpci, unsigned 
>> int offset,
>>           if ( !cmp && rm->offset == offset && rm->size == size )
>>           {
>>               list_del(&rm->node);
>> -            spin_unlock(&vpci->lock);
>>               xfree(rm);
>>               return 0;
>>           }
>>           if ( cmp <= 0 )
>>               break;
>>       }
>> -    spin_unlock(&vpci->lock);
>>   
>>       return -ENOENT;
>>   }
>> @@ -310,7 +326,7 @@ static uint32_t merge_result(uint32_t data, uint32_t 
>> new, unsigned int size,
>>   
>>   uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>   {
>> -    const struct domain *d = current->domain;
>> +    struct domain *d = current->domain;
>>       const struct pci_dev *pdev;
>>       const struct vpci_register *r;
>>       unsigned int data_offset = 0;
>> @@ -327,6 +343,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
>> unsigned int size)
>>       if ( !pdev )
>>           return vpci_read_hw(sbdf, reg, size);
>>   
>> +    read_lock(&d->vpci_rwlock);
> After taking the domain lock you need to check that pdev->vpci !=
> NULL, as vpci_remove_device will set pdev->vpci == NULL before
> removing the device from the domain. Same applies to vpci_add_handlers
> which will be called with the device already added to the domain, but
> with pdev->vpci == NULL.
Sure, will add
>
> We would also need some kind of protection arround
> pci_get_pdev_by_domain so that devices are not removed (from the
> domain) while we are iterating over it.
Again, this is a problem. And we need to find a way to solve that
>
>>       spin_lock(&pdev->vpci->lock);
>>   
>>       /* Read from the hardware or the emulated register handlers. */
>> @@ -371,6 +388,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
>> unsigned int size)
>>           ASSERT(data_offset < size);
>>       }
>>       spin_unlock(&pdev->vpci->lock);
>> +    read_unlock(&d->vpci_rwlock);
>>   
>>       if ( data_offset < size )
>>       {
>> @@ -410,14 +428,37 @@ static void vpci_write_helper(const struct pci_dev 
>> *pdev,
>>                r->private);
>>   }
>>   
>> +static bool vpci_header_write_lock(const struct pci_dev *pdev,
>> +                                   unsigned int start, unsigned int size)
> I think this should live in header.c, for consistency.
Ok, will move
>
> I'm also not sure it's worth adding vpci_offset_cmp: you just need to
> do a range overlap check, and that can be easily open coded. It's just
> the first 'if' in vpci_register_cmp that you want, the rest of the
> code is just adding overhead.
Agree
>
>> +{
>> +    /*
>> +     * Writing the command register and ROM BAR register may trigger
>> +     * modify_bars to run which in turn may access multiple pdevs while
>> +     * checking for the existing BAR's overlap. The overlapping check, if 
>> done
>> +     * under the read lock, requires vpci->lock to be acquired on both 
>> devices
>> +     * being compared, which may produce a deadlock. It is not possible to
>> +     * upgrade read lock to write lock in such a case. So, in order to 
>> prevent
>> +     * the deadlock, check which registers are going to be written and 
>> acquire
>> +     * the lock in the appropriate mode from the beginning.
>> +     */
>> +    if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) )
>> +        return true;
>> +
>> +    if ( !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4) )
> No need for the comparison if rom_reg is unset. Also you can OR both
> conditions into a single if.
If we open code vpci_offset_cmp with a single if all this is going
to be a bit clumsy:

     if ( r1_offset < r2_offset + r2_size &&
          r2_offset < r1_offset + r1_size )
         return 0;
This is a single check.
Now we need to check two registers with the code above and
also check that pdev->vpci->header.rom_reg != 0

I think it would be more readable if we have a tiny helper function

static bool vpci_offset_cmp(unsigned int r1_offset, unsigned int r1_size,
                            unsigned int r2_offset, unsigned int r2_size)
{
     /* Return 0 if registers overlap. */
     if ( r1_offset < r2_offset + r2_size &&
          r2_offset < r1_offset + r1_size )
         return false;
     return true;
}

So, then we can have something like

static bool vpci_header_write_lock(const struct pci_dev *pdev,
                                    unsigned int start, unsigned int size)
{
     if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) ||
          (pdev->vpci->header.rom_reg &&
           !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4)) )
         return true;

     return false;
}

>
>> +        return true;
>> +
>> +    return false;
>> +}
>> +
>>   void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>                   uint32_t data)
>>   {
>> -    const struct domain *d = current->domain;
>> +    struct domain *d = current->domain;
>>       const struct pci_dev *pdev;
>>       const struct vpci_register *r;
>>       unsigned int data_offset = 0;
>>       const unsigned long *ro_map = pci_get_ro_map(sbdf.seg);
>> +    bool write_locked = false;
>>   
>>       if ( !size )
>>       {
>> @@ -440,7 +481,17 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, 
>> unsigned int size,
>>           return;
>>       }
>>   
>> -    spin_lock(&pdev->vpci->lock);
>> +    if ( vpci_header_write_lock(pdev, reg, size) )
>> +    {
>> +        /* Gain exclusive access to all of the domain pdevs vpci. */
>> +        write_lock(&d->vpci_rwlock);
>> +        write_locked = true;
> Here you need to check that pdev->vpci != NULL...
Will do
>
>> +    }
>> +    else
>> +    {
>> +        read_lock(&d->vpci_rwlock);
> ...and also here.
Will do
>
>> +        spin_lock(&pdev->vpci->lock);
>> +    }
>>   
>>       /* Write the value to the hardware or emulated registers. */
>>       list_for_each_entry ( r, &pdev->vpci->handlers, node )
>> @@ -475,7 +526,14 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, 
>> unsigned int size,
>>               break;
>>           ASSERT(data_offset < size);
>>       }
>> -    spin_unlock(&pdev->vpci->lock);
>> +
>> +    if ( write_locked )
>> +        write_unlock(&d->vpci_rwlock);
>> +    else
>> +    {
>> +        spin_unlock(&pdev->vpci->lock);
>> +        read_unlock(&d->vpci_rwlock);
>> +    }
>>   
>>       if ( data_offset < size )
>>           /* Tailing gap, write the remaining. */
>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index 37f78cc4c4c9..ecd34481a7af 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -444,6 +444,9 @@ struct domain
>>   
>>   #ifdef CONFIG_HAS_PCI
>>       struct list_head pdev_list;
>> +#ifdef CONFIG_HAS_VPCI
>> +    rwlock_t vpci_rwlock;
>> +#endif
>>   #endif
>>   
>>   #ifdef CONFIG_HAS_PASSTHROUGH
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index e8ac1eb39513..e19e462ee5cb 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -88,6 +88,8 @@ struct vpci {
>>            * is mapped into guest p2m) if there's a ROM BAR on the device.
>>            */
>>           bool rom_enabled      : 1;
>> +        /* Offset to the ROM BAR register if any. */
>> +        unsigned int rom_reg;
> Could you please place this field after 'type'? That will avoid some
> padding in the structure.
Sure
>
> Thanks, Roger.
Thank you,
Oleksandr

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.