[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] pci: a workaround for nonstandard PCI devices whose PBA shares



On Thu, Apr 05, 2018 at 10:34:39AM +0100, Roger Pau Monné wrote:
>On Wed, Apr 04, 2018 at 11:29:39PM +0800, Chao Gao wrote:
>> ... the same page with other registers which are not relevant to MSI-X. Xen
>> marks pages where PBA resides as read-only. When assigning such devices to
>> guest, device driver writes MSI-X irrelevant registers on those pages would
>> lead to an EPT violation and the guest is destroyed because no handler is
>> registered for those address range. In order to make guest capable to use 
>> such
>> kind of devices, trapping very frequent write accesses is not a good idea for
>> it would significantly impact the performance.
>> 
>> This patch provides a workaround with caveat. Specifically, an option is
>> introduced to specify a list of devices. For those devices, Xen doesn't
>> control the access right to pages where PBA resides. Hence, guest device
>> driver is able to write those pages and functions well. Note that adding an
>> untrusted device to this option may endanger security of the entire system.
>> 
>> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
>> ---
>>  docs/misc/xen-command-line.markdown | 10 +++++++++
>>  xen/arch/x86/msi.c                  |  7 ++++--
>>  xen/drivers/passthrough/pci.c       | 45 
>> +++++++++++++++++++++++++++++++++++--
>>  xen/include/asm-x86/msi.h           |  1 +
>>  4 files changed, 59 insertions(+), 4 deletions(-)
>> 
>> diff --git a/docs/misc/xen-command-line.markdown 
>> b/docs/misc/xen-command-line.markdown
>> index b353352..e382513 100644
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1423,6 +1423,16 @@ Defaults to booting secondary processors.
>>  
>>  > Default: `on`
>>  
>> +### pba\_quirk
>
>pba_write_allowed would be better, pba_quirk is too generic IMO.
>
>> +> `= List of [<seg>:]<bus>:<device>.<function>
>> +
>> +Specify a list of SBDF of devices. When assigning devices in this list to
>> +guest, reading or writing the page where MSI-X PBA resides are allowed.
>> +This option provides a workaround for nonstandard PCI devices whose
>> +MSI-X PBA shares the same 4K-byte page with other registers. Note that
>> +adding an untrusted device to this option would undermine security of
>> +the entire system.
>> +
>>  ### pci
>>  > `= {no-}serr | {no-}perr`
>>  
>> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
>> index 5567990..2abf2cf 100644
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -992,7 +992,9 @@ static int msix_capability_init(struct pci_dev *dev,
>>          if ( rangeset_add_range(mmio_ro_ranges, msix->table.first,
>>                                  msix->table.last) )
>>              WARN();
>> -        if ( rangeset_add_range(mmio_ro_ranges, msix->pba.first,
>> +
>> +        if ( !msix->pba_quirk_enabled &&
>> +             rangeset_add_range(mmio_ro_ranges, msix->pba.first,
>>                                  msix->pba.last) )
>>              WARN();
>
>This will work fine as long as the PBA is not in the same page as the
>MSI-X table. In such case you will also need changes to QEMU (see
>pci_msix_write), so that writes to the memory in the same page as the
>MSI-X/PBA tables are forwarded to the underlying hardware.
>
>You should add least add something like:
>
>if ( msix->pba_quirk_enabled &&
>     msix->table.first <= msix->pba.last &&
>     msix->pba.first <= msix->table.last )
>{
>    printk("PBA write not allowed to dev %04x:%02x:%02x.%u due to MSI-X table 
> overlap\n");
>    return -ENXIO;
>}
>
>Or similar if the QEMU side is not fixed.
>
>Note that in order to fix the QEMU side you would probably have to add
>a flag to xl 'pci' config option and pass it to both QEMU and Xen.

Thanks for your comments.

First of all, I don't intend to also support devices which has MSI-X
table, MSI-X PBA and other MSI-X irrelevant registers in the same page.
Because as you said, it cleary violates MSI-X spec. IMO, we can extend
the workaround when we found such a device.

I also had the same concern with yours. But after careful thinking, I
found it wouldn't be a problem. If MSI-X table resides the same pages
with MSI-X PBA, we will mark the pages as RO when handling MSI-X table.
As a consequence, guest isn't able to write MSI-X table directly in this
case. Hence, it won't affect MSI-X table emulation and furthermore the
quirk won't override the decision, marking the page RO, made for other
reasons.
>
>>  
>> @@ -1139,7 +1141,8 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
>>          if ( rangeset_remove_range(mmio_ro_ranges, msix->table.first,
>>                                     msix->table.last) )
>>              WARN();
>> -        if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
>> +        if ( !msix->pba_quirk_enabled &&
>> +             rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
>>                                     msix->pba.last) )
>>              WARN();
>>      }
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 1db69d5..cd765ef 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -184,6 +184,38 @@ static int __init parse_phantom_dev(const char *str)
>>  }
>>  custom_param("pci-phantom", parse_phantom_dev);
>>  
>> +static struct pba_quirk_dev {
>> +    uint32_t sbdf;
>> +} pba_quirk_devs[8];
>
>We have a sbdf type now, see 514f58.
>
>Also, I would prefer that you use a list here. I know it's not likely
>to have a huge number of devices in the system that require this
>quirk, but I also see no reason to limit this to 8 (or any other
>arbitrary value).

Ok. I will use a list and the new sbdf type.

>
>> +static unsigned int nr_pba_quirk_devs;
>> +
>> +static int __init parse_pba_quirk(const char *str)
>> +{
>> +    unsigned int seg, bus, dev, func;
>> +
>> +    for ( ; ; )
>> +    {
>> +        if ( nr_pba_quirk_devs >= ARRAY_SIZE(pba_quirk_devs) )
>> +            return -E2BIG;
>> +
>> +        str = parse_pci(str, &seg, &bus, &dev, &func);
>> +        if ( !str )
>> +            return -EINVAL;
>> +
>> +        pba_quirk_devs[nr_pba_quirk_devs++].sbdf = PCI_SBDF(seg, bus, dev,
>> +                                                            func);
>> +        if ( *str == ',' )
>> +            str++;
>> +        else if ( !*str )
>> +            break;
>> +        else
>> +            return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +custom_param("pba_quirk", parse_pba_quirk);
>> +
>>  static u16 __read_mostly command_mask;
>>  static u16 __read_mostly bridge_ctl_mask;
>>  
>> @@ -300,6 +332,7 @@ static void check_pdev(const struct pci_dev *pdev)
>>  
>>  static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>>  {
>> +    unsigned int i;
>>      struct pci_dev *pdev;
>>  
>>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>> @@ -328,6 +361,16 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, 
>> u8 bus, u8 devfn)
>>          }
>>          spin_lock_init(&msix->table_lock);
>>          pdev->msix = msix;
>> +
>> +        for ( i = 0; i < nr_pba_quirk_devs; i++ )
>> +        {
>> +            if ( pba_quirk_devs[i].sbdf == PCI_SBDF3(pseg->nr, bus, devfn) )
>> +            {
>> +                pdev->msix->pba_quirk_enabled = true;
>> +                printk(XENLOG_WARNING "Enable PBA quirk for 
>> %04x:%02x:%02x.%u\n",
>> +                       pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>> +            }
>
>You could do this as:
>
>if ( pba_quirk_devs[i].sbdf != PCI_SBDF3(pseg->nr, bus, devfn) )
>    continue;
>
>pdev->msix->pba_quirk_enabled = true;
>printk(XENLOG_WARNING "Enable PBA quirk for %04x:%02x:%02x.%u\n",
>       pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>break;
>
>Which removes one level of indentation (and also exits the loop as
>soon as a match is found).

Very good suggestion.

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.