[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/4] x86/MSI-X: be more careful during teardown



>>> On 13.03.15 at 19:10, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 10/03/15 16:27, Jan Beulich wrote:
>> When a device gets detached from a guest, pciback will clear its
>> command register, thus disabling both memory and I/O decoding. The
>> disabled memory decoding, however, has an effect on the MSI-X table
>> accesses the hypervisor does: These won't have the intended effect
>> anymore. Even worse, for PCIe devices (but not SR-IOV virtual
>> functions) such accesses may (will?) be treated as Unsupported
>> Requests
> 
> Will, as the root port will issue a write and get no reply.

The questionable point here really is that of compatibility with
older bus architectures: On x86 (other than e.g. on ia64) the
CPU accessing _any_ physical memory would always be valid;
in the worst case reads would return all bits set and write would
be dropped. PCIe "violating" this, I could see platform designers
to implement workarounds resulting in the original behavior for
at least CPU side accesses.

> VF config space is part of a memory bar on the PF, which itself will
> still be valid for requests even if the VF has memory decoding disabled.

A spec compliant VF simply can't disable decoding, as the
respective command register bit is specified to be hardwires to
zero.

> On the other hand, if the PF has memory decoding disabled, I expect any
> VF config access to result in a UR.

Of course. But obviously passing through a PF to an untrusted
guest, allowing it to create VFs, and the passing through those
VFs onwards to other guests will render those other guests
insecure no matter what.

>> @@ -287,7 +314,8 @@ void set_msi_affinity(struct irq_desc *d
>>      ASSERT(spin_is_locked(&desc->lock));
>>  
>>      memset(&msg, 0, sizeof(msg));
>> -    read_msi_msg(msi_desc, &msg);
>> +    if ( !read_msi_msg(msi_desc, &msg) )
>> +        return;
>>  
>>      msg.data &= ~MSI_DATA_VECTOR_MASK;
>>      msg.data |= MSI_DATA_VECTOR(desc->arch.vector);
> 
> Assuming a non-racy setup, this hunk makes the memory decode check in
> write_msi_msg() redundant.  I cant however think of a neat way to elide
> the second access, because all other callers of write_msi_msg() need it.

And indeed I had looked at where such redundancy could be
avoided. As you say, avoiding it here would make the code
event more complicated, which I don't think we want.

>> @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_
>>          }
>>          break;
>>      case PCI_CAP_ID_MSIX:
>> -    {
>> -        int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
>> -        writel(flag, entry->mask_base + offset);
>> -        readl(entry->mask_base + offset);
>> -        break;
>> -    }
>> +        if ( likely(memory_decoded(pdev)) )
>> +        {
>> +            writel(flag, entry->mask_base + 
>> PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
>> +            readl(entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
>> +            break;
>> +        }
>> +        if ( flag )
>> +        {
>> +            u16 control;
>> +            domid_t domid = pdev->domain->domain_id;
>> +
>> +            control = pci_conf_read16(seg, bus, slot, func,
>> +                                      
>> msix_control_reg(entry->msi_attrib.pos));
>> +            if ( control & PCI_MSIX_FLAGS_MASKALL )
>> +                break;
>> +            pci_conf_write16(seg, bus, slot, func,
>> +                             msix_control_reg(entry->msi_attrib.pos),
>> +                             control | PCI_MSIX_FLAGS_MASKALL);
>> +            if ( pdev->msix->warned != domid )
>> +            {
>> +                pdev->msix->warned = domid;
>> +                printk(XENLOG_G_WARNING
>> +                       "cannot mask IRQ %d: masked MSI-X on Dom%d's 
>> %04x:%02x:%02x.%u\n",
> 
> "masked all MSI-X" to make it slightly clearer that the global mask bit
> has been hit.

I don't think we need to make the message any longer / more
verbose. For MSI-X it's pretty clear that by not naming a particular
entry, it concerns all of them.

>> @@ -401,12 +463,14 @@ static int msi_get_mask_bit(const struct
>>  
>>  void mask_msi_irq(struct irq_desc *desc)
>>  {
>> -    msi_set_mask_bit(desc, 1);
>> +    if ( unlikely(!msi_set_mask_bit(desc, 1)) )
>> +        BUG();
> 
> Previously, BUG() was only on a codepath controlled by the
> msi_attrib.type, whereas now it includes failures based on failure to
> mask an issue.
> 
> Is this wise, as it can be guest-influenced?

Can it really be?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.