[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/4] x86/MSI-X: be more careful during teardown



>>> On 10.03.15 at 22:05, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 10/03/15 16:27, Jan Beulich wrote:
>> When a device gets detached from a guest, pciback will clear its
>> command register, thus disabling both memory and I/O decoding. The
>> disabled memory decoding, however, has an effect on the MSI-X table
>> accesses the hypervisor does: These won't have the intended effect
>> anymore. Even worse, for PCIe devices (but not SR-IOV virtual
>> functions) such accesses may (will?) be treated as Unsupported
>> Requests, causing respective errors to be surfaced, potentially in the
>> form of NMIs that may be fatal to the hypervisor or Dom0 is different
>> ways. Hence rather than carrying out these accesses, we should avoid
>> them where we can, and use alternative (e.g. PCI config space based)
>> mechanisms to achieve at least the same effect.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> Backporting note (largely to myself):
>>    Depends on (not yet backported) commit 061eebe0e "x86/MSI: drop
>>    workaround for insecure Dom0 kernels" (due to re-use of struct
>>    arch_msix's warned field).
>>
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -121,6 +121,27 @@ static void msix_put_fixmap(struct arch_
>>      spin_unlock(&msix->table_lock);
>>  }
>>  
>> +static bool_t memory_decoded(const struct pci_dev *dev)
>> +{
>> +    u8 bus, slot, func;
>> +
>> +    if ( !dev->info.is_virtfn )
>> +    {
>> +        bus = dev->bus;
>> +        slot = PCI_SLOT(dev->devfn);
>> +        func = PCI_FUNC(dev->devfn);
>> +    }
>> +    else
>> +    {
>> +        bus = dev->info.physfn.bus;
>> +        slot = PCI_SLOT(dev->info.physfn.devfn);
>> +        func = PCI_FUNC(dev->info.physfn.devfn);
>> +    }
>> +
>> +    return !!(pci_conf_read16(dev->seg, bus, slot, func, PCI_COMMAND) &
>> +              PCI_COMMAND_MEMORY);
>> +}
>> +
> 
> This check is racy against anyone who can write to the command register,
> which includes dom0 and other pcpus in Xen.  There does not appear to be
> any exclusion between Xen emulating a control register write on one cpu
> and changing irq affinities on another.

But - as in may other contexts - we assume Dom0 to not be hostile,
and hence to not disable memory decoding on a device out of the
blue. Xen itself certainly doesn't do so at runtime at all.

> As a result, this check does not actually protect against accessing the
> MSI-X bar while memory decoding is disabled.  As a downside, it puts an
> expensive config space access on moderately frequent codepaths.
> 
> One issue we have just identified pertains to dom0 resetting a device
> and Xen falling over a UR which has been escalated to fatal, most likely
> because an in-progress MSI-X interrupt migration.

Definitely not. There are no interrupts active on a device at that
point anymore (or else it would have been a bug for Dom0 to
disable memory decoding on it). The observed UR is due to Xen's
own (largely redundant) cleanup of the guest pIRQ upon cleaning
up after the guest's death.

>  There does not appear
> to be sufficient synchronisation available in the interface for a dom0
> to even cooperatively perform a device reset with Xen.
> 
> The more I consider this and related problems, the more I am thinking
> that the only longterm solution is to have a full PCI implementation in
> Xen, and to prevent any unmediated access, including from dom0.  Xen
> need not gain much (any?) more device-specific knowledge, but needs to
> gain the ability to properly mediate all config updates, and synchronise
> resets against other users of the device.  I do not suggest this
> lightly; I realise that it is a huge quantity of work.

Indeed. Plus it would likely break existing Dom0 (all of the sudden
needing to use hypercalls instead of direct access), unless we want
to get into the business of emulating MMCFG accesses.

Anyway - what does all of this mean for the presented patch series?
We've got a problem to fix, and as I'm realizing that the approach
taken may be controversial I'm certainly up for alternative
suggestions, but not such requiring an overhaul of a much larger
portion of code than just the MSI part. And I'd certainly be
hesitant to carry patches like the ones here in our own repos
without being upstream for any meaningful period of time.

As an aside - Linux doesn't do any synchronization or mediation of
(privileged) user space invoked config space accesses either. I.e.
from a Linux Dom0 pov the problem would likely extend into the
linux kernel (then needing a similar modification).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.