
Re: [RFC PATCH] iommu: make no-quarantine mean no-quarantine



On 29.04.2021 23:04, Scott Davis wrote:
> On 4/28/21, 3:20 AM, Paul Durrant wrote:
>>> Following the extension to the command line option I'm putting in place
>>> in "IOMMU: make DMA containment of quarantined devices optional" (which
>>> I still need to get around to addressing review feedback on and
>>> resubmitting), I'd be inclined to suggest "iommu=quarantine=always" or
>>> "iommu=quarantine=on-assign". Unless of course we'd prefer to have the
>>> caller of the assignment operation have full control over the behavior
>>> here anyway (in which case a command line option simply isn't
>>> necessary).
>>
>> I'm still not entirely sure why quarantining on assignment is a problem,
>> other than it triggering an as-yet undiagnosed issue in QEMU, but I agree
>> that the expectation of 'no-quarantine' meaning just that (i.e. the old
>> dom0->domU and domU->dom0 transitions are re-instated) is reasonable. Do
>> we really want yet more command line options?
> 
> Regarding the problem in QEMU, I traced the crash trigger down to a
> write to the IQ tail register during the mapping operation into dom_io
> (backtrace below). Along the way I noticed that, since a non-present
> entry was being flushed, flush_context_qi only performs this
> invalidation on an IOMMU with caching mode enabled (i.e. a software
> IOMMU). Therefore this issue is probably only hittable when nesting.
> Disabling caching mode on the QEMU vIOMMU was enough to prevent the
> crash and give me a working system.
> 
> (gdb) si
> 0xffff82d04025b68b  72  in qinval.c
>    0xffff82d04025b687 <qinval_update_qtail+43>: ... shl    $0x4,%r12
> => 0xffff82d04025b68b <qinval_update_qtail+47>: ... mov    %r12,0x88(%rax)
> (gdb) bt
> #0  0xffff82d04025b68b in qinval_update_qtail (...) at qinval.c:72
> #1  0xffff82d04025baa7 in queue_invalidate_context_sync (...) at qinval.c:101
> #2  flush_context_qi (...) at qinval.c:341
> #3  0xffff82d040259125 in iommu_flush_context_device (...) at iommu.c:400
> #4  domain_context_mapping_one (...) at iommu.c:1436
> #5  0xffff82d040259351 in domain_context_mapping (...) at iommu.c:1510
> #6  0xffff82d040259d20 in reassign_device_ownership (...) at iommu.c:2412
> #7  0xffff82d040259f19 in intel_iommu_assign_device (...) at iommu.c:2476
> #8  0xffff82d040267154 in assign_device (...) at pci.c:1545
> #9  iommu_do_pci_domctl (...) at pci.c:1732
> #10 0xffff82d040264de3 in iommu_do_domctl (...) at iommu.c:539
> #11 0xffff82d040322ca5 in arch_do_domctl (...) at domctl.c:1496
> #12 0xffff82d040205a19 in do_domctl (...) at domctl.c:956
> #13 0xffff82d040319476 in pv_hypercall (...) at hypercall.c:155
> #14 0xffff82d040390432 in lstar_enter () at entry.S:271
> #15 0x0000000000000000 in ?? ()

Interesting. This then leaves the question of whether we're submitting
a bogus command, or whether qemu can't (correctly) deal with a valid
one here. So far you haven't told us what the actual crash was; it's
not even clear to me whether it was Xen or qemu that crashed for you.
I also have to admit that until now it wasn't really clear to me that
you were running Xen _under_ qemu - I had assumed there was an
interaction problem with a qemu serving a guest.

Jan



 

