[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition



On 19/02/2019 07:43, Jan Beulich wrote:
>>>> On 18.02.19 at 19:30, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 18/02/2019 16:21, Igor Druzhinin wrote:
>>> It's unsafe to disable IOMMU on a live system which is the case
>>> if we're crashing since remapping hardware doesn't usually know what
>>> to do with ongoing bus transactions and frequently raises NMI/MCE/SMI,
>>> etc. (depends on the firmware configuration) to signal these abnormalities.
>>> This, in turn, doesn't play well with kexec transition process as there is
>>> no any handling available at the moment for this kind of events resulting
>>> in failures to enter the kernel.
>>>
>>> Modern Linux kernels taught to copy all the necessary DMAR/IR tables
>>> following kexec from the previous kernel (Xen in our case) - so it's
>>> currently normal to keep IOMMU enabled. It would only require to change
>>> crash kernel command line by enabling IOMMU drivers from the existing users.
> 
> Is this the normal option ("intel_iommu=on" in the Intel case), or
> rather something special? Considering that you explicitly talk about
> Linux here anyway, mentioning the option(s) explicitly would seem
> to make sense.
> 

Yes, it's intel_iommu / amd_iommu (which is enabled by default now).
I'll give an example in the commit message.

>>> An option is left for compatibility with ancient crash kernels which
>>> didn't like to have IOMMU active under their feet on boot.
>>>
>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>>
>> To provide a bit of extra background, it turns out that in hindsight,
>> turning off the IOMMU in a crash usually makes things worse rather than
>> better.
> 
> For an unknown definition of "usually". Corrupted (IOMMU) page
> tables are not really an impossible crash reason.
> 

Well, that still falls into the definition of "usually". Having IOMMU
pages corrupted given proper isolation of devices sounds unlikely.

>> In particular, any guest with a PCI device which happens to allocate a
>> DMA buffer in GFN space which matches the crash region in MFN space will
>> end up corrupting the crash kernel when DMA remapping gets turned off.
> 
> Indeed, but that's only PVH Dom0 (unsupported as of yet) or PV
> Dom0 using PV IOMMU functionality (not even in tree as of yet). So
> the question is how applicable this change really is at this point in
> time. I notice it hasn't been tagged for 4.12, so please don't take
> this as objection to it going in - I'm only trying to better understand
> the implications.
> 

I think the more prevalent case that I wanted to cover here is the one
described in the commit message: assuming you have ongoing DMAs anywhere
in the system with IOMMU enabled hardware will raise an event (that
we're unable to handle) as soon as you turn IOMMU translation off. This
happens so commonly now that we cannot simply tolerate it happening
sporadically.

And I have no strong feelings if it doesn't get into 4.12.

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.