Re: [Xen-devel] IOMMU Interrupt Remapping query

On 06/06/11 16:21, Keir Fraser wrote:
> On 06/06/2011 15:32, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
>> I am attempting to fix the kexec interactions with x2apic and iommu
>> functionality.  Part of this involves ensuring that all IOMMU
>> functionality is disabled, as the kdump kernels are not happy at having
>> their interrupts remapped without their knowledge.
>> I have introduced iommu_disable_x2apic_IR() onto the kexec path, but it
>> does not seem to actually disable interrupt remapping on Intel boxes
>> (specifically the two Intel Nehalem boxes I am testing on).
>> Specifying iommu=no-intremap on the command line causes everything to
>> work correctly, but leaving it out causes the kdump kernel to hang and
>> eventually reboot, as can be seen in the attached serial log.
>> The lines starting with DBG: are extra debugging I have put in, which show
>> that the disable_IR() function is being called and is writing to the registers.
> You should have attached your patch as well. No one else can know with
> certainty where you put your debugging, and no one else is going to want
> to help debug your code if they can't even see it. :-)
> Also a good idea to Cc a likely person who can help (i.e., someone who wrote
> the code that you are querying). 'hg annotate' is useful for this -- in this
> case I am adding Weidong Han to the cc list.
> On the bright side, this must have been made to work for S3 suspend/resume
> to work properly (indeed, that's what the disable code was originally added
> for). So it can't be an insurmountable problem.
>  -- Keir
>> This problem occurs with the XenServer version of 4.1.0 as well as on
>> xen-unstable at the moment.
>> Is there any hardware state which is not taken down by the disable
>> function, or any subtle interaction which I have not taken into account?  I
>> have looked through the source and nothing pops out, but I am out of ideas.
>> Thanks in advance,
Attached are the two relevant patches, and two which I don't think are
relevant but might be if I am wrong.  crash_shutdown was an attempt to
add an iommu_ops operation which shuts down all iommu functionality without
saving state.  debug-wip shows where I have put in debug statements.

kdump-fix-x2apic and apic-record-boot-mode are also in the source, but I
believe them to be unrelated to the current problem.

I have done some further debugging on the assumption that the order in
which interrupt remapping is shut down, relative to the lapics and ioapics,
matters.  However, disable_qinval causes a panic (qinval.c:222 - "queue
invalidate wait descriptor was not executed\n") if it is run before both
the lapics and the ioapics have been shut down.
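For reference, a minimal sketch of the disable sequence under discussion,
written against a simulated register pair rather than real MMIO. The
GCMD/GSTS bit positions (IRE/IRES at bit 25, QIE/QIES at bit 26) follow the
VT-d specification; struct fake_iommu, clear_and_wait(), struct
shutdown_state and crash_disable_iommu() are hypothetical names for
illustration, not Xen code. The IR-then-QI order, and the guard requiring the
lapics and ioapics to be down first, are assumptions drawn from the
observations above, not a confirmed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VT-d Global Command / Global Status bit positions. */
#define GCMD_QIE  (1u << 26)  /* Queued Invalidation Enable */
#define GCMD_IRE  (1u << 25)  /* Interrupt Remapping Enable */
#define GSTS_QIES (1u << 26)  /* Queued Invalidation Enable Status */
#define GSTS_IRES (1u << 25)  /* Interrupt Remapping Enable Status */

/* Hypothetical software model of one IOMMU; on real hardware these
 * are MMIO accesses to the GCMD and GSTS registers. */
struct fake_iommu {
    uint32_t gcmd;  /* last value written to the command register */
    uint32_t gsts;  /* status as reported by the "hardware" */
};

/* Model: the hardware reflects the enable bits into status at once. */
static void write_gcmd(struct fake_iommu *iommu, uint32_t val)
{
    iommu->gcmd = val;
    iommu->gsts = val & (GSTS_QIES | GSTS_IRES);
}

/* Read status, write it back with one enable bit cleared, then poll
 * until the matching status bit drops (bounded poll count here; real
 * code would use a time-based timeout). */
static bool clear_and_wait(struct fake_iommu *iommu, uint32_t cmd_bit,
                           uint32_t sts_bit, unsigned int max_polls)
{
    uint32_t sts = iommu->gsts;
    write_gcmd(iommu, sts & ~cmd_bit);
    for (unsigned int i = 0; i < max_polls; i++)
        if (!(iommu->gsts & sts_bit))
            return true;   /* hardware acknowledged the disable */
    return false;          /* timed out: still enabled */
}

/* Hypothetical record of which interrupt sources are already quiesced. */
struct shutdown_state {
    bool lapics_down;
    bool ioapics_down;
};

/* Assumed crash-path order: APICs first, then IR, then QI. */
static bool crash_disable_iommu(struct fake_iommu *iommu,
                                const struct shutdown_state *st)
{
    if (!st->lapics_down || !st->ioapics_down)
        return false;  /* too early: mirrors the qinval.c panic ordering */
    if (!clear_and_wait(iommu, GCMD_IRE, GSTS_IRES, 1000))
        return false;
    return clear_and_wait(iommu, GCMD_QIE, GSTS_QIES, 1000);
}
```

The point of reading GSTS and writing the modified value to GCMD is that
only one enable bit changes per write, so each disable can be confirmed on
its own before moving to the next.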

Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

