RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)

He, Qing wrote:
> On Fri, 2009-10-16 at 16:35 +0800, Zhang, Xiantao wrote:
>> He, Qing wrote:
>>> On Fri, 2009-10-16 at 16:22 +0800, Zhang, Xiantao wrote:
>>>> He, Qing wrote:
>>>>> On Fri, 2009-10-16 at 15:32 +0800, Zhang, Xiantao wrote:
>>>>>> According to the description, the issue is likely caused by a lost
>>>>>> EOI write for the MSI interrupt, which leaves the interrupt
>>>>>> permanently masked. There appears to be a race between the guest
>>>>>> setting a new vector and EOIing the old vector for the interrupt.
>>>>>> If the guest sets the new vector before it EOIs the old one, the
>>>>>> hypervisor can't find the pirq that corresponds to the old vector
>>>>>> (it has already been changed to the new vector), so the old vector
>>>>>> is never EOIed at the hardware level. Since the corresponding vector
>>>>>> on the real processor is never EOIed, the system may lose all
>>>>>> interrupts, which ultimately results in the reported issues.
>>>>>> But I remember there should be a timer that handles this case by
>>>>>> forcibly writing the EOI to the real processor after a timeout,
>>>>>> but it seems not to work as expected.
>>>>> The EOI timer is meant to deal with the IRQ sharing problem; since
>>>>> MSI interrupts are never shared, this timer is not started in the
>>>>> case of MSI.
>>>> That may be a problem, if so. If a malicious or buggy guest never
>>>> EOIs the MSI vector, could the host hang for lack of a timeout mechanism?
>>> Why would the host hang? Only the assigned interrupt will block, and
>>> that's exactly what the guest wants :-)
>> The hypervisor shouldn't EOI the real vector until the guest EOIs the
>> corresponding virtual vector, right? Not sure. :-)
> Yes, it is the algorithm used today.

So it is still a problem. If the guest doesn't EOI, the host can't EOI either,
which leads to a system hang in the absence of a timeout mechanism. So we may
need to introduce a timer for each MSI interrupt source to avoid hanging the
host. Keir?
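To make the per-source timer idea concrete, here is a minimal toy model, assuming one timer per MSI source that forces the physical EOI if the guest has not EOIed within a fixed timeout. It is not Xen code and not a proposed patch: `MsiSource`, `Host`, `on_physical_irq`, `on_guest_eoi`, `tick` and the `EOI_TIMEOUT_MS` value are all hypothetical. The host-side concern it targets is that, on x86, a vector left in service on a CPU's local APIC keeps blocking interrupts of the same or lower priority class on that CPU until it is EOIed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical toy model, not Xen code. MsiSource, Host, on_physical_irq,
# on_guest_eoi, tick and EOI_TIMEOUT_MS are illustrative names only.
EOI_TIMEOUT_MS = 10  # assumed value; the thread does not propose one


@dataclass
class MsiSource:
    pirq: int
    vector: int
    in_flight: bool = False         # injected into guest, awaiting virtual EOI
    deadline: Optional[int] = None  # time at which the host EOIs regardless


class Host:
    def __init__(self) -> None:
        self.sources: Dict[int, MsiSource] = {}
        self.physical_eois: List[int] = []  # vectors EOIed on the real LAPIC

    def on_physical_irq(self, pirq: int, now: int) -> None:
        # Physical MSI arrives: inject it and arm this source's timer
        # instead of EOIing the real vector right away.
        src = self.sources[pirq]
        src.in_flight = True
        src.deadline = now + EOI_TIMEOUT_MS

    def on_guest_eoi(self, pirq: int) -> None:
        # Normal path: the guest's virtual EOI triggers the real EOI.
        self._physical_eoi(self.sources[pirq])

    def tick(self, now: int) -> None:
        # Timer path: force the real EOI for any source whose guest
        # has not EOIed before its deadline.
        for src in self.sources.values():
            if src.in_flight and src.deadline is not None and now >= src.deadline:
                self._physical_eoi(src)

    def _physical_eoi(self, src: MsiSource) -> None:
        if not src.in_flight:
            return  # already EOIed (by the guest or by the timer)
        self.physical_eois.append(src.vector)
        src.in_flight = False
        src.deadline = None  # disarm the timer


host = Host()
host.sources[5] = MsiSource(pirq=5, vector=0x51)
host.on_physical_irq(5, now=0)
host.tick(now=5)           # guest still silent; deadline not reached
host.tick(now=10)          # deadline reached; host EOIs on the guest's behalf
host.on_guest_eoi(5)       # late guest EOI: nothing left to EOI
print(host.physical_eois)  # [81], i.e. vector 0x51 EOIed exactly once
```

The `in_flight` check makes the forced EOI and a late guest EOI safe to combine, so each delivery is EOIed on the hardware at most once in this model. A real implementation would also have to handle a new interrupt arriving between the forced EOI and the late guest EOI, which this sketch ignores.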

> After reviewing the code: if the guest really does something like
> changing affinity within the window between an IRQ firing and its EOI,
> there is indeed a problem; the patch is attached. Although I kind of
> doubt it: shouldn't desc->lock in the guest protect these two
> operations and make them mutually exclusive?

We shouldn't let the hypervisor do the real EOI before the guest does the
corresponding virtual EOI, so this patch may have a correctness issue. :-)

Attached is a fix based on my previous guess; it should fix the issue.
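The race in the original guess can be reproduced with a small toy model. It is a hypothetical sketch, not Xen code: `VirtualMsi`, `bind`, `deliver`, `retarget`, `guest_eoi` and `run` are illustrative names. With `remember_inflight=False` the EOI is resolved only through the guest's current vector, so an affinity change between delivery and EOI makes the lookup fail, as described above. With `remember_inflight=True` the model records which pirq each vector was delivered for, which is one possible way to keep the real EOI from being lost while still performing it only when the guest's virtual EOI arrives; it is not necessarily what the attached patch does.

```python
from typing import Dict, List, Optional

# Hypothetical toy model of the vector-change/EOI race; not Xen code.


class VirtualMsi:
    def __init__(self, remember_inflight: bool) -> None:
        self.remember_inflight = remember_inflight
        self.vec_to_pirq: Dict[int, int] = {}  # current guest vector -> pirq
        self.inflight: Dict[int, int] = {}     # vector delivered on -> pirq
        self.physical_eois: List[int] = []     # pirqs EOIed on real hardware

    def bind(self, pirq: int, vector: int) -> None:
        self.vec_to_pirq[vector] = pirq

    def deliver(self, pirq: int, vector: int) -> None:
        # Interrupt injected on `vector`; the real vector stays un-EOIed
        # until the guest's virtual EOI arrives.
        if self.remember_inflight:
            self.inflight[vector] = pirq

    def retarget(self, pirq: int, old: int, new: int) -> None:
        # Guest changes affinity: the MSI is reprogrammed with a new vector.
        del self.vec_to_pirq[old]
        self.vec_to_pirq[new] = pirq

    def guest_eoi(self, vector: int) -> bool:
        # Prefer the vector the interrupt was actually delivered on,
        # then fall back to the current vector -> pirq mapping.
        pirq: Optional[int] = self.inflight.pop(vector, None)
        if pirq is None:
            pirq = self.vec_to_pirq.get(vector)
        if pirq is None:
            return False  # lookup failed: the real EOI is lost
        self.physical_eois.append(pirq)
        return True


def run(remember_inflight: bool) -> bool:
    m = VirtualMsi(remember_inflight)
    m.bind(pirq=16, vector=0x30)
    m.deliver(pirq=16, vector=0x30)          # interrupt fires on the old vector
    m.retarget(pirq=16, old=0x30, new=0x40)  # affinity change before the EOI
    return m.guest_eoi(0x30)                 # guest EOIs the old vector


print(run(remember_inflight=False))  # False: the EOI is lost
print(run(remember_inflight=True))   # True: the EOI reaches the hardware
```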


