RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)

   If the device doesn't support the MSI mask bit, the second patch should have
no effect for it. I am also working on backporting more IRQ migration logic
from Linux, which should ensure that addr/vector are both written to the device
before it fires new interrupts. But as I mentioned before, if you want to solve
the guest affinity setting issue, you have to apply the first patch I sent out
(fix-irq-affinity-msi3.patch). :-)

Cinco, Dante wrote:
> Xiantao,
> I'm sorry, I forgot to mention that I did apply your two patches, but
> they didn't have any effect (interrupts are still lost after changing
> smp_affinity, and I still get the "No handler for irq vector" message).
> I added a dprintk in msi_set_mask_bit() and realized that MSI on my
> device does not have a mask bit (MSI-X does). My PCI device uses MSI,
> not MSI-X. I placed my dprintk inside the condition below and it never
> triggered.
>     switch (entry->msi_attrib.type) {
>     case PCI_CAP_ID_MSI:
>         if (entry->msi_attrib.maskbit) {
> While debugging this problem, I thought about the potential problem
> of an interrupt firing between the writes for the MSI message address
> and MSI message data. I noticed that pci_conf_write() uses
> spin_lock_irqsave() to disable interrupts before issuing the "out"
> instruction but the writes for the address and data are two separate
> pci_conf_write() calls. To me, it would be safer to write the address
> and data in a single call, preceded by spin_lock_irqsave(). That way,
> by the time interrupts are re-enabled, the address and data have both
> been updated.
> Dante
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: Thursday, October 22, 2009 2:42 AM
> To: Zhang, Xiantao; Jan Beulich
> Cc: He, Qing; xen-devel@xxxxxxxxxxxxxxxxxxx; Cinco, Dante
> Subject: Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)
> On 22/10/2009 09:41, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:
>>> Hmm, then I don't understand which case your patch was a fix for: I
>>> understood that it addresses an issue when the affinity of an
>>> interrupt gets changed (requiring a re-write of the address/data
>>> pair). If the hypervisor can deal with it without masking, then why
>>> did you add it?
>> Hmm, sorry, it seems I misunderstood your question. If the MSI doesn't
>> support the mask bit (clearing the MSI enable bit doesn't help in this
>> case), the issue may still exist. I just checked the Linux side; it
>> doesn't seem to perform a mask operation when programming MSI, but I
>> don't know why Linux doesn't have such issues. Actually, we do see
>> inconsistent interrupt messages from the device without this patch,
>> and after applying the patch the issue is gone. Why Linux doesn't
>> need the mask operation may need further investigation.
> Linux is quite careful about when it will reprogram vector/affinity
> info, isn't it? Doesn't it mark such an update as pending and only
> flush it through during the next interrupt delivery, or something like
> that? Do we need some of the upstream Linux patches for this?
>  -- Keir

