
Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic



On 13/08/13 12:39, Wu, Feng wrote:

Hi Thimo,

 

I am trying to reproduce this issue on my side, but unfortunately I failed to boot the rhel6.4 guest on top of Xen-4.1.5 RC1 with a 3.9.3 domain0 kernel. Since Xen-4.1.5 is a little old, could you please share the guest configuration file you used when this issue happened? Thanks a lot!

 

Thanks,

Feng


Stepping in here for a moment: Thimo is running XenServer 6.2.

This issue started on the XenServer forums but moved here.  For reference, we found this once in XenServer testing (as seen at the root of this email thread), but I have been unable to reproduce the issue since.  We have seen the crash on Xen 4.1 and 4.2.

~Andrew

 

From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Thimo E
Sent: Monday, August 12, 2013 9:55 PM
To: Zhang, Yang Z
Cc: Keir Fraser; Jan Beulich; Andrew Cooper; Dong, Eddie; Xen-develList; Nakajima, Jun; Zhang, Xiantao
Subject: Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic

 

Hello Yang,

and attached is the next crash dump, which occurred today, only a few minutes after I created the logfiles I sent in the previous mail.
Perhaps together with the logfiles from the previous mail it gives you a better understanding of what is going on.

I've disabled Interrupt remapping now.

> 4.....
> can you add some debug messages in the guest EOI code path (like _irq_guest_eoi()) to track the EOI?

@Andrew: Is it possible for you to integrate the changes Yang requested into your Xen debugging version?

Best regards
  Thimo

On 12.08.2013 10:49, Zhang, Yang Z wrote:

Hi Thimo,

Your previous experience and logs show the following:

1.      The interrupt that triggers the issue is an MSI.

2.      MSIs are normally treated as edge-triggered interrupts, except when there is no way to mask the device. In this case, your previous log indicates the device is unmaskable (what special device are you using? Modern PCI devices should be maskable).

3.      IRQ 29 belongs to dom0, so this does not seem to be an HVM-related issue.

4.      The status of IRQ 29 is 10, which means the guest has already issued the EOI because the IRQ_GUEST_EOI_PENDING bit is cleared, so there should be no pending EOI in the EOI stack. If possible, can you add some debug messages in the guest EOI code path (like _irq_guest_eoi()) to track the EOI?

5.      Both logs show that when the issue occurred, most of the other interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is that a coincidence? Or does it happen only under special conditions such as heavy IRQ migration? Perhaps you can disable irq balance in dom0 and pin the IRQ manually.
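
For illustration only, the manual pinning suggested in point 5 might be sketched as below. This is a minimal sketch, not something posted in the thread: it assumes a Linux dom0 that exposes the standard /proc/irq/<N>/smp_affinity file, that the irqbalance service has already been stopped so it does not overwrite the setting, and that all target CPU numbers are below 32 (larger masks need the kernel's comma-separated format). The helper names cpu_mask_hex and pin_irq are hypothetical.

```python
import os


def cpu_mask_hex(cpus):
    """Return the hex bitmask for the given CPU numbers (CPU n -> bit n)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            # Assumption of this sketch: only a single 32-bit mask word is handled.
            raise ValueError("this sketch only handles CPU numbers 0..31")
        mask |= 1 << cpu
    return format(mask, "x")


def pin_irq(irq, cpus, proc_root="/proc"):
    """Write the affinity mask for `irq`; on a real system this needs root.

    `proc_root` is a hypothetical parameter that exists only so the function
    can be tried against a scratch directory instead of the real /proc.
    """
    path = os.path.join(proc_root, "irq", str(irq), "smp_affinity")
    with open(path, "w") as f:
        f.write(cpu_mask_hex(cpus))
    return path
```

With irqbalance stopped, calling pin_irq(29, [0]) as root would ask the dom0 kernel to deliver IRQ 29 to CPU 0 only. Whether that changes the crash behaviour is exactly what the experiment is meant to find out.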

I guess interrupt remapping is enabled on your machine. Can you try disabling IR to see whether the issue is still reproducible?

Also, please provide the whole Xen log.

 

Best regards,

Yang

 



 

