xen-devel
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET bro
On 21.09.2010 13:56, Pasi Kärkkäinen wrote:
I have been talking with Jan (via email) for a while now to track down
the following problem, and he suggested that I report it on xen-devel:
Jul 9 01:48:04 virt kernel: aacraid: Host adapter reset request. SCSI hang ?
Jul 9 01:49:05 virt kernel: aacraid: SCSI bus appears hung
Jul 9 01:49:10 virt kernel: Calling adapter init
Jul 9 01:49:49 virt kernel: IRQ 16/aacraid: IRQF_DISABLED is not guaranteed on shared IRQs
Jul 9 01:49:49 virt kernel: Acquiring adapter information
Jul 9 01:49:49 virt kernel: update_interval=30:00 check_interval=86400s
Jul 9 01:53:13 virt kernel: aacraid: aac_fib_send: first asynchronous command timed out.
Jul 9 01:53:13 virt kernel: Usually a result of a PCI interrupt routing problem;
Jul 9 01:53:13 virt kernel: update mother board BIOS or consider utilizing one of
Jul 9 01:53:13 virt kernel: the SAFE mode kernel options (acpi, apic etc)
After the VMs have been running for a while, the aacraid driver reports
a non-responding RAID controller. Most of the time the NIC also stops
working.
I tried nearly every combination of dom0 kernel (pvops0, xenified suse
2.6.31.x, xenified suse 2.6.32.x, xenified suse 2.6.34.x) with Xen
hypervisor 3.4.2, 3.4.4-cs19986, 4.0.1, unstable.
No success in two months. Sooner or later, every combination showed the
problem above. I did extensive tests to make sure that the hardware is
OK. And it is - I am sure it is a Xen/dom0 problem.
Jan suggested trying the fix in c/s 22051, but it did not help. My
answer to him:
In the meantime I tried xen-unstable c/s 22068 (which contains staging
c/s 22051) and it did not fix the problem at all. I was able to fix a
problem with the serial console, so I got some debug info that is
attached to this email. The following line looks suspicious to me
(irr=1, delivery_status=1):
(XEN) IRQ 16 Vec216:
(XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, dest_mode=logical, delivery_status=1, polarity=1, irr=1, trigger=level, mask=0, dest_id:1
IRQ 16 is the aacraid controller, which after a while seems to be
unable to receive interrupts. Can you see from the debug info what is
going on?
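For anyone who wants to scan a longer dump for entries like the one above, here is a minimal sketch. It assumes the "(XEN) Apic ..., Pin ...:" line has been joined onto one line, and that a level-triggered, unmasked pin showing both irr=1 and delivery_status=1 is worth a closer look. That rule is only the suspicion raised here, not a check that Xen itself performs, and the helper names parse_rte and looks_stuck are hypothetical.

```python
import re

# One redirection-table line from the debug output above, joined onto one line.
SAMPLE = (
    "(XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, dest_mode=logical, "
    "delivery_status=1, polarity=1, irr=1, trigger=level, mask=0, dest_id:1"
)

# "Apic 0x00, Pin 16:" prefix, then key=value (or key:value) pairs.
PIN_RE = re.compile(r"Apic\s+(0x[0-9a-fA-F]+),\s*Pin\s+(\d+):")
FIELD_RE = re.compile(r"(\w+)[=:](\w+)")


def parse_rte(line):
    """Parse one IO-APIC pin line into a dict, or return None if it is not one."""
    match = PIN_RE.search(line)
    if match is None:
        return None
    # Only look at the text after the "Pin N:" prefix for key/value pairs.
    fields = dict(FIELD_RE.findall(line[match.end():]))
    fields["apic"] = match.group(1)
    fields["pin"] = int(match.group(2))
    return fields


def looks_stuck(rte):
    """Heuristic (an assumption): unmasked level-triggered pin with
    irr=1 and delivery_status=1, as in the suspicious entry above."""
    return (
        rte.get("trigger") == "level"
        and rte.get("mask") == "0"
        and rte.get("irr") == "1"
        and rte.get("delivery_status") == "1"
    )


if __name__ == "__main__":
    entry = parse_rte(SAMPLE)
    if entry is not None and looks_stuck(entry):
        print("suspicious pin:", entry["pin"], "vector:", entry["vector"])
```

Feeding each line of a saved serial-console log through parse_rte and filtering with looks_stuck would list the candidate pins; whether a flagged pin is really wedged still has to be confirmed by checking it again later.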
I also applied a small patch which disables HPET broadcast. The machine
has now been running for 110 hours without a crash, whereas normally it
crashes within a few minutes. Is there something wrong (race, deadlock)
with HPET broadcasts in relation to blocked interrupt reception (see
above)?
What kind of hardware does this happen on?
It is a Supermicro X8SIL-F, Intel Xeon 3450 system.
Should this patch be merged?
Not easy to answer. I spent more than 10 weeks searching nearly full
time for the cause of the stability issues. Finally I was able to track
it down to the HPET broadcast code.
We need to find the developer of the HPET broadcast code, who should
then try to fix it. I consider it a quite severe bug, as it renders Xen
nearly useless on affected systems. That is why I (and my boss who pays
me) have spent so much time (developing/fixing Xen is not really my core
job) and money (buying an E5620 machine just for testing Xen).
I think many people on affected systems are having problems. See
http://lists.xensource.com/archives/html/xen-users/2010-09/msg00370.html
Regards Andreas