
[Xen-devel] RE: Xen 4.1 rc1 test report



Zheng, Shaohui wrote on 2011-01-23:
>2. [VT-d]xen panic on function do_IRQ after many times NIC pass-throu (Intel)
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1706

I may need some help with this bug. Below are my findings.

According to the call trace, the fault occurs at the last line of the code 
segment below (the domain_irq_to_pirq() lookup).
--------------------
__do_IRQ_guest(...)
        for ( i = 0; i < action->nr_guests; i++ )
        d = action->guest[i];
        pirq = domain_irq_to_pirq(d, irq);
===========
A fatal page fault occurs when accessing ((d)->arch.irq_pirq[irq]), because 
(d)->arch.irq_pirq is already NULL.

Further experiments show that during the second-to-last 'xl create', 
pciback could not locate the device to be assigned:
---------------------
[ 4802.773665] pciback pci-26-0: 22 Couldn't locate PCI device (0000:05:00.0)!perhaps already in-use?
============

During the 'xl destroy' that followed, the device model did not respond:
---------------------
libxl: error: libxl_device.c:477:libxl__wait_for_device_model Device Model not ready
libxl: error: libxl_pci.c:866:do_pci_remove Device Model didn't respond in time
============

In the 'xl debug i' output taken immediately afterwards, we can see that the 
guest pirqs of the assigned device were not unbound from the host irq desc.
---------------------
(XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000001 vec:a8 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 16(-S--),1: 16(----),
(XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000004 vec:ba type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 55(----),
============

The guest domain binding that was left in place (the domain itself was already 
destroyed by 'xl destroy') then causes the NULL pointer access when a spurious 
interrupt arrives for that device.
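To make the failure sequence concrete, here is a standalone C sketch of it. This is a reduced model and not Xen code: the struct layouts, MAX_GUESTS, NR_IRQS and all function names are assumptions made for illustration. It shows a teardown that frees the irq -> pirq table while the guest stays on the action's guest list, and a delivery loop that detects the stale entry instead of dereferencing the NULL table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_GUESTS 7   /* illustrative limit, not taken from Xen */
#define NR_IRQS    64  /* illustrative table size */

/* Hypothetical, heavily reduced stand-ins for the Xen structures. */
struct domain {
    int  domain_id;
    int *irq_pirq;                 /* irq -> pirq table, NULL once torn down */
};

struct irq_guest_action {
    int            nr_guests;
    struct domain *guest[MAX_GUESTS];
};

/* Checked lookup: returns -1 instead of dereferencing a NULL table. */
static int irq_to_pirq_checked(const struct domain *d, int irq)
{
    if (d == NULL || d->irq_pirq == NULL || irq < 0 || irq >= NR_IRQS)
        return -1;
    return d->irq_pirq[irq];
}

/* Model of the delivery loop. Returns the number of guests that would be
 * notified and stores the number of stale bindings found in *stale. */
static int deliver_irq(const struct irq_guest_action *action, int irq,
                       int *stale)
{
    int delivered = 0;

    assert(action->nr_guests >= 0 && action->nr_guests <= MAX_GUESTS);
    *stale = 0;
    for (int i = 0; i < action->nr_guests; i++) {
        int pirq = irq_to_pirq_checked(action->guest[i], irq);

        if (pirq < 0) {            /* binding outlived the domain's table */
            (*stale)++;
            continue;
        }
        delivered++;
    }
    return delivered;
}

/* Model of the problematic teardown: the table is freed, but the domain
 * is left on every guest list that still references it. */
static void teardown_without_unbind(struct domain *d)
{
    free(d->irq_pirq);
    d->irq_pirq = NULL;
}
```

With two domains bound to IRQ 16, calling teardown_without_unbind() on one of them and then deliver_irq() reports one delivery and one stale binding. An unchecked lookup at the same point would read through the NULL table, which corresponds to the page fault seen in the trace.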

There are three things we may need to do: 
1. Figure out the root cause of why pciback could not locate the device.
I suspect the previous 'xl destroy' did not return the device to pcistub 
successfully.

2. Figure out the root cause of why the guest pirq was not forcibly unbound.
So far I have found two cases:
Sometimes the check if ( !IS_PRIV_FOR(current->domain, d) ) is hit, so the call 
returns -EINVAL;
Sometimes the check if ( !(desc->status & IRQ_GUEST) ) is hit, so the unbind is 
skipped.

3. Think about how we can prevent such cases from panicking Xen.
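
One possible direction for point 3, sketched as standalone C rather than as a patch: drop every binding that still references a domain before its irq -> pirq table is freed, so that no guest list can outlive the table. The structures, limits and function names below are hypothetical simplifications, and the sketch assumes a domain appears at most once per guest list; locking and in-flight interrupts are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_GUESTS 7   /* illustrative limit, not taken from Xen */
#define NR_IRQS    64  /* illustrative table size */

/* Hypothetical, heavily reduced stand-ins for the Xen structures. */
struct domain {
    int  domain_id;
    int *irq_pirq;                 /* irq -> pirq table */
};

struct irq_guest_action {
    int            nr_guests;
    struct domain *guest[MAX_GUESTS];
};

/* Remove d from one action's guest list.
 * Returns 1 if a binding was removed, 0 if d was not bound. */
static int force_unbind(struct irq_guest_action *action,
                        const struct domain *d)
{
    assert(action->nr_guests >= 0 && action->nr_guests <= MAX_GUESTS);
    for (int i = 0; i < action->nr_guests; i++) {
        if (action->guest[i] != d)
            continue;
        /* Shift the remaining guests down so the list stays dense. */
        for (int j = i; j < action->nr_guests - 1; j++)
            action->guest[j] = action->guest[j + 1];
        action->nr_guests--;
        action->guest[action->nr_guests] = NULL;
        return 1;
    }
    return 0;
}

/* Teardown that unbinds first and frees afterwards.
 * Returns the number of bindings that had to be removed. */
static int teardown_domain(struct domain *d,
                           struct irq_guest_action *actions,
                           size_t nr_actions)
{
    int removed = 0;

    for (size_t n = 0; n < nr_actions; n++)
        removed += force_unbind(&actions[n], d);

    free(d->irq_pirq);             /* safe now: no guest list references d */
    d->irq_pirq = NULL;
    return removed;
}
```

A non-zero return value from teardown_domain() would indicate that the normal unbind path missed a binding, which could be logged to help with point 2. This does not replace a NULL check in the delivery path; the two are complementary.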

Any ideas, hints, comments, suggestions or even fixes on it?

Jimmy


