xen-devel

[Xen-devel] Re: PV on HVM guest hang...

To: Sheng Liang <shengliang@xxxxxxxxx>
Subject: [Xen-devel] Re: PV on HVM guest hang...
From: Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>
Date: Thu, 25 Sep 2008 12:10:30 -0700
Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 25 Sep 2008 12:11:53 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <6ee487650809251106v5618d1cdhfe775bb1ccf7303c@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: Oracle Corp
References: <48B4D097.7040507@xxxxxxxxxx> <6ee487650809251106v5618d1cdhfe775bb1ccf7303c@xxxxxxxxxxxxxx>
Reply-to: mukesh.rathor@xxxxxxxxxx
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.5 (X11/20070719)
I was finally able to track this down. Basically, on the source machine, if
there's an event for the guest at the right moment during live migration, the
line is asserted via the pci_intx.i bit in:

__hvm_pci_intx_assert():

    if ( __test_and_set_bit(device*4 + intx, &hvm_irq->pci_intx.i) ) <-----
            return;

When the guest is moved to the target, this bit gets carried over, and the GSI
is asserted again:

irq_load_pci():

            if ( test_bit(dev*4 + intx, &hvm_irq->pci_intx.i) )
            {
                /* Direct GSI assert */
                gsi = hvm_pci_intx_gsi(dev, intx);
                hvm_irq->gsi_assert_count[gsi]++;   <---
                /* PCI-ISA bridge assert */
                link = hvm_pci_intx_link(dev, intx);
                hvm_irq->pci_link_assert_count[link]++;
            }

As soon as the guest gets a xen_platform_pci event, the stale assert count
causes the interrupt to be redelivered in a loop, hence the guest hang.
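
To make the mechanism concrete, here is a toy model of it in plain C. It is
only a sketch: the toy_* names, the table size, the GSI mapping, and the
"redeliver on EOI while the assert count is nonzero" rule are simplifying
assumptions for illustration, not the actual Xen code.

```c
/* Toy model only: the toy_* names, sizes and logic are assumptions made
 * for illustration; they are not the real Xen structures or functions. */
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_GSIS 48  /* assumed table size for this sketch */

struct toy_irq {
    unsigned long pci_intx;                      /* one bit per (dev, intx) */
    unsigned int  gsi_assert_count[TOY_NR_GSIS]; /* per-GSI assert count    */
};

/* Hypothetical (dev, intx) -> GSI mapping, standing in for
 * hvm_pci_intx_gsi(); result is always in the range 16..47. */
static unsigned int toy_gsi(unsigned int dev, unsigned int intx)
{
    return ((dev + intx) & 31) + 16;
}

/* Restore path: every bit set in the saved state re-raises its GSI,
 * mirroring the shape of the irq_load_pci() fragment above. */
static void toy_load(struct toy_irq *irq, unsigned long saved_bits)
{
    irq->pci_intx = saved_bits;
    for (unsigned int dev = 0; dev < 8; dev++)
        for (unsigned int intx = 0; intx < 4; intx++)
            if (saved_bits & (1UL << (dev * 4 + intx)))
                irq->gsi_assert_count[toy_gsi(dev, intx)]++;
}

/* Assumed level-triggered rule: on EOI, redeliver while the line is
 * still counted as asserted. */
static bool toy_eoi_redelivers(const struct toy_irq *irq, unsigned int gsi)
{
    assert(gsi < TOY_NR_GSIS);
    return irq->gsi_assert_count[gsi] != 0;
}

/* Count redeliveries when nothing ever deasserts the line; capped so
 * the model terminates instead of hanging like the guest does. */
static unsigned int toy_spin(const struct toy_irq *irq, unsigned int gsi,
                             unsigned int cap)
{
    unsigned int n = 0;
    while (n < cap && toy_eoi_redelivers(irq, gsi))
        n++;
    return n;
}
```

Loading a saved state with a single bit set and then calling toy_spin() on the
matching GSI runs all the way to the cap, while a clean saved state gives zero
redeliveries.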

My simple fix is to just check whether the GSI is masked in the vIOAPIC:

__hvm_pci_intx_assert():
.....
+    gsi = hvm_pci_intx_gsi(device, intx);
+    if (vioapic_masked(d, gsi))
+       return;
+

vioapic.c:
+int vioapic_masked(struct domain *d, unsigned int irq)
+{
+    struct hvm_hw_vioapic *vioapic = domain_vioapic(d);
+    union vioapic_redir_entry *ent;
+
+    ent = &vioapic->redirtbl[irq];
+    if ( ent->fields.mask )
+        return 1;
+
+    return 0;
+}
+

This seems to work, but I'm not sure it's the best fix. I'm currently waiting
for feedback from Intel and others here.
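
For anyone who wants to play with the idea outside the hypervisor, the mask
check can be sketched as a small standalone model. The sketch_* names, the
simplified structures, and the GSI mapping are hypothetical stand-ins, and
placing the mask test ahead of the test-and-set is an assumption on my part,
since the fragment above elides the surrounding lines.

```c
/* Standalone sketch of the mask check. All sketch_* names and types are
 * simplified stand-ins chosen for illustration, not real Xen code. */
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_PINS 48  /* assumed number of vIOAPIC pins */

struct sketch_redir_entry {
    bool mask;             /* true when the pin is masked */
};

struct sketch_domain {
    struct sketch_redir_entry redirtbl[SKETCH_NR_PINS];
    unsigned long pci_intx;                         /* bit per (device, intx) */
    unsigned int  gsi_assert_count[SKETCH_NR_PINS]; /* per-GSI assert count   */
};

/* Same idea as vioapic_masked() in the patch: report the pin's mask bit. */
static bool sketch_vioapic_masked(const struct sketch_domain *d,
                                  unsigned int gsi)
{
    assert(gsi < SKETCH_NR_PINS);
    return d->redirtbl[gsi].mask;
}

/* Hypothetical (device, intx) -> GSI mapping; always in 16..47. */
static unsigned int sketch_intx_gsi(unsigned int device, unsigned int intx)
{
    return ((device + intx) & 31) + 16;
}

/* Assert path with the proposed early return.
 * Returns true only when the line was newly asserted. */
static bool sketch_intx_assert(struct sketch_domain *d,
                               unsigned int device, unsigned int intx)
{
    assert(device < 8 && intx < 4);   /* keep the bit index below 32 */

    unsigned int  gsi = sketch_intx_gsi(device, intx);
    unsigned long bit = 1UL << (device * 4 + intx);

    if (sketch_vioapic_masked(d, gsi))
        return false;                 /* masked: record no state at all */
    if (d->pci_intx & bit)
        return false;                 /* already asserted */

    d->pci_intx |= bit;
    d->gsi_assert_count[gsi]++;
    return true;
}
```

With the pin masked, sketch_intx_assert() leaves both pci_intx and the assert
count untouched, so in this model there is no stale bit left to carry over in
the saved state.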

Thanks
mukesh


Sheng Liang wrote:
> Mukesh,
>
> Did you ever get a response to this? Were you able to track it down?
>
> Sheng
>
> On Tue, Aug 26, 2008 at 8:57 PM, Mukesh Rathor <mukesh.rathor@xxxxxxxxxx> wrote:
>
>     I'm debugging a hang of 64bit HVM guest with PV drivers. The problem
>     happens during migrate. So far I've discovered that the guest is
>     stuck in loop receiving interrupt 0xa9/169. In the hypervisor I see
>     that upon vmx exit, it sends 0xa9 right away...
>
>     (XEN)    [<ffff828c80152680>] vlapic_test_and_set_irr+0x0/0x40   :0xa9
>     (XEN)    [<ffff828c80151d35>] ioapic_inj_irq+0x95/0x150
>     (XEN)    [<ffff828c801521d0>] vioapic_deliver+0x3e0/0x440
>     (XEN)    [<ffff828c801522df>] vioapic_update_EOI+0xaf/0xc0
>     (XEN)    [<ffff828c8015394b>] vlapic_write+0x2eb/0x7e0
>     (XEN)    [<ffff828c8014a630>] hvm_mmio_intercept+0xa0/0x360
>     (XEN)    [<ffff828c8014d03f>] send_mmio_req+0x14f/0x1b0
>     (XEN)    [<ffff828c8014e568>] mmio_operands+0xa8/0x160
>     (XEN)    [<ffff828c8014eb96>] handle_mmio+0x576/0x880
>     (XEN)    [<ffff828c801632b2>] vmx_vmexit_handler+0x1832/0x1900
>
>
>     I'm now trying to figure out the IP that causes the VM exit so I can
>     figure out where in the guest/guest driver it's writing to the APIC.
>     On the guest side, I see that evtchn_pending_sel is not set in
>     evtchn_interrupt().
>
>     Any ideas/suggestions would be great as it is a critical bug.
>
>     Thanks
>     Mukesh
>
>     _______________________________________________
>     Xen-devel mailing list
>     Xen-devel@xxxxxxxxxxxxxxxxxxx
>     http://lists.xensource.com/xen-devel
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel