
Re: NetBSD dom0 PVH: hardware interrupts stalls


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 20 Nov 2020 09:28:55 +0100
  • Cc: Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 20 Nov 2020 08:29:31 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Nov 20, 2020 at 09:09:51AM +0100, Jan Beulich wrote:
> On 19.11.2020 18:57, Manuel Bouyer wrote:
> > I added an ASSERT() after the printf to get a stack trace, and got:
> > db{0}> call ioapic_dump_raw^M
> > Register dump of ioapic0^M
> > [  13.0193374] 00 08000000 00170011 08000000(XEN) vioapic.c:141:d0v0 
> > apic_mem_readl:undefined ioregsel 3
> > (XEN) vioapic.c:512:vioapic_irq_positive_edge: vioapic_deliver 2
> > (XEN) Assertion '!print' failed at vioapic.c:512
> > (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:   C   ]----
> > (XEN) CPU:    0
> > (XEN) RIP:    e008:[<ffff82d0402c4164>] 
> > vioapic_irq_positive_edge+0x14e/0x150
> > (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v0)
> > (XEN) rax: ffff82d0405c806c   rbx: ffff830836650580   rcx: 0000000000000000
> > (XEN) rdx: ffff8300688bffff   rsi: 000000000000000a   rdi: ffff82d0404b36b8
> > (XEN) rbp: ffff8300688bfde0   rsp: ffff8300688bfdc0   r8:  0000000000000004
> > (XEN) r9:  0000000000000032   r10: 0000000000000000   r11: 00000000fffffffd
> > (XEN) r12: ffff8308366dc000   r13: 0000000000000022   r14: ffff8308366dc31c
> > (XEN) r15: ffff8308366d1d80   cr0: 0000000080050033   cr4: 00000000003526e0
> > (XEN) cr3: 00000008366c9000   cr2: 0000000000000000
> > (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) Xen code around <ffff82d0402c4164> 
> > (vioapic_irq_positive_edge+0x14e/0x150):
> > (XEN)  3d 10 be 1d 00 00 74 c2 <0f> 0b 55 48 89 e5 41 57 41 56 41 55 41 54 
> > 53 48
> > (XEN) Xen stack trace from rsp=ffff8300688bfdc0:
> > (XEN)    0000000200000086 ffff8308366dc000 0000000000000022 0000000000000000
> > (XEN)    ffff8300688bfe08 ffff82d0402bcc33 ffff8308366dc000 0000000000000022
> > (XEN)    0000000000000001 ffff8300688bfe40 ffff82d0402bd18f ffff830835a7eb98
> > (XEN)    ffff8308366dc000 ffff830835a7eb40 ffff8300688bfe68 0100100100100100
> > (XEN)    ffff8300688bfea0 ffff82d04026f6e1 ffff830835a7eb30 ffff8308366dc0f4
> > (XEN)    ffff830835a7eb40 ffff8300688bfe68 ffff8300688bfe68 ffff82d0405cec80
> > (XEN)    ffffffffffffffff ffff82d0405cec80 0000000000000000 ffff82d0405d6c80
> > (XEN)    ffff8300688bfed8 ffff82d04022b6fa ffff83083663f000 ffff83083663f000
> > (XEN)    0000000000000000 0000000000000000 0000000a7c62165b ffff8300688bfee8
> > (XEN)    ffff82d04022b798 ffff8300688bfe08 ffff82d0402a4bcb 0000000000000000
> > (XEN)    0000000000000206 ffff8316da86e61c ffff8316da86e600 ffff938031fd47c0
> > (XEN)    0000000000000003 0000000000000400 ff889e8da08f928a 0000000000000000
> > (XEN)    0000000000000002 0000000000000100 000000000000b86e ffff93803237f010
> > (XEN)    0000000000000000 ffff8316da86e61c 0000beef0000beef ffffffff80555918
> > (XEN)    000000bf0000beef 0000000000000046 ffff938031fd4790 000000000000beef
> > (XEN)    000000000000beef 000000000000beef 000000000000beef 000000000000beef
> > (XEN)    0000e01000000000 ffff83083663f000 0000000000000000 00000000003526e0
> > (XEN)    0000000000000000 0000000000000000 0000060100000001 0000000000000000
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d0402c4164>] R vioapic_irq_positive_edge+0x14e/0x150
> > (XEN)    [<ffff82d0402bcc33>] F arch/x86/hvm/irq.c#assert_gsi+0x5e/0x7b
> > (XEN)    [<ffff82d0402bd18f>] F hvm_gsi_assert+0x62/0x77
> > (XEN)    [<ffff82d04026f6e1>] F 
> > drivers/passthrough/io.c#dpci_softirq+0x261/0x29e
> > (XEN)    [<ffff82d04022b6fa>] F common/softirq.c#__do_softirq+0x8a/0xbf
> > (XEN)    [<ffff82d04022b798>] F do_softirq+0x13/0x15
> > (XEN)    [<ffff82d0402a4bcb>] F vmx_asm_do_vmentry+0x2b/0x30
> > (XEN) 
> > (XEN) 
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Assertion '!print' failed at vioapic.c:512
> > (XEN) ****************************************
> 
> Right, this was the expected path after what you've sent prior to this,
> which turned my attention back to the 'i' debug key output you had sent
> the other day. There we have:
> 
> (XEN)    IRQ:  34 vec:51 IO-APIC-level   status=010 aff:{0}/{0-7} in-flight=1 
> d0: 34(-MM)
> 
> i.e. at that point we're waiting for Dom0 to signal it's done handling
> the IRQ. There is, however, a timer associated with this. Yet that
> timer is really there to prevent the system from getting stuck, i.e.
> the "in-flight" state ought to clear 1ms later (when the timer
> expires), and hence it ought to be pretty unlikely to catch it
> non-zero _and_ with something actually stuck.

I somehow assumed the interrupt was in-flight because the printing to
the Xen console caused one to be injected, and thus dom0 hadn't had
time to Ack it yet.

> 
> So for the softirq to get Dom0 out of its stuck state, there has got to
> be yet some other event. Nevertheless it may be worthwhile
> instrumenting irq_guest_eoi_timer_fn() to prove we actually take this
> path, i.e. Xen is trying to "clean up" after Dom0 taking too long to
> service an IRQ. In normal operation this path shouldn't be taken, so I
> wouldn't exclude that something got broken in that logic. (Orthogonal
> to this, it may also be worth seeing whether increasing the timeout
> would actually help things. This wouldn't be a solution, but it would
> be another data point hinting that something's wrong on this code
> path.)
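
FWIW, something along these lines should be enough for that
instrumentation (untested sketch; the surrounding code in
xen/arch/x86/irq.c is elided and may differ):

    static void irq_guest_eoi_timer_fn(void *data)
    {
        struct irq_desc *desc = data;

        /*
         * Debug instrumentation only: report that the guest EOI timeout
         * has fired, i.e. Xen is about to clean up because dom0 didn't
         * EOI the interrupt in time.
         */
        printk(XENLOG_G_WARNING "IRQ%d: guest EOI timer fired\n", desc->irq);

        /* ... existing body unchanged ... */
    }

For the timeout experiment, IIRC the timer is armed with
set_timer(&action->eoi_timer, NOW() + MILLISECS(1)) elsewhere in the
same file, so bumping MILLISECS(1) to something like MILLISECS(100) at
those call sites should be enough to see whether the behaviour changes.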
> 
> Roger, I'm also somewhat puzzled by the trailing (-MM): Is PVH using
> event channels for delivering pIRQ-s?

No, it's always using emulated interrupt controllers. I explicitly
disabled HVM PIRQ for PVH.
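
For reference, the PVH dom0 emulation flags are selected without
XEN_X86_EMU_USE_PIRQ, so has_pirq() is false for it and pIRQs can never
be bound to event channels. From memory the relevant assignment in
xen/arch/x86/setup.c looks roughly like this (the exact flag set may
differ):

    dom0_cfg.arch.emulation_flags |= XEN_X86_EMU_LAPIC |
                                     XEN_X86_EMU_IOAPIC |
                                     XEN_X86_EMU_VPCI;
    /* Note: no XEN_X86_EMU_USE_PIRQ, hence has_pirq(dom0) is false. */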

> I thought that's purely vIO-APIC
> and vMSI? I wonder whether we misleadingly dump info from evtchn 0
> here, in which case only the 2nd of the M-s would be meaningful (and
> would be in line with non-zero in-flight).

Likely - I will have to look closer, but there's no event channel
associated with a pIRQ on PVH dom0. I will send a patch to fix
dump_irqs.
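
The idea would be to only print the event channel derived flags when
there's actually an event channel bound to the pIRQ, roughly like the
following (helper and field names from memory, so the actual patch will
likely look somewhat different):

    const struct pirq *info = pirq_info(d, pirq);

    if ( info && info->evtchn )
        /* Event channel bound: pending/masked state is meaningful. */
        printk("d%d:%4d(%c%c%c)", d->domain_id, pirq,
               evtchn_port_is_pending(d, info->evtchn) ? 'P' : '-',
               evtchn_port_is_masked(d, info->evtchn) ? 'M' : '-',
               info->masked ? 'M' : '-');
    else if ( info )
        /* No event channel (e.g. PVH): only the pIRQ mask state matters. */
        printk("d%d:%4d(%c)", d->domain_id, pirq,
               info->masked ? 'M' : '-');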

Maybe we should track interrupt EOI, and see when the interrupt gets
EOI'ed. I will see if I can find some time later to prepare another
debug patch.
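
The obvious place for that would be vioapic_update_EOI() in
xen/arch/x86/hvm/vioapic.c, something like the below (untested, and
assuming the prototype is still (struct domain *d, u8 vector)):

    void vioapic_update_EOI(struct domain *d, u8 vector)
    {
        /*
         * Debug instrumentation only: log every vIO-APIC EOI so it can
         * be matched against the injection messages above.  Note this
         * is the vector dom0 programmed into the RTE, not the physical
         * vector Xen uses for the IRQ.
         */
        printk(XENLOG_G_INFO "%pd: vIO-APIC EOI, vector %#x\n", d, vector);

        /* ... existing body unchanged ... */
    }

hvm_dpci_eoi() might be another spot worth instrumenting, since IIRC
that's where the EOI gets forwarded to the physical interrupt for a PVH
dom0.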

Roger.
