
Re: NetBSD dom0 PVH: hardware interrupts stalls


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 24 Nov 2020 13:21:02 +0100
  • Cc: Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 24 Nov 2020 12:21:22 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Nov 24, 2020 at 11:05:12AM +0100, Jan Beulich wrote:
> On 23.11.2020 18:39, Manuel Bouyer wrote:
> > On Mon, Nov 23, 2020 at 06:06:10PM +0100, Roger Pau Monné wrote:
> >> OK, I'm afraid this is likely too verbose and messes with the timings.
> >>
> >> I've been looking (again) into the code, and I found something weird
> >> that I think could be related to the issue you are seeing, but haven't
> >> managed to try to boot the NetBSD kernel provided in order to assert
> >> whether it solves the issue or not (or even whether I'm able to
> >> repro it). Would you mind giving the patch below a try?
> > 
> > With this, I get the same hang but XEN outputs don't wake up the interrupt
> > any more. The NetBSD counter shows only one interrupt for ioapic2 pin 2,
> > while I would have about 8 at the time of the hang.
> > 
> > So, now it looks like interrupts are blocked forever.
> 
> Which may be a good thing for debugging purposes, because now we have
> a way to investigate what is actually blocking the interrupt's
> delivery without having to worry about more output screwing the
> overall picture.
> 
> > At
> > http://www-soc.lip6.fr/~bouyer/xen-log5.txt
> > you'll find the output of the 'i' key.
> 
> (XEN)    IRQ:  34 vec:59 IO-APIC-level   status=010 aff:{0}/{0-7} in-flight=1 d0: 34(-MM)
> 
> (XEN)     IRQ 34 Vec 89:
> (XEN)       Apic 0x02, Pin  2: vec=59 delivery=LoPri dest=L status=1 polarity=1 irr=1 trig=L mask=0 dest_id:00000001
> 
> (XEN) ioapic 2 pin 2 gsi 34 vector 0x67
> (XEN)   delivery mode 0 dest mode 0 delivery status 0
> (XEN)   polarity 1 IRR 0 trig mode 1 mask 0 dest id 0
> 
> IOW from guest pov the interrupt is entirely idle (mask and irr clear),
> while Xen sees it as both in-flight and irr also already having become
> set again. I continue to suspect the EOI timer not doing its job. Yet
> as said before, for it to have to do anything in the first place the
> "guest" (really Dom0 here) would need to fail to EOI the IRQ within
> the timeout period. Which in turn, given your description of how you
> handle interrupts, cannot be excluded (i.e. the handling may simply
> take "slightly" too long).

I've tried to force some of those scenarios myself by modifying the
code, but I haven't been able to trigger the same behaviour. I guess
the NetBSD case is difficult to recreate.

> What we're missing is LAPIC information, since the masked status logged
> is unclear: (-MM) isn't fully matching up with "mask=0". But of course
> the former is just a software representation, while the latter is what
> the RTE holds. IOW for the interrupt to not get delivered, there needs
> to be this or a higher ISR bit set (considering we don't use the TPR),
> or (I think we can pretty much exclude this) we'd need to be running
> with IRQs off for extended periods of time.

Let's dump the physical lapic(s) IRR and ISR together with the
IO-APIC state. Can you please apply the following patch and use the
'i' key again? (please keep the previous patch applied)

Thanks, Roger.
---8<---
diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 60627fd6e6..c33d682b69 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1547,3 +1547,24 @@ void check_for_unexpected_msi(unsigned int vector)
 {
     BUG_ON(apic_isr_read(vector));
 }
+
+static DEFINE_SPINLOCK(dump_lock);
+void dump_lapic(void *unused)
+{
+    unsigned int i;
+    unsigned long flags;
+
+    spin_lock_irqsave(&dump_lock, flags);
+    printk("CPU %u APIC ID %u\n", smp_processor_id(), apic_read(APIC_ID));
+
+    printk("IRR ");
+    for ( i = APIC_ISR_NR; i-- > 0; )
+        printk("%08x", apic_read(APIC_IRR + i*0x10));
+
+    printk("\nISR ");
+    for ( i = APIC_ISR_NR; i-- > 0; )
+        printk("%08x", apic_read(APIC_ISR + i*0x10));
+    printk("\n");
+
+    spin_unlock_irqrestore(&dump_lock, flags);
+}
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e66fa99ec7..92edb3000a 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2470,6 +2470,7 @@ static const char * delivery_mode_2_str(
     }
 }
 
+void dump_lapic(void *unused);
 void dump_ioapic_irq_info(void)
 {
     struct irq_pin_list *entry;
@@ -2516,6 +2517,9 @@ void dump_ioapic_irq_info(void)
             entry = &irq_2_pin[entry->next];
         }
     }
+
+    dump_lapic(NULL);
+    smp_call_function(dump_lapic, NULL, true);
 }
 
 static unsigned int __initdata max_gsi_irqs;
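
For reading the resulting dump, here is a minimal standalone sketch (not
part of the patch, and not Xen code) of the check discussed above: whether
a pending vector is held back by an in-service one. All names below
(NR_REGS, vector_set, highest_vector, blocked_by_isr) are hypothetical
helpers made up for illustration. It assumes regs[i] holds the 32-bit
value read at offset i*0x10, as in the patch, and that the TPR is 0. Note
that the patch prints the highest-index word first. The LAPIC compares
priority classes (vector >> 4), so any in-service vector in the same or a
higher class blocks delivery.

```c
#include <stdbool.h>
#include <stdint.h>

/* 256 vectors, 32 per register. Hypothetical name, not a Xen constant. */
#define NR_REGS 8

/* regs[0] covers vectors 0-31, regs[7] covers vectors 224-255. */
static bool vector_set(const uint32_t regs[NR_REGS], unsigned int vector)
{
    return (regs[vector / 32] & (1u << (vector % 32))) != 0;
}

/* Return the highest vector whose bit is set, or -1 if none is set. */
static int highest_vector(const uint32_t regs[NR_REGS])
{
    for ( int i = NR_REGS; i-- > 0; )
    {
        if ( !regs[i] )
            continue;
        for ( int bit = 32; bit-- > 0; )
            if ( regs[i] & (1u << bit) )
                return i * 32 + bit;
    }
    return -1;
}

/*
 * Assuming TPR == 0: a pending vector is not delivered while the highest
 * in-service vector is in the same or a higher priority class.
 */
static bool blocked_by_isr(const uint32_t isr[NR_REGS], unsigned int vector)
{
    int isrv = highest_vector(isr);

    return isrv >= 0 && (isrv >> 4) >= (int)(vector >> 4);
}
```

As an illustration with made-up values: if vector 0x5a were in service,
blocked_by_isr() would return true for a pending vector 0x59 (both are in
class 5) and false for 0x61 (class 6).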
