
Re: NetBSD dom0 PVH: hardware interrupts stalls


  • To: Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 18 Nov 2020 15:39:28 +0100
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 18 Nov 2020 14:39:49 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Nov 18, 2020 at 01:14:03PM +0100, Manuel Bouyer wrote:
> On Wed, Nov 18, 2020 at 11:00:25AM +0100, Roger Pau Monné wrote:
> > On Wed, Nov 18, 2020 at 10:24:25AM +0100, Manuel Bouyer wrote:
> > > On Wed, Nov 18, 2020 at 09:57:38AM +0100, Roger Pau Monné wrote:
> > > > On Tue, Nov 17, 2020 at 05:40:33PM +0100, Manuel Bouyer wrote:
> > > > > On Tue, Nov 17, 2020 at 04:58:07PM +0100, Roger Pau Monné wrote:
> > > > > > [...]
> > > > > > 
> > > > > > > I have attached a patch below that will dump the vIO-APIC info as part
> > > > > > > of the 'i' debug key output, can you paste the whole output of the 'i'
> > > > > > > debug key when the system stalls?
> > > > > 
> > > > > > See attached file. Note that the kernel did unstall while the 'i' output
> > > > > > was being printed, so it is mixed with some NetBSD kernel output.
> > > > > > The IDT entry of the 'ioapic2 pin2' interrupt is 103 on CPU 0.
> > > > > 
> > > > > I also put the whole sequence at
> > > > > http://www-soc.lip6.fr/~bouyer/xen-log3.txt
> > > > 
> > > > On one of the instances the pin shows up as masked, but I'm not sure
> > > > if that's relevant since later it shows up as unmasked. Might just be
> > > > part of how NetBSD handles such interrupts.
> > > 
> > > Yes, NetBSD can mask an interrupt source if the interrupt needs to be
> > > delayed. It will be unmasked once the interrupt has been handled.
> > 
> > Yes, I think that's roughly the same model that FreeBSD uses for
> > level IO-APIC interrupts: mask it until the handlers have been run.
> > 
> > > Would it be possible that Xen misses an unmask write, or fails to
> > > call the vector if the interrupt is again pending at the time of the
> > > unmask ?
> > 
> > Well, it should work properly, but we cannot rule anything out.
> 
> I did some more instrumentation from the NetBSD kernel, including dumping
> the ioapic2 pin2 register.
> 
> At the time of the command timeout, the register value is 0x0000a067,
> which, if I understand it properly, means that there's no interrupt
> pending (bit IOAPIC_REDLO_RIRR, 0x00004000, is not set).
> From the NetBSD ddb, I can dump this register multiple times, waiting
> several seconds, etc.; it doesn't change.
> Now if I call ioapic_dump_raw() from the debugger, which triggers some
> XEN printf:
> db{0}> call ioapic_dump_raw^M
> Register dump of ioapic0^M
> [ 203.5489060] 00 08000000 00170011 08000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 3
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 4
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 5
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 6
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 7
>  00000000^M
> [ 203.5489060] 08(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 8
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 9
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel a
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel b
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel c
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel d
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel e
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel f
>  00000000^M
> [ 203.5489060] 10 00010000 00000000 00010000 00000000 00010000 00000000 00010000 00000000^M
> [...]
> [ 203.5489060] Register dump of ioapic2^M
> [ 203.5489060] 00 0a000000 00070011 0a000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 3
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 4
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 5
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 6
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 7
>  00000000^M
> [ 203.5489060] 08(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 8
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 9
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel a
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel b
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel c
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel d
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel e
>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel f
>  00000000^M
> [ 203.5489060] 10 00010000 00000000 00010000 00000000 0000e067 00000000 00010000 00000000^M
> 
> Then the register switches to 0000e067, with the IOAPIC_REDLO_RIRR bit set.
> From here, if I continue from ddb, the dom0 boots.
> 
> I can get the same effect by just doing ^A^A^A, so my guess is that it's
> not the access to the ioapic's register which changes the IOAPIC_REDLO_RIRR
> bit, but the XEN printf. Also, from NetBSD, using a dump function which
> doesn't access undefined registers - and so doesn't trigger XEN printfs -
> doesn't change the IOAPIC_REDLO_RIRR bit either.
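
For reference, here is a minimal sketch of how the register values quoted
above decode. It assumes the standard layout of the low 32 bits of an
IO-APIC redirection table entry (as in the Intel 82093AA datasheet). Only
the IOAPIC_REDLO_RIRR mask (0x00004000) is named in this thread; the
function name and the dictionary keys are hypothetical and chosen for
illustration only.

```python
def decode_redlo(value):
    """Split the low 32 bits of an IO-APIC redirection entry into fields.

    Bit positions follow the 82093AA layout; the key names are
    illustrative and are not taken from NetBSD or Xen sources.
    """
    return {
        "vector": value & 0xFF,                    # bits 0-7
        "delivery_mode": (value >> 8) & 0x7,       # bits 8-10, 0 = fixed
        "dest_logical": bool(value & 0x0800),      # bit 11
        "delivery_pending": bool(value & 0x1000),  # bit 12
        "active_low": bool(value & 0x2000),        # bit 13
        "remote_irr": bool(value & 0x4000),        # bit 14, IOAPIC_REDLO_RIRR
        "level_triggered": bool(value & 0x8000),   # bit 15
        "masked": bool(value & 0x10000),           # bit 16
    }


# The three values that appear in the dumps above.
for raw in (0x0000A067, 0x0000E067, 0x00010000):
    print(f"{raw:#010x}: {decode_redlo(raw)}")
```

Under that assumption, 0x0000a067 and 0x0000e067 both describe an unmasked,
level-triggered, active-low pin with vector 0x67 (103, matching the IDT
entry mentioned earlier) and differ only in the Remote IRR bit, while
0x00010000 is a masked entry with no vector programmed.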

I'm thinking about further ways to debug this. I see that all active
IO-APIC pins are routed to vCPU0, but does it make a difference if you
boot with dom0_max_vcpus=1 on the Xen command line (thus limiting the
NetBSD dom0 to a single CPU)?

I can also prepare a patch that will periodically dump the same stuff
as the 'i' debug key without you having to press anything, but I'm not
sure if it would help much.

Also, does the system work fine once you reach multiuser, or does it also
randomly freeze and require further poking?

Roger.



 

