
Re: [Xen-devel] [xen-unstable test] 118078: regressions - FAIL



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: 17 January 2018 08:52
> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>; osstest-admin@xxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] [xen-unstable test] 118078: regressions - FAIL
> 
> >>> On 16.01.18 at 18:30, <Paul.Durrant@xxxxxxxxxx> wrote:
> >> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Paul Durrant
> >> Sent: 16 January 2018 09:27
> >> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> > Sent: 16 January 2018 08:58
> >> > >>> On 16.01.18 at 09:43, <osstest-admin@xxxxxxxxxxxxxx> wrote:
> >> > > flight 118078 xen-unstable real [real]
> >> > > http://logs.test-lab.xenproject.org/osstest/logs/118078/
> >> > >
> >> > > Regressions :-(
> >> > >
> >> > > Tests which did not succeed and are blocking,
> >> > > including tests which could not be run:
> >> > >  build-arm64-pvops             6 kernel-build             fail REGR. vs. 118003
> >> > >  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install  fail REGR. vs. 118003
> >> >
> >> > Paul,
> >> >
> >> > is this last one something you could look into?
> >> >
> >> > (XEN) d4: VIRIDIAN GUEST_OS_ID: vendor: 1 os: 4 major: 6 minor: 1 sp: 0 build: 1db0
> >> > (XEN) d4: VIRIDIAN HYPERCALL: enabled: 1 pfn: 3ffff
> >> > (XEN) d4v0: VIRIDIAN VP_ASSIST_PAGE: enabled: 1 pfn: 3fffe
> >> > (XEN) domain_crash called from viridian.c:452
> >> > (XEN) Domain 4 (vcpu#0) crashed on cpu#1:
> >> > (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
> >> > (XEN) CPU:    1
> >> > (XEN) RIP:    0010:[<fffff8000265d479>]
> >> > (XEN) RFLAGS: 0000000000000286   CONTEXT: hvm guest (d4v0)
> >> > (XEN) rax: 0000000000000000   rbx: fffff800027f7e80   rcx: 0000000000000001
> >> > (XEN) rdx: 0000000000000000   rsi: fffffa800129d040   rdi: fffff80002805c40
> >> > (XEN) rbp: 0000000000000080   rsp: fffff880009b0d80   r8:  0000000000000000
> >> > (XEN) r9:  fffff800027f7e80   r10: fffffa800129d040   r11: fffff800027f7e90
> >> > (XEN) r12: fffff800008129a0   r13: fffff800028b9be0   r14: fffffa8001239b30
> >> > (XEN) r15: fffff80000b96080   cr0: 0000000080050031   cr4: 00000000000006b8
> >> > (XEN) cr3: 0000000000187000   cr2: 0000000000000000
> >> > (XEN) fsb: 0000000000000000   gsb: fffff800027f7d00   gss: fffff800027f7d00
> >> > (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
> >> >
> >> > I.e. the domain_crash() in viridian_start_apic_assist().
> >> >
> >>
> >> Yes, I'll have a look at that.
> >
> > No real clue about this as yet. It is odd that the guest has only set up one
> > of the APIC assist pages and yet has taken an interrupt...
> >
> > Jan 16 01:46:05.691223 (XEN) Dumping guest's current state at key_handler...
> > Jan 16 01:46:05.691265 (XEN) Size of VMCB = 4096, paddr = 000000020f7f7000, vaddr = ffff83020f7f7000
> > Jan 16 01:46:05.699269 (XEN) cr_intercepts = 0xfef3fef3 dr_intercepts = 0xffffffff exception_intercepts = 0x60082
> > Jan 16 01:46:05.707128 (XEN) general1_intercepts = 0xbdc4000f general2_intercepts = 0x2e7f
> > Jan 16 01:46:05.715222 (XEN) iopm_base_pa = 0xdfd71000 msrpm_base_pa = 0x20f7f4000 tsc_offset = 0xfffffc36684278c9
> > Jan 16 01:46:05.723116 (XEN) tlb_control = 0 vintr = 0x1020001 interrupt_shadow = 0
> > Jan 16 01:46:05.723153 (XEN) eventinj 000000008000002f, valid? 1, ec? 0, type 0, vector 0x2f
> > Jan 16 01:46:05.731141 (XEN) exitcode = 0x64 exitintinfo = 0
> > Jan 16 01:46:05.739123 (XEN) exitinfo1 = 0 exitinfo2 = 0
> > Jan 16 01:46:05.739157 (XEN) np_enable = 0x1 guest_asid = 0x4b49
> > Jan 16 01:46:05.739187 (XEN) virtual vmload/vmsave = 0, virt_ext = 0
> >
> > I'd expect it to have interrupts disabled at this point. Seemingly doesn't
> > repro on Intel h/w (although I was testing with Win7 SP1 rather than RTM) so
> > I'll try to find some AMD h/w and try again.
> 
> Well, it looks to be a random problem in the first place, or else we
> would have known about the issue much earlier I think. I.e. I'm
> not sure I see what you take the "AMD only" from here.
> 

Well, I don't see any repro on Intel and the VMCB (rather than VMCS) dump 
identifies this case as using AMD h/w. I agree that it is random though, so it 
could indeed be purely coincidental.
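
For anyone reading the dump quoted above, here is a rough sketch of how the
"eventinj" line and the VMCB/VMCS distinction can be decoded. It assumes the
usual SVM EVENTINJ layout (vector in bits 7:0, type in bits 10:8, error-code
valid in bit 11, valid in bit 31, error code in bits 63:32). The helper names
decode_eventinj() and dump_flavour() are made up for illustration and are not
Xen functions.

```python
def decode_eventinj(raw: int) -> dict:
    """Split a 64-bit SVM EVENTINJ value into its fields (assumed layout)."""
    return {
        "vector": raw & 0xFF,                    # bits 7:0
        "type": (raw >> 8) & 0x7,                # bits 10:8, 0 = external interrupt
        "ec_valid": bool((raw >> 11) & 1),       # bit 11
        "valid": bool((raw >> 31) & 1),          # bit 31
        "error_code": (raw >> 32) & 0xFFFFFFFF,  # bits 63:32
    }


def dump_flavour(line: str) -> str:
    """Guess the hardware vendor from which control structure a dump names."""
    if "VMCB" in line:
        return "AMD (SVM)"
    if "VMCS" in line:
        return "Intel (VMX)"
    return "unknown"


# Values copied from the log lines quoted earlier in this thread.
fields = decode_eventinj(0x000000008000002F)
flavour = dump_flavour("(XEN) Size of VMCB = 4096, paddr = 000000020f7f7000")

print(fields)
print(flavour)
```

Decoded this way, the value is a valid injection of an external interrupt
(type 0) with vector 0x2f and no error code, which agrees with what Xen printed
on the same line.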

> As to interrupt state - isn't it quite normal for an OS to bring up
> the BSP first, enable interrupts on it, and then bring up APs?
> That's how we do it in Xen.

What I meant was that I'd expect the guest to have interrupts disabled whilst 
poking the MSR to enable APIC assist on that CPU, since enabling APIC assist is 
clearly going to modify the way in which interrupts are handled. If that's not 
the case though then I guess that is probably the cause of the issue; I never 
really considered protecting interrupt handling against APIC assist being 
enabled on the same CPU.

  Paul

> 
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
