Xen project Mailing List

Re: [Xen-devel] [xen-unstable test] 118078: regressions - FAIL

To: 'Jan Beulich' <JBeulich@xxxxxxxx>

From: Paul Durrant <Paul.Durrant@xxxxxxxxxx>

Date: Wed, 17 Jan 2018 09:37:00 +0000

Accept-language: en-GB, en-US

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "osstest-admin@xxxxxxxxxxxxxx" <osstest-admin@xxxxxxxxxxxxxx>

Delivery-date: Wed, 17 Jan 2018 09:37:37 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Thread-index: AQHTjqYnZTjYH1kXs0usU3DUj6GlL6N2IhuAgAAYlbCAAIYiAIAA8eeAgAAa//A=

Thread-topic: [Xen-devel] [xen-unstable test] 118078: regressions - FAIL

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: 17 January 2018 08:52
> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>; osstest-admin@xxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] [xen-unstable test] 118078: regressions - FAIL
>
> >>> On 16.01.18 at 18:30, <Paul.Durrant@xxxxxxxxxx> wrote:
> >> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Paul Durrant
> >> Sent: 16 January 2018 09:27
> >> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> > Sent: 16 January 2018 08:58
> >> > >>> On 16.01.18 at 09:43, <osstest-admin@xxxxxxxxxxxxxx> wrote:
> >> > > flight 118078 xen-unstable real [real]
> >> > > http://logs.test-lab.xenproject.org/osstest/logs/118078/
> >> > >
> >> > > Regressions :-(
> >> > >
> >> > > Tests which did not succeed and are blocking,
> >> > > including tests which could not be run:
> >> > >  build-arm64-pvops 6 kernel-build fail REGR. vs. 118003
> >> > >  test-amd64-amd64-xl-qemut-win7-amd64 10 windows-install fail REGR. vs. 118003
> >> >
> >> > Paul,
> >> >
> >> > is this last one something you could look into?
> >> >
> >> > (XEN) d4: VIRIDIAN GUEST_OS_ID: vendor: 1 os: 4 major: 6 minor: 1 sp: 0 build: 1db0
> >> > (XEN) d4: VIRIDIAN HYPERCALL: enabled: 1 pfn: 3ffff
> >> > (XEN) d4v0: VIRIDIAN VP_ASSIST_PAGE: enabled: 1 pfn: 3fffe
> >> > (XEN) domain_crash called from viridian.c:452
> >> > (XEN) Domain 4 (vcpu#0) crashed on cpu#1:
> >> > (XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
> >> > (XEN) CPU: 1
> >> > (XEN) RIP: 0010:[<fffff8000265d479>]
> >> > (XEN) RFLAGS: 0000000000000286 CONTEXT: hvm guest (d4v0)
> >> > (XEN) rax: 0000000000000000 rbx: fffff800027f7e80 rcx: 0000000000000001
> >> > (XEN) rdx: 0000000000000000 rsi: fffffa800129d040 rdi: fffff80002805c40
> >> > (XEN) rbp: 0000000000000080 rsp: fffff880009b0d80 r8: 0000000000000000
> >> > (XEN) r9: fffff800027f7e80 r10: fffffa800129d040 r11: fffff800027f7e90
> >> > (XEN) r12: fffff800008129a0 r13: fffff800028b9be0 r14: fffffa8001239b30
> >> > (XEN) r15: fffff80000b96080 cr0: 0000000080050031 cr4: 00000000000006b8
> >> > (XEN) cr3: 0000000000187000 cr2: 0000000000000000
> >> > (XEN) fsb: 0000000000000000 gsb: fffff800027f7d00 gss: fffff800027f7d00
> >> > (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010
> >> >
> >> > I.e. the domain_crash() in viridian_start_apic_assist().
> >> >
> >>
> >> Yes, I'll have a look at that.
> >
> > No real clue about this as yet. It is odd that the guest has only set up one
> > of the APIC assist pages and yet has taken an interrupt...
> >
> > Jan 16 01:46:05.691223 (XEN) Dumping guest's current state at key_handler...
> > Jan 16 01:46:05.691265 (XEN) Size of VMCB = 4096, paddr = 000000020f7f7000, vaddr = ffff83020f7f7000
> > Jan 16 01:46:05.699269 (XEN) cr_intercepts = 0xfef3fef3 dr_intercepts = 0xffffffff exception_intercepts = 0x60082
> > Jan 16 01:46:05.707128 (XEN) general1_intercepts = 0xbdc4000f general2_intercepts = 0x2e7f
> > Jan 16 01:46:05.715222 (XEN) iopm_base_pa = 0xdfd71000 msrpm_base_pa = 0x20f7f4000 tsc_offset = 0xfffffc36684278c9
> > Jan 16 01:46:05.723116 (XEN) tlb_control = 0 vintr = 0x1020001 interrupt_shadow = 0
> > Jan 16 01:46:05.723153 (XEN) eventinj 000000008000002f, valid? 1, ec? 0, type 0, vector 0x2f
> > Jan 16 01:46:05.731141 (XEN) exitcode = 0x64 exitintinfo = 0
> > Jan 16 01:46:05.739123 (XEN) exitinfo1 = 0 exitinfo2 = 0
> > Jan 16 01:46:05.739157 (XEN) np_enable = 0x1 guest_asid = 0x4b49
> > Jan 16 01:46:05.739187 (XEN) virtual vmload/vmsave = 0, virt_ext = 0
> >
> > I'd expect it to have interrupts disabled at this point. Seemingly doesn't
> > repro on Intel h/w (although I was testing with Win7 SP1 rather than RTM) so
> > I'll try to find some AMD h/w and try again.
>
> Well, it looks to be a random problem in the first place, or else we
> would have known about the issue much earlier I think. I.e. I'm
> not sure I see what you take the "AMD only" from here.
>

Well, I don't see any repro on Intel and the VMCB (rather than VMCS) dump
identifies this case as using AMD h/w. I agree that it is random though, so
it could indeed be purely coincidental.

> As to interrupt state - isn't it quite normal for an OS to bring up
> the BSP first, enable interrupts on it, and then bring up APs?
> That's how we do it in Xen.

What I meant was that I'd expect the guest to have interrupts disabled
whilst poking the MSR to enable APIC assist on that CPU, since enabling
APIC assist is clearly going to modify the way in which interrupts are
handled. If that's not the case though then I guess that is probably the
cause of the issue; I never really considered protecting interrupt handling
against APIC assist being enabled on the same CPU.

  Paul

>
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
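
For anyone reading the VMCB dump quoted above, the eventinj value can be decoded by hand. What follows is only a minimal sketch: it assumes the EVENTINJ layout described in the AMD64 Architecture Programmer's Manual (vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11, valid in bit 31, error code in bits 63:32). The helper name decode_eventinj is made up for illustration and is not part of Xen.

```python
def decode_eventinj(value: int) -> dict:
    """Split a 64-bit SVM EVENTINJ value into its fields.

    The bit layout is an assumption taken from the AMD APM, not from Xen source.
    """
    return {
        "vector": value & 0xFF,                    # bits 7:0
        "type": (value >> 8) & 0x7,                # bits 10:8 (0 = external/virtual interrupt)
        "ec_valid": (value >> 11) & 0x1,           # bit 11
        "valid": (value >> 31) & 0x1,              # bit 31
        "error_code": (value >> 32) & 0xFFFFFFFF,  # bits 63:32
    }


# Value taken from the "eventinj" line of the dump in this thread.
fields = decode_eventinj(0x000000008000002F)
print(fields)
```

Under that assumed layout the fields match what the dump itself prints (valid 1, ec 0, type 0, vector 0x2f), i.e. an external interrupt with vector 0x2f was queued for injection, which fits the observation that the guest had taken an interrupt.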
