Xen project Mailing List

Re: [Xen-devel] rdmsr_safe in Linux PV (under Xen) gets an #GP:Re: [Fedora-xen] Running fedora xen on top of KVM?

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

From: Andy Lutomirski <luto@xxxxxxxxxxxxxx>

Date: Thu, 17 Sep 2015 13:23:31 -0700

Cc: xen <xen@xxxxxxxxxxxxxxxxxxxxxxx>, Xen Devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, kvm list <kvm@xxxxxxxxxxxxxxx>, Cole Robinson <crobinso@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, M A Young <m.a.young@xxxxxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>

Delivery-date: Thu, 17 Sep 2015 20:24:04 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Thu, Sep 17, 2015 at 1:10 PM, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> On Wed, Sep 16, 2015 at 06:39:03PM -0400, Cole Robinson wrote:
>> On 09/16/2015 05:08 PM, Konrad Rzeszutek Wilk wrote:
>> > On Wed, Sep 16, 2015 at 05:04:31PM -0400, Cole Robinson wrote:
>> >> On 09/16/2015 04:07 PM, M A Young wrote:
>> >>> On Wed, 16 Sep 2015, Cole Robinson wrote:
>> >>>
>> >>>> Unfortunately I couldn't get anything else extra out of xen using any of these
>> >>>> options or the ones Major recommended... in fact I couldn't get anything to
>> >>>> the serial console at all. console=con1 would seem to redirect messages since
>> >>>> they wouldn't show up on the graphical display, but nothing went to the serial
>> >>>> log. Maybe I'm missing something...
>> >>>
>> >>> That should be console=com1 so you have a typo either in this message or
>> >>> in your tests.
>> >>>
>> >>
>> >> Yeah that was it :/ So here's the crash output use -cpu host:
>> >>
>> >> - Cole
>> >>
>> >> <snip>
>> >>
>> >> about to get started...
>> >> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000]
>> >> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023a5d3 create_bounce_frame+0x12b/0x13a
>> >> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> >> (XEN) ----[ Xen-4.5.1 x86_64 debug=n Not tainted ]----
>> >> (XEN) CPU: 0
>> >> (XEN) RIP: e033:[<ffffffff810032b0>]
>> >
>> > That is the Linux kernel EIP. Can you figure out what is at ffffffff810032b0 ?
>> >
>> > gdb vmlinux and then
>> > x/20i 0xffffffff810032b0
>> >
>> > can help with that.
>> >
>>
>> Updated to the latest kernel 4.1.6-201.fc22.x86_64. Trace is now:
>>
>> about to get started...
>> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000]

What exactly does this mean?

>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023a5d3 create_bounce_frame+0x12b/0x13a
>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> (XEN) ----[ Xen-4.5.1 x86_64 debug=n Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e033:[<ffffffff810031f0>]
>> (XEN) RFLAGS: 0000000000000282 EM: 1 CONTEXT: pv guest
>> (XEN) rax: 0000000000000015 rbx: ffffffff81c03e1c rcx: 00000000c0010112
>> (XEN) rdx: 0000000000000001 rsi: ffffffff81c03e1c rdi: 00000000c0010112
>> (XEN) rbp: ffffffff81c03df8 rsp: ffffffff81c03da0 r8: ffffffff81c03e28
>> (XEN) r9: ffffffff81c03e2c r10: 0000000000000000 r11: 00000000ffffffff
>> (XEN) r12: ffffffff81d25a60 r13: 0000000004000000 r14: 0000000000000000
>> (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000406f0
>> (XEN) cr3: 0000000075c0b000 cr2: 0000000000000000
>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
>> (XEN) Guest stack trace from rsp=ffffffff81c03da0:
>> (XEN) 00000000c0010112 00000000ffffffff 0000000000000000 ffffffff810031f0
>> (XEN) 000000010000e030 0000000000010082 ffffffff81c03de0 000000000000e02b
>> (XEN) 0000000000000000 000000000000000c ffffffff81c03e1c ffffffff81c03e48
>> (XEN) ffffffff8102a7a4 ffffffff81c03e48 ffffffff8102aa3b ffffffff81c03e48
>> (XEN) cf1fa5f5e026f464 0000000001000000 ffffffff81c03ef8 0000000004000000
>> (XEN) 0000000000000000 ffffffff81c03e58 ffffffff81d5d142 ffffffff81c03ee8
>> (XEN) ffffffff81d58b56 0000000000000000 0000000000000000 ffffffff81c03e88
>> (XEN) ffffffff810f8a39 ffffffff81c03ee8 ffffffff81798b13 ffffffff00000010
>> (XEN) ffffffff81c03ef8 ffffffff81c03eb8 cf1fa5f5e026f464 ffffffff81f1de9c
>> (XEN) ffffffffffffffff 0000000000000000 ffffffff81df7920 0000000000000000
>> (XEN) 0000000000000000 ffffffff81c03f28 ffffffff81d51c74 cf1fa5f5e026f464
>> (XEN) 0000000000000000 ffffffff81c03f60 ffffffff81c03f5c 0000000000000000
>> (XEN) 0000000000000000 ffffffff81c03f38 ffffffff81d51339 ffffffff81c03ff8
>> (XEN) ffffffff81d548b1 0000000000000000 00600f1200000000 0000000100000800
>> (XEN) 0300000100000032 0000000000000005 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0f00000060c0c748 ccccccccccccc305 cccccccccccccccc cccccccccccccccc
>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>
>>
>> gdb output:
>>
>> (gdb) x/20i 0xffffffff810031f0
>> 0xffffffff810031f0 <xen_read_msr_safe+16>: rdmsr
>
> Fantastic! So we have some rdmsr that makes KVM inject an
> GP.

What's the scenario? Is this Xen on KVM?

Why didn't the guest print anything?

Is the issue here that the guest died due to failure to handle an
RDMSR failure or did the *hypervisor* die?

It looks like null_trap_bounce is returning true, which suggests that
the failure is happening before the guest sets up exception handling.

>
> Looking at the stack you have some other values:
> ffffffff81c03de0, ffffffff81c03e1c .. they should correspond
> to other functions calling this one. If you do 'nm --defined vmlinux | grep ffffffff81c03e1'
> that should give an idea where they are. Or use 'gdb'.
>
> That will give us an stack - and we can find what type of MSR
> this is. Oh wait, it is on the registers: 00000000c0010112
>
> Ok, so where in the code is that MSR ah, that looks to be:
> #define MSR_K8_TSEG_ADDR 0xc0010112
>
> which is called at bsp_init_amd.
>
> I think the problem here is that we are calling the
> 'safe' variant of MSR but we still get an injected #GP and
> don't expect that.
>
> I am not really sure what the expected outcome should be here.
>
> CC-ing xen-devel, KVM folks, and Andy who has been looking
> in mucking around in the _safe* pvops.

It's too early of a failure, I think.

Cc: Borislav. Is TSEG guaranteed to exist? Can we defer that until
we have exception handling working? Do we need to rig up exception
handling so that it works earlier (e.g. in early_trap_init, which is
presumably early enough)? Or is this just a KVM and/or Xen bug.

--Andy

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
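
Editor's sketch, not part of the original message: the step Konrad describes (reading the faulting MSR index straight off the register dump, since RDMSR takes its index in ECX) can be mechanised roughly as below. This is a minimal illustration in Python under stated assumptions: the helper names (parse_registers, describe_faulting_msr, selector_rpl) and the one-entry KNOWN_MSRS table are hypothetical and exist only for this example; the only MSR name used is the one quoted in the thread, and the sample text is copied from the dump above.

```python
import re

# Assumption: a one-entry lookup table for illustration only. The single
# entry is the define quoted in the thread (MSR_K8_TSEG_ADDR 0xc0010112).
KNOWN_MSRS = {
    0xC0010112: "MSR_K8_TSEG_ADDR",
}

# Matches lowercase "name: hexvalue" pairs such as "rcx: 00000000c0010112"
# or "cs: e033". Register names in the Xen dump are two or three characters.
REG_RE = re.compile(r"\b([a-z][a-z0-9]{1,2}):\s+([0-9a-fA-F]+)\b")


def parse_registers(dump):
    """Collect register values from the '(XEN)' lines of a crash dump."""
    regs = {}
    for line in dump.splitlines():
        if "(XEN)" not in line:
            continue  # skip anything that is not hypervisor console output
        for name, value in REG_RE.findall(line):
            regs[name] = int(value, 16)
    return regs


def describe_faulting_msr(regs):
    """Return (index, name) for the MSR a faulting RDMSR tried to read.

    RDMSR takes the MSR index from ECX, i.e. the low 32 bits of RCX.
    """
    index = regs["rcx"] & 0xFFFFFFFF
    return index, KNOWN_MSRS.get(index, "unknown")


def selector_rpl(selector):
    """The low two bits of a segment selector are its privilege level."""
    return selector & 0x3


# Three lines copied from the dump quoted in this thread.
SAMPLE = """\
(XEN) rax: 0000000000000015 rbx: ffffffff81c03e1c rcx: 00000000c0010112
(XEN) rdx: 0000000000000001 rsi: ffffffff81c03e1c rdi: 00000000c0010112
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
"""

if __name__ == "__main__":
    regs = parse_registers(SAMPLE)
    index, name = describe_faulting_msr(regs)
    print("faulting MSR: 0x%08x (%s)" % (index, name))
    print("cs privilege level: %d" % selector_rpl(regs["cs"]))
```

The sketch only identifies which MSR was being read; it says nothing about why the #GP was not fixed up, which is the open question in the thread.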
