
Re: xl/xm save -c fails - set_vcpucontext EOPNOTSUPP (was Re: [Xen-devel] xl save -c issues with Windows 7 Ultimate)



On Wed, May 11, 2011 at 1:37 PM, Shriram Rajagopalan <rshriram@xxxxxxxxx> wrote:
On Wed, May 11, 2011 at 2:47 AM, Jan Beulich <JBeulich@xxxxxxxxxx> wrote:
>>> On 11.05.11 at 04:30, Shriram Rajagopalan <rshriram@xxxxxxxxx> wrote:
> I tried out a simple program that just gets and sets VCPU 0's context
> (no change whatsoever to anything). There is no intermediate code
> involved (except for the hypercall bounce buffer stuff). If all is well,
> then this should work. But it doesn't, even for a PV guest.
> I get the same "Operation not supported" error when I try to "set" the
> vcpu context with the same struct obtained via the get_vcpucontext
> hypercall!
>...
> and I get - setcontext: operation not supported!

Again, you'll want to add debugging code to the hypervisor to check
what really is inconsistent.

> now for the weirdness:
>  Since the setcontext failed, I thought I should be able
> to run the above sample code again and again with no side effect
> (please correct my assumption if I am wrong).
>
> But when I run the above code for the second time, I get a XEN panic!
>
> (XEN) Xen BUG at domctl.c:1724
> (XEN) ----[ Xen-4.2-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    2
> (XEN) RIP:    e008:[<ffff82c48014dd57>] arch_get_info_guest+0x5f7/0x7b0
> (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
> (XEN) rax: 0000000000000001   rbx: ffff8300228c4000   rcx: ffff8300228c4040
> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff830450652210
> (XEN) rbp: ffff83082a357da8   rsp: ffff83082a357d68   r8:  0000000000000002
> (XEN) r9:  0000000000000002   r10: 0000000000000040   r11: 0000000000000000
> (XEN) r12: ffff830450652010   r13: 0000000000000001   r14: ffff830829db9000
> (XEN) r15: ffff830450652010   cr0: 0000000080050033   cr4: 00000000000026f0
> (XEN) cr3: 000000047beef000   cr2: 0000000000d44048
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff83082a357d68:
> (XEN)    ffff830829db9000 ffff8300228c4000 ffff83082a357d98 fffffffffffffff4
> (XEN)    0000000000d40004 ffff8300228c4000 ffff830829db9000 ffff830450652010
> (XEN)    ffff83082a357ef8 ffff82c48010351f ffff83082a357e48 ffff82c48016af84
> (XEN)    0000000000000000 0000000000000070 ffff83082a357e28 000000000047beea
> (XEN)    0000000000000000 ffff83082a30b000 ffff830450652010 ffff830450652010
> (XEN)    ffff83082a357e48 0000000080164c7d aaaaaaaaaaaaaaaa ffff83082a30b000
> (XEN)    ffff83082a357ef8 ffff82c480113d73 000000070000000d 0000000000000001
> (XEN)    0000000000000000 0000000000d42004 0000000000000000 00007fef43c4a791
> (XEN)    0000000000000001 0000000000000000 00007fff27dc7db0 00007fef43a1bd58
> (XEN)    0000000000000024 0000000000000001 00007fff27dc9710 0000000000000001
> (XEN)    0000000000d3f050 00007fef43c51325 0000000000000011 00007fff27dc7dd0
> (XEN)    ffff83082a357ed8 ffff8300bf656000 0000000000000003 00007fff27dc7c60
> (XEN)    00007fff27dc7c60 0000000000000000 00007cf7d5ca80c7 ffff82c48020e1e8
> (XEN)    ffffffff8100948a 0000000000000024 0000000000000000 00007fff27dc7c60
> (XEN)    00007fff27dc7c60 0000000000000003 ffff8807a0f2fe68 ffffffff8148d700
> (XEN)    0000000000000282 0000000000000024 0000000000d3f050 0000000000d40004
> (XEN)    0000000000000024 ffffffff8100948a 0000000100000000 00007fff27dc7ce0
> (XEN)    0000000000d40004 0000010000000000 ffffffff8100948a 000000000000e033
> (XEN)    0000000000000282 ffff8807a0f2fe20 000000000000e02b 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000002
> (XEN) Xen call trace:
> (XEN)    [<ffff82c48014dd57>] arch_get_info_guest+0x5f7/0x7b0
> (XEN)    [<ffff82c48010351f>] do_domctl+0x10ad/0x195e
> (XEN)    [<ffff82c48020e1e8>] syscall_enter+0xc8/0x122
>
> I would appreciate any pointers on how to go about this.

This now indeed looks like an inconsistency between
arch_get_info_guest() and the newly introduced error path in
arch_set_info_guest() - the code to put v->arch.user_eflags into
the necessary state now simply doesn't run anymore. It simply
needs to be pulled up in that function (and a few other adjustments
seem also necessary):

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -856,6 +856,15 @@ int arch_set_info_guest(
        goto out;
    }

+    init_int80_direct_trap(v);
+
+    /* IOPL privileges are virtualised. */
+    v->arch.pv_vcpu.iopl = (v->arch.user_regs.eflags >> 12) & 3;
+    v->arch.user_regs.eflags &= ~X86_EFLAGS_IOPL;
+
+    /* Ensure real hardware interrupts are enabled. */
+    v->arch.user_regs.eflags |= X86_EFLAGS_IF;
+
    if ( !v->is_initialised )
    {
        v->arch.pv_vcpu.ldt_base = c(ldt_base);
@@ -866,7 +875,11 @@ int arch_set_info_guest(
        bool_t fail = v->arch.pv_vcpu.ctrlreg[3] != c(ctrlreg[3]);

 #ifdef CONFIG_X86_64
-        fail |= v->arch.pv_vcpu.ctrlreg[1] != c(ctrlreg[1]);
+        if ( !compat )
+        {
+            fail |= v->arch.pv_vcpu.ctrlreg[1] != c(ctrlreg[1]);
+            fail |= !v->arch.pv_vcpu.ctrlreg[1] && !(flags & VGCF_in_kernel);
+        }
 #endif

        for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i )
@@ -907,15 +920,6 @@ int arch_set_info_guest(
    v->arch.pv_vcpu.ctrlreg[0] &= X86_CR0_TS;
    v->arch.pv_vcpu.ctrlreg[0] |= read_cr0() & ~X86_CR0_TS;

-    init_int80_direct_trap(v);
-
-    /* IOPL privileges are virtualised. */
-    v->arch.pv_vcpu.iopl = (v->arch.user_regs.eflags >> 12) & 3;
-    v->arch.user_regs.eflags &= ~X86_EFLAGS_IOPL;
-
-    /* Ensure real hardware interrupts are enabled. */
-    v->arch.user_regs.eflags |= X86_EFLAGS_IF;
-
    cr4 = v->arch.pv_vcpu.ctrlreg[4];
    v->arch.pv_vcpu.ctrlreg[4] = cr4 ? pv_guest_cr4_fixup(v, cr4) :
        real_cr4_to_pv_guest_cr4(mmu_cr4_features);
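
[Editor's sketch] The reordering in the patch above can be illustrated with a small stand-alone model in C. It is only a sketch under stated assumptions: struct toy_vcpu, toy_set(), toy_get() and the "reject"/"fixup_first" parameters are hypothetical names, not Xen code, and the assert() in toy_get() merely stands in for the check that presumably fires at domctl.c:1724 (the thread does not show that line). The three eflags statements are the ones moved by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Architectural EFLAGS bits: IF is bit 9, IOPL is bits 12-13. */
#define X86_EFLAGS_IF   0x00000200u
#define X86_EFLAGS_IOPL 0x00003000u

/* Hypothetical stand-in for the per-vCPU state touched by the patch. */
struct toy_vcpu {
    uint32_t eflags;   /* models v->arch.user_regs.eflags */
    unsigned int iopl; /* models v->arch.pv_vcpu.iopl */
};

/* The fix-up the patch hoists in front of the consistency checks. */
static void virtualise_iopl(struct toy_vcpu *v)
{
    v->iopl = (v->eflags >> 12) & 3;
    v->eflags &= ~X86_EFLAGS_IOPL;
    v->eflags |= X86_EFLAGS_IF;
}

/* Returns nonzero when the stored eflags carry no IOPL bits. */
static int iopl_bits_clear(const struct toy_vcpu *v)
{
    return (v->eflags & X86_EFLAGS_IOPL) == 0;
}

/*
 * Toy set path. 'reject' models the consistency check failing;
 * 'fixup_first' selects the patched (1) or unpatched (0) ordering.
 */
static int toy_set(struct toy_vcpu *v, uint32_t incoming,
                   int reject, int fixup_first)
{
    v->eflags = incoming;          /* registers are copied in first */
    if ( fixup_first )
        virtualise_iopl(v);
    if ( reject )
        return -EOPNOTSUPP;        /* early error exit */
    if ( !fixup_first )
        virtualise_iopl(v);
    return 0;
}

/* Toy get path: assumed to expect cleared IOPL bits, then merge them back. */
static uint32_t toy_get(const struct toy_vcpu *v)
{
    assert(iopl_bits_clear(v));    /* stand-in for the hypervisor BUG check */
    return v->eflags | ((uint32_t)v->iopl << 12);
}
```

In this model, toy_set(&v, 0x1202, 1, 0) returns -EOPNOTSUPP and leaves the IOPL bits set in v.eflags, so a following toy_get() would trip the assertion. With fixup_first = 1 the same rejected call leaves v.eflags == 0x202 and v.iopl == 1, and toy_get() reports 0x1202 again.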

Can you give this a try?

Ok. This patch solves the Xen panic issue but not the EOPNOTSUPP
error. That is, I can use my sample program to "try" to get/set the same vcpu
context. As before, only get context succeeded and set context failed with the
same EOPNOTSUPP error, for a 2.6.18 32-bit domU and a 2.6.39 64-bit dom0.

And as you said, I added more debugging.

(XEN) domain.c:893:d0 incoming cr3 42b33e000, cur cr3 827ba5000, fail = 1
(XEN) domain.c:901:d0 incoming cr1 42ba6c000, cur cr1 00000000, !(flags & VGCF_in_kernel)=0,fail=1

Looking at arch_get_info_guest in domctl.c, I see that cr3 is first copied verbatim from the vcpu and
then overwritten in the if-else block:

        if ( !is_pv_32on64_domain(v->domain) )
        {
            c.nat->ctrlreg[3] = xen_pfn_to_cr3(
                pagetable_get_pfn(v->arch.guest_table));
#ifdef __x86_64__
            c.nat->ctrlreg[1] =
                pagetable_is_null(v->arch.guest_table_user) ? 0
                : xen_pfn_to_cr3(pagetable_get_pfn(v->arch.guest_table_user));
#endif
....
        }
        else
        {
            l4_pgentry_t *l4e = __va(pagetable_get_paddr(v->arch.guest_table));
            c.cmp->ctrlreg[3] = compat_pfn_to_cr3(l4e_get_pfn(*l4e));
        }

This seems to account for the difference between the values that libxc supplies (obtained from get context)
and the ones that arch_set_info_guest validates against.
arch_set_info_guest validates cr3 and cr1 against the wrong values (the vcpu's ctrlreg[1] and ctrlreg[3]),
while it should validate them against the values produced by the if-else block in arch_get_info_guest.

I have verified this too, with both a 32bit domU and 64bit domU.

64-bit PV domU (2.6.39..)
--------------------------------------
get_vcpu_context(): (debug output from arch_get_info_guest)
(XEN) domctl.c:1707:d0  copying cr1 00000000
(XEN) domctl.c:1707:d0  copying cr3 827bd5000
(XEN) domctl.c:1743:d0 not pv_32on64, outgoing cr3 42b85b000, cur cr3 827bd5000
(XEN) domctl.c:1746:d0 not pv_32on64, outgoing cr1 42b85c000, cur cr1 00000000

set_vcpu_context(): (debug output from arch_set_info_guest)
(XEN) domain.c:893:d0 incoming cr3 42b85b000, cur cr3 827bd5000, fail = 1
(XEN) domain.c:901:d0 incoming cr1 42b85c000, cur cr1 00000000, !(flags & VGCF_in_kernel)=0,fail=1

32-bit PV domU (2.6.18)
----------------------------------
get_vcpu_context()
(XEN) domctl.c:1707:d0 copying cr1 00000000
(XEN) domctl.c:1707:d0 copying cr3 2960e008
(XEN) domctl.c:1758:d0 is pv_32on64, outgoing cr3 4f0ac004, cur cr3 2960e008

set_vcpu_context()
(XEN) domain.c:893:d0 incoming cr3 4f0ac004, cur cr3 2960e008, fail = 1
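
[Editor's sketch] The mismatch in the debug output above can be replayed with a small stand-alone C model. It is only a sketch: struct toy_ctx, ctrlreg_check_fails() and TOY_VGCF_IN_KERNEL are hypothetical names, and the flag's bit position is an assumption about the public header rather than something stated in this thread. The comparison itself follows the three "fail" statements from the patched arch_set_info_guest().

```c
#include <stdint.h>

/* Assumed to correspond to VGCF_in_kernel; the bit position is a guess. */
#define TOY_VGCF_IN_KERNEL (1u << 2)

/* Hypothetical container for the two control registers being validated. */
struct toy_ctx {
    uint64_t cr1;
    uint64_t cr3;
    unsigned int flags;
};

/*
 * Mirrors the patched check: the incoming context is rejected when it
 * differs from the reference values, or when the reference cr1 is zero
 * and the incoming context is not flagged as being in kernel mode.
 */
static int ctrlreg_check_fails(const struct toy_ctx *ref,
                               const struct toy_ctx *in)
{
    int fail = ref->cr3 != in->cr3;

    fail |= ref->cr1 != in->cr1;
    fail |= !ref->cr1 && !(in->flags & TOY_VGCF_IN_KERNEL);
    return fail;
}
```

Using the 64-bit log values, a reference of { cr1 = 0, cr3 = 0x827bd5000 } (the stored ctrlreg values) against an incoming { cr1 = 0x42b85c000, cr3 = 0x42b85b000 } yields 1, matching "fail = 1" in the log. Using the values that arch_get_info_guest() reported as the reference instead yields 0, which is the behaviour argued for above.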


shriram
Corresponding code:

bool_t fail = v->arch.pv_vcpu.ctrlreg[3] != c(ctrlreg[3]);
gdprintk(XENLOG_WARNING,
         "incoming cr3 %08lx, cur cr3 %08lx, fail = %d\n",
         c(ctrlreg[3]), v->arch.pv_vcpu.ctrlreg[3], fail);

#ifdef CONFIG_X86_64
if ( !compat )
{
    fail |= v->arch.pv_vcpu.ctrlreg[1] != c(ctrlreg[1]);
    gdprintk(XENLOG_WARNING,
             "incoming cr1 %08lx, cur cr1 %08lx, !(flags & VGCF_in_kernel)=%d,fail=%d\n",
             c(ctrlreg[1]), v->arch.pv_vcpu.ctrlreg[1], !(flags & VGCF_in_kernel), fail);

    fail |= !v->arch.pv_vcpu.ctrlreg[1] && !(flags & VGCF_in_kernel);
...

shriram

The question is whether there are other inconsistencies lurking, and
hence whether it wouldn't be better to mark a vCPU on which setting
the context failed, not allowing it to resume or have its context
obtained anymore. That appears quite drastic though - Keir, what's
your opinion here?

Jan
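
[Editor's sketch] The "mark the vCPU" idea raised above could look roughly like the following stand-alone C model. Everything here is hypothetical: struct toy_vcpu_state, the ctx_poisoned field, the toy_* functions and the choice of -EINVAL for refused operations are illustrative assumptions, not existing Xen code. Whether a later successful set should clear the mark is left open in the thread; this sketch keeps it sticky.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-vCPU bookkeeping for a failed context set. */
struct toy_vcpu_state {
    bool ctx_poisoned; /* set once a set-context attempt has failed */
};

/* Toy set path: a failed validation marks the vCPU. */
static int toy_set_context(struct toy_vcpu_state *v, bool validation_ok)
{
    if ( !validation_ok )
    {
        v->ctx_poisoned = true;
        return -EOPNOTSUPP;
    }
    return 0;
}

/* Toy get path: refuse to hand out a possibly inconsistent context. */
static int toy_get_context(const struct toy_vcpu_state *v)
{
    return v->ctx_poisoned ? -EINVAL : 0;
}

/* Toy resume path: refuse to run a vCPU whose state may be inconsistent. */
static int toy_resume(const struct toy_vcpu_state *v)
{
    return v->ctx_poisoned ? -EINVAL : 0;
}
```

The trade-off is the one noted above: a single rejected set makes the vCPU unusable in this model, which avoids relying on every error path leaving consistent state but is a drastic response to a caller mistake.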



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

