Xen project Mailing List

On Wed, May 11, 2011 at 1:37 PM, Shriram Rajagopalan <rshriram@xxxxxxxxx> wrote:

On Wed, May 11, 2011 at 2:47 AM, Jan Beulich <JBeulich@xxxxxxxxxx> wrote:

>>> On 11.05.11 at 04:30, Shriram Rajagopalan <rshriram@xxxxxxxxx> wrote:
>> I tried out a simple program that just gets and sets VCPU 0's context (no change
> whatsoever to anything). There is no intermediate code involved (except for the
> hypercall bounce buffer stuff). If all is well, then this should work. But it
> doesn't! Not even for a PV guest.
> I get the same "Operation not supported" error when I try to "set" the vcpu
> context with the same struct obtained via the get_vcpucontext hypercall!

>...

> and I get - setcontext: operation not supported!

Again, you'll want to add debugging code to the hypervisor to check
what really is inconsistent.

> Now for the weirdness:
> Since the setcontext failed, I thought I should be able
> to run the above sample code again and again with no side effects
> (please correct my assumption if I am wrong).
>
> But when I run the above code a second time, I get a Xen panic!
>
> (XEN) Xen BUG at domctl.c:1724
> (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 2
> (XEN) RIP: e008:[<ffff82c48014dd57>] arch_get_info_guest+0x5f7/0x7b0
> (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor
> (XEN) rax: 0000000000000001 rbx: ffff8300228c4000 rcx: ffff8300228c4040
> (XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: ffff830450652210
> (XEN) rbp: ffff83082a357da8 rsp: ffff83082a357d68 r8: 0000000000000002
> (XEN) r9: 0000000000000002 r10: 0000000000000040 r11: 0000000000000000
> (XEN) r12: ffff830450652010 r13: 0000000000000001 r14: ffff830829db9000
> (XEN) r15: ffff830450652010 cr0: 0000000080050033 cr4: 00000000000026f0
> (XEN) cr3: 000000047beef000 cr2: 0000000000d44048
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff83082a357d68:
> (XEN) ffff830829db9000 ffff8300228c4000 ffff83082a357d98 fffffffffffffff4
> (XEN) 0000000000d40004 ffff8300228c4000 ffff830829db9000 ffff830450652010
> (XEN) ffff83082a357ef8 ffff82c48010351f ffff83082a357e48 ffff82c48016af84
> (XEN) 0000000000000000 0000000000000070 ffff83082a357e28 000000000047beea
> (XEN) 0000000000000000 ffff83082a30b000 ffff830450652010 ffff830450652010
> (XEN) ffff83082a357e48 0000000080164c7d aaaaaaaaaaaaaaaa ffff83082a30b000
> (XEN) ffff83082a357ef8 ffff82c480113d73 000000070000000d 0000000000000001
> (XEN) 0000000000000000 0000000000d42004 0000000000000000 00007fef43c4a791
> (XEN) 0000000000000001 0000000000000000 00007fff27dc7db0 00007fef43a1bd58
> (XEN) 0000000000000024 0000000000000001 00007fff27dc9710 0000000000000001
> (XEN) 0000000000d3f050 00007fef43c51325 0000000000000011 00007fff27dc7dd0
> (XEN) ffff83082a357ed8 ffff8300bf656000 0000000000000003 00007fff27dc7c60
> (XEN) 00007fff27dc7c60 0000000000000000 00007cf7d5ca80c7 ffff82c48020e1e8
> (XEN) ffffffff8100948a 0000000000000024 0000000000000000 00007fff27dc7c60
> (XEN) 00007fff27dc7c60 0000000000000003 ffff8807a0f2fe68 ffffffff8148d700
> (XEN) 0000000000000282 0000000000000024 0000000000d3f050 0000000000d40004
> (XEN) 0000000000000024 ffffffff8100948a 0000000100000000 00007fff27dc7ce0
> (XEN) 0000000000d40004 0000010000000000 ffffffff8100948a 000000000000e033
> (XEN) 0000000000000282 ffff8807a0f2fe20 000000000000e02b 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000002
> (XEN) Xen call trace:
> (XEN) [<ffff82c48014dd57>] arch_get_info_guest+0x5f7/0x7b0
> (XEN) [<ffff82c48010351f>] do_domctl+0x10ad/0x195e
> (XEN) [<ffff82c48020e1e8>] syscall_enter+0xc8/0x122
>
> I would appreciate any pointers on how to go about this.

This now indeed looks like an inconsistency between
arch_get_info_guest() and the newly introduced error path in
arch_set_info_guest(): the code that puts v->arch.user_eflags into
the necessary state no longer runs. It simply needs to be pulled up
in that function (and a few other adjustments also seem necessary):

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -856,6 +856,15 @@ int arch_set_info_guest(
 goto out;
 }
 
+ init_int80_direct_trap(v);
+
+ /* IOPL privileges are virtualised. */
+ v->arch.pv_vcpu.iopl = (v->arch.user_regs.eflags >> 12) & 3;
+ v->arch.user_regs.eflags &= ~X86_EFLAGS_IOPL;
+
+ /* Ensure real hardware interrupts are enabled. */
+ v->arch.user_regs.eflags |= X86_EFLAGS_IF;
+
 if ( !v->is_initialised )
 {
 v->arch.pv_vcpu.ldt_base = c(ldt_base);
@@ -866,7 +875,11 @@ int arch_set_info_guest(
 bool_t fail = v->arch.pv_vcpu.ctrlreg[3] != c(ctrlreg[3]);
 
 #ifdef CONFIG_X86_64
- fail |= v->arch.pv_vcpu.ctrlreg[1] != c(ctrlreg[1]);
+ if ( !compat )
+ {
+ fail |= v->arch.pv_vcpu.ctrlreg[1] != c(ctrlreg[1]);
+ fail |= !v->arch.pv_vcpu.ctrlreg[1] && !(flags & VGCF_in_kernel);
+ }
 #endif
 
 for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i )
@@ -907,15 +920,6 @@ int arch_set_info_guest(
 v->arch.pv_vcpu.ctrlreg[0] &= X86_CR0_TS;
 v->arch.pv_vcpu.ctrlreg[0] |= read_cr0() & ~X86_CR0_TS;
 
- init_int80_direct_trap(v);
-
- /* IOPL privileges are virtualised. */
- v->arch.pv_vcpu.iopl = (v->arch.user_regs.eflags >> 12) & 3;
- v->arch.user_regs.eflags &= ~X86_EFLAGS_IOPL;
-
- /* Ensure real hardware interrupts are enabled. */
- v->arch.user_regs.eflags |= X86_EFLAGS_IF;
-
 cr4 = v->arch.pv_vcpu.ctrlreg[4];
 v->arch.pv_vcpu.ctrlreg[4] = cr4 ? pv_guest_cr4_fixup(v, cr4) :
 real_cr4_to_pv_guest_cr4(mmu_cr4_features);

Can you give this a try?
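To make the ordering problem concrete, here is a minimal user-space toy model in C. All names are hypothetical, not Xen's. It assumes that the BUG at domctl.c:1724 is an eflags sanity check in arch_get_info_guest (modelled here as "stored eflags must not carry IOPL bits", returning an error instead of panicking), and that the get path folds the virtualised IOPL back into the reported eflags. With the normalisation placed after the consistency check, a failed set leaves IOPL bits behind and the next get trips; with it pulled up, as in the patch, the next get succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_EFLAGS_IF   0x00000200u /* interrupt-enable flag, bit 9 */
#define TOY_EFLAGS_IOPL 0x00003000u /* I/O privilege level, bits 12-13 */

/* Hypothetical toy vCPU, loosely modelled on v->arch.user_regs.eflags,
 * v->arch.pv_vcpu.iopl and v->arch.pv_vcpu.ctrlreg[3]. */
struct toy_vcpu {
    uint32_t eflags; /* stored eflags; invariant: IOPL bits kept clear */
    uint32_t iopl;   /* virtualised IOPL, kept outside eflags */
    uint64_t cr3;
};

/* Same steps as the block moved by the patch: move IOPL out of eflags
 * into the virtual field and force IF on. */
static void toy_normalise_eflags(struct toy_vcpu *v)
{
    v->iopl = (v->eflags >> 12) & 3;
    v->eflags &= ~TOY_EFLAGS_IOPL;
    v->eflags |= TOY_EFLAGS_IF;
}

/* Toy set path: the incoming state is copied in before validation.
 * normalise_first = 0 models the old ordering, 1 the patched one.
 * Returns 0, or -EOPNOTSUPP if the caller tries to change cr3. */
static int toy_set_info(struct toy_vcpu *v, uint32_t eflags, uint64_t cr3,
                        int normalise_first)
{
    v->eflags = eflags;
    if (normalise_first)
        toy_normalise_eflags(v);
    if (v->cr3 != cr3)
        return -EOPNOTSUPP; /* early error path */
    if (!normalise_first)
        toy_normalise_eflags(v);
    return 0;
}

/* Toy get path: returns -1 where Xen would BUG(), otherwise reports
 * eflags with the virtualised IOPL folded back in. */
static int toy_get_info(const struct toy_vcpu *v, uint32_t *eflags)
{
    if (v->eflags & TOY_EFLAGS_IOPL)
        return -1;
    *eflags = v->eflags | (v->iopl << 12);
    return 0;
}
```

With a vCPU whose virtual IOPL is 1, a get reports eflags 0x1200. Feeding that back with a mismatched cr3 and normalise_first = 0 leaves 0x1200 stored, so the next toy_get_info() returns -1. With normalise_first = 1 the set still fails with -EOPNOTSUPP, but the stored state stays consistent and the next get reports 0x1200 again.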

Ok. This patch solves the Xen panic but not the EOPNOTSUPP
error. That is, I can now run my sample program to "try" to get/set the same vcpu
context without crashing Xen. As before, only get context succeeds; set context fails
with the same EOPNOTSUPP error, both for a 2.6.18 32-bit domU and a 2.6.39 64-bit dom0.

And as you said, I added more debugging.

(XEN) domain.c:893:d0 incoming cr3 42b33e000, cur cr3 827ba5000, fail = 1
(XEN) domain.c:901:d0 incoming cr1 42ba6c000, cur cr1 00000000, !(flags & VGCF_in_kernel)=0,fail=1

Looking at arch_get_info_guest in domctl.c, I see that cr3 is first copied verbatim from the vcpu and
then overwritten in this if-else block:
    if ( !is_pv_32on64_domain(v->domain) )
    {
        c.nat->ctrlreg[3] = xen_pfn_to_cr3(
            pagetable_get_pfn(v->arch.guest_table));
#ifdef __x86_64__
        c.nat->ctrlreg[1] =
            pagetable_is_null(v->arch.guest_table_user) ? 0
            : xen_pfn_to_cr3(pagetable_get_pfn(v->arch.guest_table_user));
#endif
        ....
    } else {
        l4_pgentry_t *l4e = __va(pagetable_get_paddr(v->arch.guest_table));
        c.cmp->ctrlreg[3] = compat_pfn_to_cr3(l4e_get_pfn(*l4e));
    }

This seems to account for the difference between the values that libxc supplies (obtained from get context)
and the ones arch_set_info_guest validates them against.
In other words, arch_set_info_guest validates cr3 and cr1 against the wrong values (the raw
v->arch.pv_vcpu.ctrlreg[3] and ctrlreg[1]), when it should validate them against the values
produced by the if-else block in arch_get_info_guest.
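The mismatch can be reproduced with plain arithmetic. The sketch below is a toy model with hypothetical names: toy_pfn_to_cr3() and toy_compat_pfn_to_cr3() are assumed to mirror xen_pfn_to_cr3() and compat_pfn_to_cr3(), and a pfn of 0 stands in for a null guest_table_user. The pfns used in the check are back-computed from the logged "outgoing" values rather than read from the logs: 0x42b85b gives 42b85b000, 0x42b85c gives 42b85c000, and for the 32on64 guest 0x44f0ac gives 4f0ac004. Comparing against the raw stored ctrlreg values rejects the get/set round trip; comparing against what the get path would report accepts it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror xen_pfn_to_cr3() on x86_64. */
static uint64_t toy_pfn_to_cr3(uint64_t pfn)
{
    return pfn << 12;
}

/* Assumed to mirror compat_pfn_to_cr3(): low pfn bits go to 31:12,
 * pfn bits above 20 are folded into the low 12 bits. */
static uint32_t toy_compat_pfn_to_cr3(uint32_t pfn)
{
    return (uint32_t)((pfn << 12) | (pfn >> 20));
}

/* Hypothetical toy PV vCPU state. */
struct toy_pv_vcpu {
    bool     is_32on64;
    uint64_t raw_cr3;         /* like v->arch.pv_vcpu.ctrlreg[3] */
    uint64_t raw_cr1;         /* like v->arch.pv_vcpu.ctrlreg[1] */
    uint64_t guest_table_pfn; /* like pagetable_get_pfn(v->arch.guest_table) */
    uint64_t user_table_pfn;  /* guest_table_user pfn; 0 = null (assumption) */
    uint32_t compat_l4e_pfn;  /* like l4e_get_pfn(*l4e) for a 32on64 guest */
};

/* What the get path reports for cr3 and cr1, per the if-else block. */
static uint64_t toy_reported_cr3(const struct toy_pv_vcpu *v)
{
    return v->is_32on64 ? toy_compat_pfn_to_cr3(v->compat_l4e_pfn)
                        : toy_pfn_to_cr3(v->guest_table_pfn);
}

static uint64_t toy_reported_cr1(const struct toy_pv_vcpu *v)
{
    return v->user_table_pfn == 0 ? 0 : toy_pfn_to_cr3(v->user_table_pfn);
}

/* Returns true if the set path would reject the incoming values.
 * use_derived = false compares against the raw ctrlreg values (the check
 * described above); true compares against what get would report.
 * cr1 is only checked for non-compat guests, as in the patch. */
static bool toy_set_rejects(const struct toy_pv_vcpu *v, uint64_t cr3,
                            uint64_t cr1, bool use_derived)
{
    bool fail;

    if (use_derived) {
        fail = toy_reported_cr3(v) != cr3;
        if (!v->is_32on64)
            fail |= toy_reported_cr1(v) != cr1;
    } else {
        fail = v->raw_cr3 != cr3;
        if (!v->is_32on64)
            fail |= v->raw_cr1 != cr1;
    }
    return fail;
}
```

Whether comparing pfn-derived values is the right fix inside Xen is not settled by this sketch; it only shows that the two sides currently compare different representations.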

I have verified this with both a 32-bit domU and a 64-bit domU.

64-bit PV domU (2.6.39..)
--------------------------------------
get_vcpu_context(): (debug output from arch_get_info_guest)
(XEN) domctl.c:1707:d0 copying cr1 00000000
(XEN) domctl.c:1707:d0 copying cr3 827bd5000
(XEN) domctl.c:1743:d0 not pv_32on64, outgoing cr3 42b85b000, cur cr3 827bd5000
(XEN) domctl.c:1746:d0 not pv_32on64, outgoing cr1 42b85c000, cur cr1 00000000

set_vcpu_context(): (debug output from arch_set_info_guest)
(XEN) domain.c:893:d0 incoming cr3 42b85b000, cur cr3 827bd5000, fail = 1
(XEN) domain.c:901:d0 incoming cr1 42b85c000, cur cr1 00000000, !(flags & VGCF_in_kernel)=0,fail=1

32-bit PV domU (2.6.18)
----------------------------------
get_vcpu_context()
(XEN) domctl.c:1707:d0 copying cr1 00000000
(XEN) domctl.c:1707:d0 copying cr3 2960e008
(XEN) domctl.c:1758:d0 is pv_32on64, outgoing cr3 4f0ac004, cur cr3 2960e008

set_vcpu_context()
(XEN) domain.c:893:d0 incoming cr3 4f0ac004, cur cr3 2960e008, fail = 1

shriram

The corresponding debug code in arch_set_info_guest:

bool_t fail = v->arch.pv_vcpu.ctrlreg[3] != c(ctrlreg[3]);
gdprintk(XENLOG_WARNING,
         "incoming cr3 %08lx, cur cr3 %08lx, fail = %d\n",
         c(ctrlreg[3]), v->arch.pv_vcpu.ctrlreg[3], fail);

#ifdef CONFIG_X86_64
if ( !compat )
{
    fail |= v->arch.pv_vcpu.ctrlreg[1] != c(ctrlreg[1]);

    gdprintk(XENLOG_WARNING,
             "incoming cr1 %08lx, cur cr1 %08lx, !(flags & VGCF_in_kernel)=%d,fail=%d\n",
             c(ctrlreg[1]), v->arch.pv_vcpu.ctrlreg[1], !(flags & VGCF_in_kernel), fail);

    fail |= !v->arch.pv_vcpu.ctrlreg[1] && !(flags & VGCF_in_kernel);
...


The question is whether there are other inconsistencies lurking, and
hence whether it would be better to mark a vCPU on which setting
the context failed, so that it can neither be resumed nor have its
context obtained anymore. That appears quite drastic though. Keir,
what's your opinion here?

Jan
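Jan's alternative could look roughly like this in a toy model. The ctx_poisoned flag is hypothetical (not a real Xen field): it is set when a set-context call fails part-way, and both get-context and resume refuse to proceed while it is set. The error code (-EINVAL) is arbitrary, and the sketch assumes that a later successful set restores a consistent state and may clear the mark.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy vCPU illustrating the "mark it unusable" idea. */
struct toy_vcpu {
    uint64_t cr3;
    bool ctx_poisoned; /* set when a context load failed part-way */
    bool running;
};

static int toy_set_context(struct toy_vcpu *v, uint64_t cr3)
{
    if (v->cr3 != cr3) {
        /* Some state may already have been overwritten at this point,
         * so refuse further use rather than trust it. */
        v->ctx_poisoned = true;
        return -EOPNOTSUPP;
    }
    v->ctx_poisoned = false; /* assumption: full successful load clears it */
    return 0;
}

static int toy_get_context(const struct toy_vcpu *v, uint64_t *cr3)
{
    if (v->ctx_poisoned)
        return -EINVAL; /* error code chosen arbitrarily for the sketch */
    *cr3 = v->cr3;
    return 0;
}

static int toy_resume(struct toy_vcpu *v)
{
    if (v->ctx_poisoned)
        return -EINVAL;
    v->running = true;
    return 0;
}
```

The drastic part Jan mentions is visible here: after one bad set, the guest cannot even be inspected until someone loads a fully valid context.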

Re: xl/xm save -c fails - set_vcpucontext EOPNOTSUPP (was Re: [Xen-devel] xl save -c issues with Windows 7 Ultimate)