
Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default



On 29/07/16 08:33, Kevin.Mayer@xxxxxxxx wrote:

Hi guys

 

We are using Xen 4.6.1 to manage our virtual machines on x86-64-servers.

We start dozens of VMs and destroy them again after 60 seconds. This works fine as it is, but the next step in our approach requires the altp2m functionality.

Since libvirt does not pass the altp2m enable flag to the hypervisor, we enabled altp2m unconditionally by patching hvm.c. As all of our machines support altp2m, this seemed to be ok.


altp2m is emulated in software when hardware support isn't available, so it should work on all hardware (albeit with rather higher overhead).

 

     d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
     d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = SHUTDOWN_reboot;
+    d->arch.hvm_domain.params[HVM_PARAM_ALTP2M] = 1;
+


This looks to be ok, given your situation.

     vpic_init(d);
     rc = vioapic_init(d);

 

Since applying this patch, the hypervisor crashes after several hundred VM restarts (even though we do not use any altp2m functionality), with the following dmesg:

 

(XEN) ----[ Xen-4.6.1  x86_64  debug=n  Not tainted ]----


As a start, please always use a debug hypervisor for investigating issues like this.

(XEN) CPU:    7
(XEN) RIP:    e008:[<ffff82d0801f5a55>] vmx_vmenter_helper+0x2b5/0x340
(XEN) RFLAGS: 0000000000010003   CONTEXT: hypervisor (d0v3)
(XEN) rax: 000000008005003b   rbx: ffff8300e7038000   rcx: 0000000000000008
(XEN) rdx: 0000000000006c00   rsi: ffff83062eb5e000   rdi: ffff8300e7038000
(XEN) rbp: ffff830c17e3f000   rsp: ffff830617fc7d70   r8:  0000000000000000
(XEN) r9:  ffff83014f8d7028   r10: 000002700f858000   r11: 00002201be6861f0
(XEN) r12: ffff83062eb5e000   r13: ffff8300e752f000   r14: ffff82d08030ea40
(XEN) r15: 0000000000000007   cr0: 000000008005003b   cr4: 00000000000026e0
(XEN) cr3: 00000001bf4da000   cr2: 00000000dd840c00
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff830617fc7d70:
(XEN)    ffff8300e7038000 ffff82d080170c04 0000000000000000 0000000780109f6a
(XEN)    ffff830617fc7f18 ffff83000000001e 0000000000000000 ffff8300e752f19c
(XEN)    0000000000000286 ffff8300e752f000 ffff8300e72fc000 0000000000000007
(XEN)    ffff830c17e3f000 ffff830c14ee1000 ffff82d08030ea40 ffff82d080173d6a
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff82d08030ea40 ffff8300e72fc000 000002700f481091 0000000000000001
(XEN)    ffff82d080324560 ffff82d08030ea40 ffff8300e752f000 ffff82d080128004
(XEN)    0000000000000001 0000000001c9c380 ffff830c14ef60e8 0000000017fce600
(XEN)    0000000000000001 ffff82d0801bd18b ffff82d0801d9e88 ffff8300e752f000
(XEN)    0000000001c9c380 ffff82d08012e700 0000006e00000171 ffffffffffffffff
(XEN)    ffff830617fc0000 ffff82d0802f8f80 00000000ffffffff ffff83062eb5e000
(XEN)    ffff82d08030ea40 ffff82d08012b040 ffff8300e7038000 ffff830617fc0000
(XEN)    ffff8300e7038000 00000000ffffffff ffff830c14ee1000 ffff82d080170970
(XEN)    ffff8300e72fc000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000080550f50 00000000ffdffc70 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 000000002fcffe19
(XEN)    00000000ffdffc70 0000000000000000 00000000ffdffc50 00000000853b0918
(XEN)    000000fa00000000 00000000f0e48162 0000000000000000 0000000000000246
(XEN)    0000000080550f34 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000007 ffff8300e752f000
(XEN) Xen call trace:
(XEN)    [<ffff82d0801f5a55>] vmx_vmenter_helper+0x2b5/0x340
(XEN)    [<ffff82d080170c04>] __context_switch+0xb4/0x350
(XEN)    [<ffff82d080173d6a>] context_switch+0xca/0xef0
(XEN)    [<ffff82d080128004>] schedule+0x264/0x5f0
(XEN)    [<ffff82d0801bd18b>] mwait_idle+0x25b/0x3a0
(XEN)    [<ffff82d0801d9e88>] hvm_vcpu_has_pending_irq+0x58/0xc0
(XEN)    [<ffff82d08012e700>] timer_softirq_action+0x80/0x250
(XEN)    [<ffff82d08012b040>] __do_softirq+0x60/0x90
(XEN)    [<ffff82d080170970>] idle_loop+0x20/0x50
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 7:
(XEN) FATAL TRAP: vector = 6 (invalid opcode)
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing kexec image on cpu7
(XEN) Shot down all CPUs

 

The RIP points to a ud2 instruction:

0xffff82d0801f5a55:  ud2

From RFLAGS (CF = 1, ZF = 0, i.e. VMfailInvalid) we concluded that the VMWRITE failed because the current VMCS pointer is invalid, but this is where we are stuck: we have no idea how the pointer could have been corrupted.
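The RFLAGS reasoning can be made concrete. VMX instructions report their outcome only through RFLAGS: VMsucceed clears CF, PF, AF, ZF, SF and OF; VMfailInvalid sets only CF (there is no valid current VMCS); VMfailValid sets only ZF (an error number is then available in the current VMCS's VM-instruction error field). The following C sketch classifies a dumped RFLAGS value along those lines. The macro, enum and function names are hypothetical (not Xen's), and it assumes the dumped RFLAGS still reflects the VMWRITE result, which is plausible because ud2 does not modify any flags.

```c
#include <stdint.h>

/* RFLAGS status bits used by VMX instructions to report success/failure.
 * Names are hypothetical; bit positions are architectural. */
#define RFLAGS_CF (1ULL << 0)   /* carry flag    */
#define RFLAGS_PF (1ULL << 2)   /* parity flag   */
#define RFLAGS_AF (1ULL << 4)   /* adjust flag   */
#define RFLAGS_ZF (1ULL << 6)   /* zero flag     */
#define RFLAGS_SF (1ULL << 7)   /* sign flag     */
#define RFLAGS_OF (1ULL << 11)  /* overflow flag */

/* The six flags that VMsucceed / VMfailInvalid / VMfailValid define. */
#define RFLAGS_VMX_STATUS (RFLAGS_CF | RFLAGS_PF | RFLAGS_AF | \
                           RFLAGS_ZF | RFLAGS_SF | RFLAGS_OF)

enum vmx_outcome {
    VMX_SUCCEED,       /* all six status flags clear                       */
    VMX_FAIL_INVALID,  /* only CF set: no valid current VMCS pointer       */
    VMX_FAIL_VALID,    /* only ZF set: error number in the current VMCS    */
    VMX_UNKNOWN        /* any other combination: not a VMX status pattern  */
};

/* Classify an RFLAGS value captured right after a VMX instruction.
 * Bits outside the six status flags (e.g. reserved bit 1, RF) are ignored. */
enum vmx_outcome classify_vmx_rflags(uint64_t rflags)
{
    uint64_t status = rflags & RFLAGS_VMX_STATUS;

    if (status == 0)
        return VMX_SUCCEED;
    if (status == RFLAGS_CF)
        return VMX_FAIL_INVALID;
    if (status == RFLAGS_ZF)
        return VMX_FAIL_VALID;
    return VMX_UNKNOWN;
}
```

For the dump above, classify_vmx_rflags(0x10003) yields VMX_FAIL_INVALID: bit 1 is the always-set reserved bit and bit 16 is RF, neither of which is a VMX status flag.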

crash> vcpu

gives vmcs = 0xffffffff817cbc20 for vcpu_id = 7, and vcpus gives:

 

   VCID  PCID       VCPU       ST T DOMID      DOMAIN
      0     0 ffff8300e75f2000 RU I 32767 ffff830c14ee1000
      1     1 ffff8300e72fe000 RU I 32767 ffff830c14ee1000
      2     2 ffff8300e7527000 RU I 32767 ffff830c14ee1000
>     3     3 ffff8300e7526000 RU I 32767 ffff830c14ee1000
      4     4 ffff8300e75f1000 RU I 32767 ffff830c14ee1000
>     5     5 ffff8300e75f0000 RU I 32767 ffff830c14ee1000
>     6     6 ffff8300e72fd000 RU I 32767 ffff830c14ee1000
      7     7 ffff8300e72fc000 RU I 32767 ffff830c14ee1000
      0     0 ffff8300e72fa000 BL 0     0 ffff830c17e3f000
      1     6 ffff8300e72f9000 BL 0     0 ffff830c17e3f000
      2     3 ffff8300e72f8000 BL 0     0 ffff830c17e3f000
>     3     7 ffff8300e752f000 RU 0     0 ffff830c17e3f000
      4     5 ffff8300e752e000 RU 0     0 ffff830c17e3f000
>     5     2 ffff8300e752d000 RU 0     0 ffff830c17e3f000
>     6     1 ffff8300e752c000 BL 0     0 ffff830c17e3f000
>*    7     0 ffff8300e752b000 RU 0     0 ffff830c17e3f000
      0     4 ffff8300e7042000 OF U   127 ffff830475bbe000
>     0     4 ffff8300e7040000 RU U   128 ffff83062a7bc000
      0     1 ffff8300e7038000 RU U   129 ffff83062eb5e000
      0     5 ffff8300e703e000 BL U   130 ffff830475bd1000
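This listing can be cross-referenced with the register dump as a first step towards working out which domain the failing VMCS belongs to. The C sketch below is a plain lookup over rows copied from the vcpus output; the struct and function names are hypothetical helpers, not crash or Xen APIs. Treating rbx/rdi (ffff8300e7038000) as the struct vcpu being entered and rsi/r12 (ffff83062eb5e000) as its struct domain is an assumption that a debug build or the disassembly would need to confirm.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the crash "vcpus" listing (VCPU, VCID, DOMID, DOMAIN columns). */
struct vcpu_row {
    uint64_t vcpu;    /* address of struct vcpu           */
    int vcid;         /* vCPU index within its domain     */
    int domid;        /* 32767 is the idle domain         */
    uint64_t domain;  /* address of struct domain         */
};

/* Rows copied verbatim from the listing above. */
static const struct vcpu_row vcpu_rows[] = {
    { 0xffff8300e75f2000ULL, 0, 32767, 0xffff830c14ee1000ULL },
    { 0xffff8300e72fe000ULL, 1, 32767, 0xffff830c14ee1000ULL },
    { 0xffff8300e7527000ULL, 2, 32767, 0xffff830c14ee1000ULL },
    { 0xffff8300e7526000ULL, 3, 32767, 0xffff830c14ee1000ULL },
    { 0xffff8300e75f1000ULL, 4, 32767, 0xffff830c14ee1000ULL },
    { 0xffff8300e75f0000ULL, 5, 32767, 0xffff830c14ee1000ULL },
    { 0xffff8300e72fd000ULL, 6, 32767, 0xffff830c14ee1000ULL },
    { 0xffff8300e72fc000ULL, 7, 32767, 0xffff830c14ee1000ULL },
    { 0xffff8300e72fa000ULL, 0, 0,     0xffff830c17e3f000ULL },
    { 0xffff8300e72f9000ULL, 1, 0,     0xffff830c17e3f000ULL },
    { 0xffff8300e72f8000ULL, 2, 0,     0xffff830c17e3f000ULL },
    { 0xffff8300e752f000ULL, 3, 0,     0xffff830c17e3f000ULL },
    { 0xffff8300e752e000ULL, 4, 0,     0xffff830c17e3f000ULL },
    { 0xffff8300e752d000ULL, 5, 0,     0xffff830c17e3f000ULL },
    { 0xffff8300e752c000ULL, 6, 0,     0xffff830c17e3f000ULL },
    { 0xffff8300e752b000ULL, 7, 0,     0xffff830c17e3f000ULL },
    { 0xffff8300e7042000ULL, 0, 127,   0xffff830475bbe000ULL },
    { 0xffff8300e7040000ULL, 0, 128,   0xffff83062a7bc000ULL },
    { 0xffff8300e7038000ULL, 0, 129,   0xffff83062eb5e000ULL },
    { 0xffff8300e703e000ULL, 0, 130,   0xffff830475bd1000ULL },
};

#define NUM_VCPU_ROWS (sizeof(vcpu_rows) / sizeof(vcpu_rows[0]))

/* Return the row whose struct vcpu address equals addr, or NULL if unknown. */
const struct vcpu_row *find_vcpu(uint64_t addr)
{
    for (size_t i = 0; i < NUM_VCPU_ROWS; i++)
        if (vcpu_rows[i].vcpu == addr)
            return &vcpu_rows[i];
    return NULL;
}

/* Return the domid owning the struct domain at addr, or -1 if unknown. */
int find_domid_by_domain(uint64_t addr)
{
    for (size_t i = 0; i < NUM_VCPU_ROWS; i++)
        if (vcpu_rows[i].domain == addr)
            return vcpu_rows[i].domid;
    return -1;
}
```

With these rows, find_vcpu(0xffff8300e7038000) resolves to vCPU 0 of domain 129 and find_domid_by_domain(0xffff83062eb5e000) to domain 129, whereas the panic header reports the context as d0v3 (ffff8300e752f000, which also appears in r13).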

 

Do you have any idea what could cause this crash, or how to proceed?


As a start, use a debug hypervisor.  That will get you accurate backtraces, and you might get lucky and hit an earlier assertion.  Can you identify which domain this vmcs should belong to, and whether it is in the process of being destroyed?

~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
