[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default


  • To: <andrew.cooper3@xxxxxxxxxx>, <JBeulich@xxxxxxxx>
  • From: <Kevin.Mayer@xxxxxxxx>
  • Date: Mon, 22 Aug 2016 12:22:48 +0000
  • Accept-language: de-DE, en-US
  • Cc: xen-devel@xxxxxxxxxxxxx
  • Delivery-date: Mon, 22 Aug 2016 12:23:20 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: AQHR8V/Bf9t6T5jYSE6HlCkrclbAiaBQGgBggAS6loCAACLlYA==
  • Thread-topic: Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

Hi

The reproduction should be pretty simple:

Apply the patch to enable altp2m unconditionally:
     d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
     d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = SHUTDOWN_reboot;
+    d->arch.hvm_domain.params[HVM_PARAM_ALTP2M] = 1;
+
     vpic_init(d);
     rc = vioapic_init(d);

For the guest we use a single state file (Windows 10) from which the guests are 
restored with libvirt.
Simply restore and destroy several guests (5-7 in our current setup) in fast 
succession (every guest has about 1-2 minutes of runtime).
The number of guest VMs seems to correlate with the time until the crash 
occurs, but other, random factors seem to be more important:
more VMs => the crash happens faster.
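The restore/destroy cycle can be scripted roughly as follows. This is only a sketch: the guest names, the per-guest state-file layout under STATEDIR, and the runtime window are assumptions about our setup, not exact commands.

```shell
#!/bin/sh
# Sketch of the reproduction loop. Guest names, state-file paths and the
# runtime window are assumptions; adjust to the actual libvirt setup.
VIRSH=${VIRSH:-virsh}                       # libvirt CLI (stub it for dry runs)
STATEDIR=${STATEDIR:-/var/lib/libvirt/save} # per-guest saved-state files
GUESTS=${GUESTS:-"win10-1 win10-2 win10-3 win10-4 win10-5"}

cycle_guests() {
    # Restore every guest from its saved state ...
    for g in $GUESTS; do
        "$VIRSH" restore "$STATEDIR/$g.save"
    done
    # ... let them run for the 1-2 minute window mentioned above ...
    sleep "${RUNTIME:-90}"
    # ... then destroy them all in fast succession.
    for g in $GUESTS; do
        "$VIRSH" destroy "$g"
    done
}
```

Running cycle_guests in a loop should eventually trigger the crash; as noted above, more guests per cycle makes it happen faster.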


Would the following debug setup be possible?
L0: Xen / VMWare
L1: Xen with altp2m enabled
L2: Several guest-VMs being constantly restored / destroyed

Then periodically take snapshots until the hypervisor panics and try to debug 
from the latest snapshot onward.
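In shell terms the L0 side of that idea would be something like the loop below. Again purely a sketch: the L0 tooling (virsh here; VMware would use its own snapshot CLI), the L1 VM name, the liveness check, and the snapshot interval are all assumptions.

```shell
#!/bin/sh
# Sketch of the periodic-snapshot idea at L0. All names are assumptions.
VIRSH=${VIRSH:-virsh}     # L0 snapshot tool (virsh here; vmrun for VMware)
L1VM=${L1VM:-xen-l1}      # the L1 VM running Xen with altp2m enabled

# Liveness check for the L1 hypervisor; replace with whatever fits the setup.
l1_alive() {
    ping -c 1 -W 2 "${L1HOST:-xen-l1.example}" >/dev/null 2>&1
}

snapshot_until_panic() {
    n=0
    # Keep snapshotting the L1 VM while its hypervisor still responds.
    while l1_alive; do
        n=$((n + 1))
        "$VIRSH" snapshot-create-as "$L1VM" "pre-crash-$n"
        sleep "${INTERVAL:-300}"
    done
    echo "L1 down after $n snapshots; debug from pre-crash-$n"
}
```

The last snapshot taken before the panic would then be the starting point for debugging.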

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Monday, 22 August 2016 13:58
> To: Mayer, Kevin <Kevin.Mayer@xxxxxxxx>; JBeulich@xxxxxxxx
> Cc: xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
> 
> On 19/08/16 11:01, Kevin.Mayer@xxxxxxxx wrote:
> > Hi
> >
> > I took another look at Xen and a new crashdump.
> > The last successful __vmwrite should be in static void
> > vmx_vcpu_update_vmfunc_ve(struct vcpu *v) [...]
> >     __vmwrite(SECONDARY_VM_EXEC_CONTROL,
> >               v->arch.hvm_vmx.secondary_exec_control);
> > [...]
> > After this, altp2m_vcpu_destroy wakes up the vcpu and is then finished.
> >
> > In nestedhvm_vcpu_destroy (nvmx_vcpu_destroy) the vmcs can be
> > overwritten (but that path is not reached in our case as far as I can see):
> >     if ( nvcpu->nv_n1vmcx )
> >         v->arch.hvm_vmx.vmcs = nvcpu->nv_n1vmcx;
> >
> > In conclusion:
> > When destroying a domain, the altp2m_vcpu_destroy(v) path seems to
> > mess up the vmcs, which (only) sometimes leads to a failed __vmwrite
> > in vmx_fpu_leave.
> > That is as far as I can get with my understanding of the Xen code.
> >
> > Do you guys have any additional ideas what I could test / analyse?
> 
> Do you have easy reproduction instructions you could share?  Sadly, this is
> looking like an issue which isn't viable to debug over email.
> 
> ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

