Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabledbydefault
Hi,

I took the time to write a small script that restores and destroys domains from provided state files. Just apply the patch to a Xen 4.6.1 tree, provide some images and state files, and start the script:

python VmStarter.py -FILE /path/to/domU-0.state -FILE /path/to/domU-1.state --loggingLevel DEBUG

You can provide an arbitrary number of state files; the script starts an additional thread for each one. Each thread restores one guest domain from its state file, waits for a random time between 20 and 30 seconds (sleepTime = random.randint(20,30)), destroys the domain, and then starts the process again. The guest domains and the corresponding state files need to have the same name, since the script extracts the domain name from the state file name.

When starting about one guest domain for every physical core of the CPU, the crash should occur within 5 to 10 minutes. The crashes are fairly random: sometimes the hypervisor panics almost instantly and sometimes it takes a while, but the time to crash seems to correlate with the number of started guest domains. More domains => faster crash.

Kevin

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Monday, 22 August 2016 13:58
> To: Mayer, Kevin <Kevin.Mayer@xxxxxxxx>; JBeulich@xxxxxxxx
> Cc: xen-devel@xxxxxxxxxxxxx
> Subject: Re: AW: [Xen-devel] Xen 4.6.1 crash with altp2m enabledbydefault
>
> On 19/08/16 11:01, Kevin.Mayer@xxxxxxxx wrote:
> > Hi
> >
> > I took another look at Xen and a new crashdump.
> > The last successful __vmwrite should be in static void
> > vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
> > [...]
> > __vmwrite(SECONDARY_VM_EXEC_CONTROL,
> >           v->arch.hvm_vmx.secondary_exec_control);
> > [...]
> > After this the altp2m_vcpu_destroy wakes up the vcpu and is then
> > finished.
> >
> > In nestedhvm_vcpu_destroy (nvmx_vcpu_destroy) the vmcs can be
> > overwritten (but this is not reached in our case as far as I can see):
> >     if ( nvcpu->nv_n1vmcx )
> >         v->arch.hvm_vmx.vmcs = nvcpu->nv_n1vmcx;
> >
> > In conclusion:
> > When destroying a domain the altp2m_vcpu_destroy(v); path seems to
> > mess up the vmcs which ( only ) sometimes leads to a failed __vmwrite in
> > vmx_fpu_leave.
> > That is as far as I can get with my understanding of the Xen code.
> >
> > Do you guys have any additional ideas what I could test / analyse?
>
> Do you have easy reproduction instructions you could share? Sadly, this is
> looking like an issue which isn't viable to debug over email.
>
> ~Andrew

Attachment:
xen-altp2menable.patch
Attachment:
VmStarter.py

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
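
For readers who do not have the attachment at hand, the restore/wait/destroy loop described in the message can be sketched roughly as below. This is not the attached VmStarter.py, only a minimal illustration of the behaviour the message describes. It assumes the xl toolstack (`xl restore` and `xl destroy`); the function names, the `run`/`sleep` hooks, and the `iterations` parameter are hypothetical and exist only so the loop can be dry-run without a hypervisor.

```python
import os
import random
import subprocess
import threading
import time


def domain_name_from_state_file(path):
    """Derive the domain name from the state file name (domU-0.state -> domU-0)."""
    return os.path.splitext(os.path.basename(path))[0]


def restore_destroy_cycle(state_file, run=subprocess.call, sleep=time.sleep,
                          iterations=None):
    """Restore a domain, wait 20 to 30 seconds, destroy it, then repeat.

    iterations=None loops forever, as described in the message; passing a
    number bounds the loop, which is handy for dry runs.
    """
    name = domain_name_from_state_file(state_file)
    done = 0
    while iterations is None or done < iterations:
        # Assumption: the xl toolstack is used; the real script may differ.
        run(["xl", "restore", state_file])
        sleep(random.randint(20, 30))
        run(["xl", "destroy", name])
        done += 1


def start_threads(state_files):
    """Start one worker thread per state file and return the thread objects."""
    threads = [threading.Thread(target=restore_destroy_cycle, args=(path,))
               for path in state_files]
    for thread in threads:
        thread.start()
    return threads
```

Calling start_threads(["/path/to/domU-0.state", "/path/to/domU-1.state"]) would then run one endless restore/destroy loop per state file, which matches the "one guest domain per physical core" setup mentioned above.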