
Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default


  • To: <andrew.cooper3@xxxxxxxxxx>, <JBeulich@xxxxxxxx>
  • From: <Kevin.Mayer@xxxxxxxx>
  • Date: Wed, 7 Sep 2016 08:35:42 +0000
  • Accept-language: de-DE, en-US
  • Cc: xen-devel@xxxxxxxxxxxxx
  • Delivery-date: Wed, 07 Sep 2016 08:35:58 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: AQHR8V/Bf9t6T5jYSE6HlCkrclbAiaBQGgBggAS6loCAGQkB8A==
  • Thread-topic: Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

Hi

I took the time to write a small script that restores and destroys domains
from provided state files.
Just apply the attached patch to Xen 4.6.1, provide some images and state
files, and start the script:

python VmStarter.py -FILE /path/to/domU-0.state -FILE /path/to/domU-1.state 
--loggingLevel DEBUG

You can provide an arbitrary number of state files, and the script will
start an additional thread for each one.
Each thread restores one guest domain from its state file, waits for a
random interval between 20 and 30 seconds (sleepTime = random.randint(20, 30)),
destroys the domain, and then starts the cycle again.

The guest domains and the corresponding state files need to have the same
name, since the script extracts the domain name from the state file name
(a sketch of the per-thread loop follows below).
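
For reference, here is a minimal sketch of the loop each worker thread runs.
This is an illustration only, not the attached VmStarter.py; in particular,
the use of the xl toolstack (xl restore / xl destroy) and the helper name
cycle_domain are my assumptions:

import os
import random
import subprocess
import threading
import time

def cycle_domain(state_file):
    # The domain name is derived from the state file name,
    # e.g. /path/to/domU-0.state -> domU-0 (naming convention above).
    name = os.path.splitext(os.path.basename(state_file))[0]
    while True:
        # Restore the guest from its saved state file.
        subprocess.check_call(["xl", "restore", state_file])
        # Wait 20-30 seconds, matching sleepTime = random.randint(20, 30).
        time.sleep(random.randint(20, 30))
        # Tear the guest down again, then start the cycle over.
        subprocess.check_call(["xl", "destroy", name])

# One worker thread per provided state file.
for f in ["/path/to/domU-0.state", "/path/to/domU-1.state"]:
    threading.Thread(target=cycle_domain, args=(f,)).start()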

When starting roughly one guest domain per physical CPU core, the crash
should occur within 5 to 10 minutes. Since the crashes are fairly random,
the hypervisor sometimes panics almost instantly and sometimes takes a
while, but the time to crash seems to correlate with the number of started
guest domains.
More domains => faster crash

Kevin

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Monday, 22 August 2016 13:58
> To: Mayer, Kevin <Kevin.Mayer@xxxxxxxx>; JBeulich@xxxxxxxx
> Cc: xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
> 
> On 19/08/16 11:01, Kevin.Mayer@xxxxxxxx wrote:
> > Hi
> >
> > I took another look at Xen and a new crashdump.
> > The last successful __vmwrite should be in static void
> > vmx_vcpu_update_vmfunc_ve(struct vcpu *v) [...]
> >     __vmwrite(SECONDARY_VM_EXEC_CONTROL,
> >               v->arch.hvm_vmx.secondary_exec_control);
> > [...]
> > After this, altp2m_vcpu_destroy wakes up the vcpu and then finishes.
> >
> > In nestedhvm_vcpu_destroy (nvmx_vcpu_destroy) the vmcs can be
> > overwritten (but this path is not reached in our case, as far as I can
> > see):
> >     if ( nvcpu->nv_n1vmcx )
> >         v->arch.hvm_vmx.vmcs = nvcpu->nv_n1vmcx;
> >
> > In conclusion:
> > When destroying a domain, the altp2m_vcpu_destroy(v) path seems to
> > mess up the vmcs, which (only) sometimes leads to a failed __vmwrite
> > in vmx_fpu_leave.
> > That is as far as I can get with my understanding of the Xen code.
> >
> > Do you guys have any additional ideas what I could test / analyse?
> 
> Do you have easy reproduction instructions you could share?  Sadly, this is
> looking like an issue which isn't viable to debug over email.
> 
> ~Andrew


Attachment: xen-altp2menable.patch
Description: xen-altp2menable.patch

Attachment: VmStarter.py
Description: VmStarter.py
