Xen Project Mailing List

Re: [Xen-devel] [PATCH] x86/vvmx: Fix deadlock with MSR bitmap merging

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Thu, 12 Mar 2020 14:32:27 +0100

Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>

Delivery-date: Thu, 12 Mar 2020 13:32:45 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Mar 12, 2020 at 12:21:29PM +0000, Andrew Cooper wrote:
> On 12/03/2020 09:21, Jan Beulich wrote:
> > On 11.03.2020 19:34, Andrew Cooper wrote:
> >> c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a
> >> use of map_domain_page() which may get used in the middle of context switch.
> >>
> >> This is not safe, and causes Xen to deadlock on the mapcache lock:
> >>
> >> (XEN) Xen call trace:
> >> (XEN) [<ffff82d08022d6ae>] R _spin_lock+0x34/0x5e
> >> (XEN) [<ffff82d0803219d7>] F map_domain_page+0x250/0x527
> >> (XEN) [<ffff82d080356332>] F do_page_fault+0x420/0x780
> >> (XEN) [<ffff82d08038da3d>] F x86_64/entry.S#handle_exception_saved+0x68/0x94
> >> (XEN) [<ffff82d08031729f>] F __find_next_zero_bit+0x28/0x69
> >> (XEN) [<ffff82d080321a4d>] F map_domain_page+0x2c6/0x527
> >> (XEN) [<ffff82d08029eeb2>] F nvmx_update_exec_control+0x1d7/0x323
> >> (XEN) [<ffff82d080299f5a>] F vmx_update_cpu_exec_control+0x23/0x40
> >> (XEN) [<ffff82d08029a3f7>] F arch/x86/hvm/vmx/vmx.c#vmx_ctxt_switch_from+0xb7/0x121
> >> (XEN) [<ffff82d08031d796>] F arch/x86/domain.c#__context_switch+0x124/0x4a9
> >> (XEN) [<ffff82d080320925>] F context_switch+0x154/0x62c
> >> (XEN) [<ffff82d080252f3e>] F common/sched/core.c#sched_context_switch+0x16a/0x175
> >> (XEN) [<ffff82d080253877>] F common/sched/core.c#schedule+0x2ad/0x2bc
> >> (XEN) [<ffff82d08022cc97>] F common/softirq.c#__do_softirq+0xb7/0xc8
> >> (XEN) [<ffff82d08022cd38>] F do_softirq+0x18/0x1a
> >> (XEN) [<ffff82d0802a2fbb>] F vmx_asm_do_vmentry+0x2b/0x30
> >>
> >> Convert the domheap page into being a xenheap page.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
> >
> >> I suspect this is the not-quite-consistent-enough-to-bisect issue which
> >> OSSTest is hitting and interfering with pushes to master.
> > Having looked at a number of (albeit not all) failures, I don't
> > think I've seen any sign of a crash like the one above. Do you
> > think there are more subtle manifestations of the issue?
>
> This stack trace was produced by an NMI watchdog timeout, and I thought
> OSSTest didn't, but I see I'm wrong.
>
> In which case this probably isn't what OSSTest is seeing, but it is a
> genuine issue.

The OSSTest issue, IIRC, was L1 Xen hitting
ASSERT(!sp || (peoi[sp - 1].vector < vector)) in do_IRQ_guest, which
seems to mean that L0 Xen is injecting interrupts twice, or some such?

Roger.
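Reading the quoted trace bottom-up, the outer map_domain_page() call (reached from nvmx_update_exec_control() during context switch) appears to fault inside __find_next_zero_bit() while the mapcache lock is held; do_page_fault() then calls map_domain_page() again on the same CPU, which spins in _spin_lock() on a lock it can never get. The C sketch below is a toy model of that re-entry, not Xen's code: toy_lock_acquire(), toy_map_domain_page() and toy_get_bitmap() are hypothetical names, and the toy lock reports the self-deadlock instead of spinning so the condition can be observed. It contrasts this with a statically allocated buffer standing in for a xenheap page, which, as we understand the fix, is permanently mapped and so can be reached without going through the mapcache at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the mapcache lock; NOT Xen's real code.
 * A real spin_lock() simply spins forever when re-entered on the same
 * CPU; this toy version returns TOY_DEADLOCK instead. */
struct toy_lock {
    bool held;
    int owner;               /* CPU holding the lock, or -1 if free */
};

#define TOY_DEADLOCK (-1)

static int toy_lock_acquire(struct toy_lock *l, int cpu)
{
    if (l->held && l->owner == cpu)
        return TOY_DEADLOCK; /* same CPU re-entering: would never return */
    l->held = true;
    l->owner = cpu;
    return 0;
}

static void toy_lock_release(struct toy_lock *l, int cpu)
{
    assert(l->held && l->owner == cpu);
    l->held = false;
    l->owner = -1;
}

/* Hypothetical model of map_domain_page(): takes the lock and, if
 * 'fault' is set, simulates a page fault whose handler maps another
 * page (do_page_fault -> map_domain_page in the trace). */
static int toy_map_domain_page(struct toy_lock *l, int cpu, bool fault)
{
    if (toy_lock_acquire(l, cpu) == TOY_DEADLOCK)
        return TOY_DEADLOCK;
    int rc = 0;
    if (fault)
        rc = toy_map_domain_page(l, cpu, false); /* nested mapping attempt */
    toy_lock_release(l, cpu);
    return rc;
}

/* Hypothetical model of the fix: a buffer that is always mapped (like a
 * xenheap page), so reaching it takes no lock, even mid context switch. */
static unsigned char toy_xenheap_bitmap[4096];

static unsigned char *toy_get_bitmap(void)
{
    return toy_xenheap_bitmap;
}
```

In this model, toy_map_domain_page(&l, 0, false) succeeds, while toy_map_domain_page(&l, 0, true) reports TOY_DEADLOCK where real code would hang until the NMI watchdog fired; toy_get_bitmap() never touches the lock.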
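The ASSERT quoted in the reply requires each vector pushed onto the pending-EOI stack to be strictly greater than the vector currently on top. The sketch below is a simplified, hypothetical model of that invariant only; struct toy_eoi_stack, toy_push() and TOY_NR_PENDING are illustrative names, not Xen's actual do_IRQ_guest() data structures. It shows why the same vector arriving a second time while still pending, which is one reading of "injects interrupts twice", would trip the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-CPU pending-EOI stack; illustrative only. */
#define TOY_NR_PENDING 8

struct toy_pending_eoi {
    unsigned char vector;
};

struct toy_eoi_stack {
    struct toy_pending_eoi peoi[TOY_NR_PENDING];
    unsigned int sp;         /* number of entries currently on the stack */
};

/* Same condition as the quoted ASSERT:
 *   !sp || (peoi[sp - 1].vector < vector)
 * i.e. a new vector must be strictly greater than the one on top. */
static bool toy_can_push(const struct toy_eoi_stack *s, unsigned char vector)
{
    return !s->sp || (s->peoi[s->sp - 1].vector < vector);
}

/* Returns false (where Xen would ASSERT) if the invariant would break,
 * or if the toy stack is full. */
static bool toy_push(struct toy_eoi_stack *s, unsigned char vector)
{
    if (s->sp >= TOY_NR_PENDING || !toy_can_push(s, vector))
        return false;
    s->peoi[s->sp++].vector = vector;
    return true;
}
```

Pushing 0x30 and then 0x40 succeeds; a second 0x40 before the first has been EOI'd is rejected, just as a lower vector such as 0x20 would be.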
