
Re: dom0 PV looping on search_pre_exception_table()


  • To: Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Wed, 9 Dec 2020 13:28:54 +0000
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 09 Dec 2020 13:29:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 09/12/2020 10:15, Manuel Bouyer wrote:
> On Tue, Dec 08, 2020 at 06:13:46PM +0000, Andrew Cooper wrote:
>> On 08/12/2020 17:57, Manuel Bouyer wrote:
>>> Hello,
>>> for the first time I tried to boot a Xen kernel from devel with
>>> a NetBSD PV dom0. The kernel boots, but when the first userland process
>>> is launched, it seems to enter a loop involving search_pre_exception_table()
>>> (I see an endless stream from the dprintk() at arch/x86/extable.c:202)
>>>
>>> With Xen 4.13 I see it, but exactly once:
>>> (XEN) extable.c:202: Pre-exception: ffff82d08038c304 -> ffff82d08038c8c8
>>>
>>> with devel:
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>> (XEN) extable.c:202: Pre-exception: ffff82d040393309 -> ffff82d0403938c8
>>> [...]
>>>
>>> the dom0 kernel is the same.
>>>
>>> At first glance it looks like a fault in the guest is not handled as it
>>> should, and the userland process keeps faulting on the same address.
>>>
>>> Any idea what to look at?
>> That is a recurring fault on IRET back to guest context, and is
>> probably caused by some unwise-in-hindsight cleanup which no longer
>> escalates the failure to the failsafe callback.
>>
>> This ought to give something useful to debug with:
> thanks, I got:
> (XEN) IRET fault: #PF[0000]                                                 
> (XEN) domain_crash called from extable.c:209                                
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:                                   
> (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y   Tainted:   C   ]----       
> (XEN) CPU:    0                                                             
> (XEN) RIP:    0047:[<00007f7e184007d0>]                                     
> (XEN) RFLAGS: 0000000000000202   EM: 0   CONTEXT: pv guest (d0v0)           
> (XEN) rax: ffff82d04038c309   rbx: 0000000000000000   rcx: 000000000000e008 
> (XEN) rdx: 0000000000010086   rsi: ffff83007fcb7f78   rdi: 000000000000e010 
> (XEN) rbp: 0000000000000000   rsp: 00007f7fff53e3e0   r8:  0000000e00000000 
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000 
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000 
> (XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 0000000000002660 
> (XEN) cr3: 0000000079cdb000   cr2: 00007f7fff53e3e0                         
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: ffffffff80cf2dc0
> (XEN) ds: 0023   es: 0023   fs: 0000   gs: 0000   ss: 003f   cs: 0047       
> (XEN) Guest stack trace from rsp=00007f7fff53e3e0:          
> (XEN)    0000000000000001 00007f7fff53e8f8 0000000000000000 0000000000000000
> (XEN)    0000000000000003 000000004b600040 0000000000000004 0000000000000038
> (XEN)    0000000000000005 0000000000000008 0000000000000006 0000000000001000
> (XEN)    0000000000000007 00007f7e18400000 0000000000000008 0000000000000000
> (XEN)    0000000000000009 000000004b601cd0 00000000000007d0 0000000000000000
> (XEN)    00000000000007d1 0000000000000000 00000000000007d2 0000000000000000
> (XEN)    00000000000007d3 0000000000000000 000000000000000d 00007f7fff53f000
> (XEN)    00000000000007de 00007f7fff53e4e0 0000000000000000 0000000000000000
> (XEN)    6e692f6e6962732f 0000000000007469 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.

Pagefaults on IRET come either from stack accesses for its operands (not
the case here, as Xen is otherwise working fine), or from segment
selector loads for %cs and %ss.
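
(For anyone unfamiliar with the mechanism: the lookup which is spamming
the console is conceptually just a binary search from the faulting RIP
to a continuation address.  A simplified sketch below; Xen's real
struct exception_table_entry in arch/x86/extable.c stores 32-bit
self-relative offsets, where absolute addresses are used here purely
for readability:

    #include <stddef.h>

    /* Simplified model of search_pre_exception_table(). */
    struct ex_entry {
        unsigned long addr; /* address of the faulting instruction */
        unsigned long cont; /* where to resume if that address faults */
    };

    static unsigned long search_table(const struct ex_entry *tab,
                                      size_t n, unsigned long rip)
    {
        size_t lo = 0, hi = n;

        while ( lo < hi ) /* entries are sorted by ->addr */
        {
            size_t mid = lo + (hi - lo) / 2;

            if ( tab[mid].addr == rip )
                return tab[mid].cont; /* fixup found: resume here */
            if ( tab[mid].addr < rip )
                lo = mid + 1;
            else
                hi = mid;
        }

        return 0; /* no fixup registered for this RIP */
    }

The dprintk() at extable.c:202 fires each time this lookup succeeds for
the IRET, which is why a fault that never makes progress shows up as an
endless stream of identical lines.)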

In this example, %ss (selector 003f, so TI=1) is in the LDT, which
specifically does use pagefaults to promote the backing frame to
PGT_segdesc.
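
For context, "promote" means the frame's page type becomes PGT_segdesc,
and the promotion validates every descriptor in the frame; a single bad
descriptor fails the lot.  Roughly (paraphrased from alloc_segdesc_page()
in arch/x86/mm.c, not the literal code):

    /*
     * Paraphrase of alloc_segdesc_page(): promoting a frame to
     * PGT_segdesc requires every descriptor in it to pass
     * check_descriptor().  One bad descriptor fails the whole
     * promotion, so the #PF which triggered it cannot make progress.
     */
    static int validate_segdesc_page(struct page_info *page)
    {
        const struct domain *owner = page_get_owner(page);
        seg_desc_t *descs = __map_domain_page(page);
        unsigned int i;

        for ( i = 0; i < 512; i++ ) /* 4k page = 512 descriptors */
            if ( unlikely(!check_descriptor(owner, &descs[i])) )
                break;

        unmap_domain_page(descs);

        return i == 512 ? 0 : -EINVAL;
    }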

I suspect that handle_ldt_mapping_fault() is failing to promote the page
(for some reason), that we then take the "In hypervisor mode?  Leave it
to the #PF handler to fix up." path because the fault context is Xen's
own IRET rather than guest mode, and that Xen's #PF handler then
concludes there is nothing further to do.

The older behaviour of escalating to the failsafe callback would have
broken this cycle by rewriting %ss and re-entering the kernel.
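
(For reference, the failsafe callback is one of the entry points a
64-bit PV kernel registers with Xen; if Xen cannot reinstate the guest's
selectors on the way out, it zeroes the offending selector and enters
the guest there instead of retrying the IRET.  An illustrative
guest-side registration, using the Linux-style hypercall wrapper name
and invented stub names:

    extern void hypervisor_callback(void); /* normal event delivery */
    extern void failsafe_callback(void);   /* segment-fixup reentry */
    extern void syscall_entry(void);       /* native syscall entry  */

    HYPERVISOR_set_callbacks((unsigned long)hypervisor_callback,
                             (unsigned long)failsafe_callback,
                             (unsigned long)syscall_entry);
)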


Please try the attached debugging patch, which is an extension of what I
gave you yesterday.  First, it ought to print %cr2, which I expect will
point at Xen's virtual mapping of the vcpu's LDT.  The logic ought to
loop a few times, so we can inspect the hypervisor codepaths which are
effectively livelocked in this state, and I've also instrumented
check_descriptor() failures, because I've got a gut feeling that they
are the root cause of the problem.
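
For anyone following along without the attachment, the instrumentation
amounts to something in the spirit of (a sketch, not the patch itself):

    /* arch/x86/extable.c, on the IRET-fault path: */
    dprintk(XENLOG_ERR, "IRET fault: %%cr2 %lx\n", read_cr2());

    /* arch/x86/mm.c, on check_descriptor()'s failure path(s): */
    dprintk(XENLOG_ERR, "Bad descriptor: %08x:%08x\n", d->a, d->b);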

~Andrew

Attachment: 0001-extable-dbg.patch
Description: Text Data


 

