Error during update_runstate_area with KPTI activated
- To: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
- From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
- Date: Thu, 14 May 2020 14:28:12 +0000
- Cc: nd <nd@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>
- Delivery-date: Thu, 14 May 2020 14:29:30 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
Hi,
When running Linux on arm64 with KPTI enabled (in Dom0 or in a DomU), I get
many page-table walk errors like this:
(XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
After adding a backtrace, I found that the problem comes from
update_runstate_area when Linux has KPTI enabled.
I have the following call trace:
(XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
(XEN) backtrace.c:29: Stacktrace start at 0x8007638efbb0 depth 10
(XEN) [<000000000027780c>] get_page_from_gva+0x180/0x35c
(XEN) [<00000000002700c8>] guestcopy.c#copy_guest+0x1b0/0x2e4
(XEN) [<0000000000270228>] raw_copy_to_guest+0x2c/0x34
(XEN) [<0000000000268dd0>] domain.c#update_runstate_area+0x90/0xc8
(XEN) [<000000000026909c>] domain.c#schedule_tail+0x294/0x2d8
(XEN) [<0000000000269524>] context_switch+0x58/0x70
(XEN) [<00000000002479c4>] core.c#sched_context_switch+0x88/0x1e4
(XEN) [<000000000024845c>] core.c#schedule+0x224/0x2ec
(XEN) [<0000000000224018>] softirq.c#__do_softirq+0xe4/0x128
(XEN) [<00000000002240d4>] do_softirq+0x14/0x1c
When I discussed this with Stefano, he pointed me to an earlier discussion on
this subject, started in November 2018:
https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03053.html
A patch was then submitted:
https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg02320.html
I rebased this patch on current master and it solves the problem I am
seeing.
Introducing a VCPUOP_register_runstate_phys_memory_area sounds like a good
solution to me: the runstate area would no longer depend on being mapped in
the guest at the time of a context switch. That dependency is exactly what
breaks when a context switch is triggered while the guest is running at EL0.
Is there any reason why this was not merged in the end?
Thanks
Bertrand