Re: Error during update_runstate_area with KPTI activated
Hi,

On 15/05/2020 09:38, Roger Pau Monné wrote:
> On Fri, May 15, 2020 at 07:39:16AM +0000, Bertrand Marquis wrote:
>> On 14 May 2020, at 20:13, Julien Grall <julien.grall.oss@xxxxxxxxx> wrote:
>>> On Thu, 14 May 2020 at 19:12, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 14/05/2020 18:38, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 14/05/2020 17:18, Bertrand Marquis wrote:
>>>>>> On 14 May 2020, at 16:57, Julien Grall <julien@xxxxxxx> wrote:
>>>>>>> On 14/05/2020 15:28, Bertrand Marquis wrote:
>>>>>>>> Hi,
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>> When executing linux on arm64 with KPTI activated (in Dom0 or in a DomU), I have a lot of walk page table errors like this:
>>>>>>>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>>>>>>>>
>>>>>>>> After implementing a call trace, I found that the problem was coming from the update_runstate_area when linux has KPTI activated.
>>>>>>>>
>>>>>>>> I have the following call trace:
>>>>>>>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>>>>>>>> (XEN) backtrace.c:29: Stacktrace start at 0x8007638efbb0 depth 10
>>>>>>>> (XEN) [<000000000027780c>] get_page_from_gva+0x180/0x35c
>>>>>>>> (XEN) [<00000000002700c8>] guestcopy.c#copy_guest+0x1b0/0x2e4
>>>>>>>> (XEN) [<0000000000270228>] raw_copy_to_guest+0x2c/0x34
>>>>>>>> (XEN) [<0000000000268dd0>] domain.c#update_runstate_area+0x90/0xc8
>>>>>>>> (XEN) [<000000000026909c>] domain.c#schedule_tail+0x294/0x2d8
>>>>>>>> (XEN) [<0000000000269524>] context_switch+0x58/0x70
>>>>>>>> (XEN) [<00000000002479c4>] core.c#sched_context_switch+0x88/0x1e4
>>>>>>>> (XEN) [<000000000024845c>] core.c#schedule+0x224/0x2ec
>>>>>>>> (XEN) [<0000000000224018>] softirq.c#__do_softirq+0xe4/0x128
>>>>>>>> (XEN) [<00000000002240d4>] do_softirq+0x14/0x1c
>>>>>>>>
>>>>>>>> Discussing this subject with Stefano, he pointed me to a discussion started a year ago on this subject here:
>>>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03053.html
>>>>>>>>
>>>>>>>> And a patch was submitted:
>>>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg02320.html
>>>>>>>>
>>>>>>>> I rebased this patch on current master and it is solving the
>>>>>>>> problem I have seen.
>>>>>>>>
>>>>>>>> It sounds to me like a good solution to introduce a VCPUOP_register_runstate_phys_memory_area so as not to depend on the area actually being mapped in the guest when a context switch is being done (which is actually the problem happening when a context switch is triggered while a guest is running in EL0).
>>>>>>>>
>>>>>>>> Is there any reason why this was not merged in the end?
>>>>>>>
>>>>>>> I just skimmed through the thread to remind myself of the state. AFAICT, this is blocked on the contributor clarifying the intended interaction and providing a new version.
>>>>>>
>>>>>> What do you mean here by intended interaction? How the new hypercall should be used by the guest OS?
>>>>>
>>>>> From what I remember, Jan was seeking clarification on whether the two hypercalls (existing and new) can be called together by the same OS (and make sense).
>>>>>
>>>>> There was also the question of the handover between two pieces of software. For instance, what if the firmware is using the existing interface but the OS the new one? Similar question about kexec-ing a different kernel.
>>>>>
>>>>> This part is mostly documentation, so we can discuss the approach and review the implementation.
>>>>>
>>>>>>> I am still in favor of the new hypercall (and it is still on my todo list) but I haven't yet found time to revive the series.
>>>>>>>
>>>>>>> Would you be willing to take over the series? I would be happy to bring you up to speed and provide review.
>>>>>>
>>>>>> Sure, I can take it over.
>>>>>>
>>>>>> I ported it to the master version of Xen and tested it on a board. I still need to do a deep review of the code myself, but I have an understanding of the problem and of the idea.
>>>>>>
>>>>>> Any help to get up to speed would be more than welcome :-)
>>>>>
>>>>> I would recommend going through the latest version (v3) and the previous one (v2). I am also suggesting v2 because I think the split was easier to review/understand.
>>>>>
>>>>> The x86 code is probably what is going to give you the most trouble, as there are two ABIs to support (compat and non-compat). If you don't have an x86 setup, I should be able to test it/help write it.
>>>>> Feel free to ask any questions and I will try my best to remember the discussion from last year :).
>>>>
>>>> At the risk of being shouted down again, a new hypercall isn't necessarily necessary, and there are probably better ways of fixing it.
>>>>
>>>> The underlying ABI problem is that the area is registered by virtual address. The only correct way this should have been done is to register by guest physical address, so Xen's updating of the data doesn't interact with the guest pagetable settings/restrictions. x86 suffers the same kind of problems as ARM, except we silently squash the fallout.
>>>>
>>>> The logic in Xen is horrible, and I would really rather it was deleted completely, rather than kept for compatibility.
>>>>
>>>> The runstate area is always fixed kernel memory and doesn't move. I believe it is already restricted from crossing a page boundary, and we can calculate the va=>pa translation when the hypercall is made.
>>>>
>>>> Yes - this is technically an ABI change, but nothing is going to break (AFAICT) and the cleanup win is large enough to make this a *very* attractive option.
>>>
>>> I suggested this approach two years ago [1] but you were the one saying that the buffer could cross a page boundary on older Linux [2]:
>>>
>>> "I'd love to do this, but we can't. Older Linux used to have a virtual buffer spanning a page boundary. Changing the behaviour under that will cause older setups to explode."
>
> Sorry, this was a long time ago, and details have faded. IIRC there was even a proposal (or patch set) that took that into account and allowed buffers to span across a page boundary by taking a reference to two different pages in that case.

I am not aware of a patch set. Juergen suggested a per-domain mapping, but there were no details on how this could be done (my e-mail was left unanswered [1]).

If we were using vmap(), then we would need up to 1MB per domain (assuming 128 vCPUs). This sounds like quite a bit, and I think we need to agree on whether it would be an acceptable solution (this was also left unanswered [1]).
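
A minimal sketch of the two quantities discussed above: whether a registered area straddles a page boundary, and how the 1MB-per-domain figure can arise. It assumes 4 KiB pages and that the worst case maps two pages per vCPU; neither assumption is stated in the thread. The names sketch_crosses_page, sketch_worst_case_map_bytes and SKETCH_PAGE_SIZE are hypothetical and are not Xen functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; not taken from the thread. */
#define SKETCH_PAGE_SIZE 4096UL

/* Returns true if the byte range [va, va + len) touches more than one page. */
static bool sketch_crosses_page(uint64_t va, size_t len)
{
    assert(len > 0);
    return (va / SKETCH_PAGE_SIZE) != ((va + len - 1) / SKETCH_PAGE_SIZE);
}

/*
 * Worst-case bytes to map if every vCPU's area straddles a boundary and
 * therefore needs two pages (an assumption made for this sketch).
 */
static size_t sketch_worst_case_map_bytes(unsigned int nr_vcpus)
{
    return (size_t)nr_vcpus * 2 * SKETCH_PAGE_SIZE;
}
```

With these assumptions, sketch_worst_case_map_bytes(128) gives 128 * 2 * 4096 = 1048576 bytes, i.e. the 1MB mentioned above. For the address in the log, 0xffffff837ebe0cd0, a hypothetical 48-byte area would stay within a single page.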
> Another option would be to just return -EINVAL or -EOPNOTSUPP in that case and just get on with it. Runstate info shouldn't be mandatory for guests to function properly; I would say it's just extra info that's provided in good faith by the hypervisor in order to help the guest make better scheduling decisions.

Linux will panic if VCPUOP_register_runstate_memory_area returns an error (see xen_setup_runstate_info()).

>>> So can you explain your change of heart here?
>>>
>>>> I would prefer to fix it like this (perhaps adding a new hypercall which explicitly takes a guest physical address) than to keep any of this mess around forever more to cope with legacy guests.
>>>
>>> What does legacy guests mean? Is it PV 32-bit or does it also include some HVM?
>>
>> Reading all this and digging into the code, the meaningful implementation would definitely be to validate and translate the address during the hypercall handling and then to just reuse this address along the way.
>>
>> Whether or not the guest is passing a virtual address (versus an intermediate physical one), and whether to create a new hypercall for this, might be a different question that we could handle separately.
>>
>> Does anyone see something wrong with such an approach?
>>
>> Answering myself: there might be the corner case where the physical area can actually be removed from the guest (i.e. a guest using some memory coming from a temporarily mapped area). Would there be a way to check that during the hypercall?
>
> You have to take a reference to the page in order to prevent it from being freed under your feet. That way, if the guest decides to balloon out the page, you would prevent it from being freed by having taken that reference.
>
> Roger.

Cheers,

[1] <fb92072f-2709-fa5a-0284-08a66c401049@xxxxxxx>

--
Julien Grall
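
The reference-taking idea Roger describes can be illustrated with a toy model. This is only a sketch of reference counting in general: struct toy_page, toy_get_page and toy_put_page are hypothetical names invented here and do not reflect Xen's actual page-reference API or data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy page; not Xen's struct page_info. */
struct toy_page {
    unsigned int refcount;  /* number of current holders */
    bool freed;             /* set once the last reference is dropped */
};

/* Take an extra reference; fails if the page is already gone. */
static bool toy_get_page(struct toy_page *pg)
{
    if (pg->freed || pg->refcount == 0)
        return false;
    pg->refcount++;
    return true;
}

/* Drop a reference; the page is freed only when the count reaches zero. */
static void toy_put_page(struct toy_page *pg)
{
    assert(pg->refcount > 0);
    if (--pg->refcount == 0)
        pg->freed = true;
}
```

In this model the guest's own allocation holds one reference. If the hypervisor takes a second one when the area is registered, a later balloon-out by the guest only drops the count from 2 to 1, so the page stays valid until the hypervisor releases its reference too.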