[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [RFC PATCH 1/1] xen: Use a global mapping for runstate
Hi Julien, > On 28 May 2020, at 19:54, Julien Grall <julien@xxxxxxx> wrote: > > Hi Bertrand, > > Thank you for the patch. > > On 28/05/2020 16:25, Bertrand Marquis wrote: >> At the moment on Arm, a Linux guest running with KTPI enabled will >> cause the following error when a context switch happens in user mode: >> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 >> This patch is modifying runstate handling to map the area given by the >> guest inside Xen during the hypercall. >> This is removing the guest virtual to physical conversion during context >> switches which removes the bug > > It would be good to spell out that a virtual address is not stable. So > relying on it is wrong. > >> and improve performance by preventing to >> walk page tables during context switches. > > With Secret free hypervisor in mind, I would like to suggest to map/unmap the > runstate during context switch. > > The cost should be minimal when there is a direct map (i.e on Arm64 and x86) > and still provide better performance on Arm32. Even with a minimal cost this is still adding some non real-time behaviour to the context switch. But definitely from the security point of view as we have to map a page from the guest, we could have accessible in Xen some data that should not be there. There is a trade here where: - xen can protect by map/unmapping - a guest which wants to secure his data should either not use it or make sure there is nothing else in the page That sounds like a thread local storage kind of problematic where we want data from xen to be accessible fast from the guest and easy to be modified from xen. > > The change should be minimal compare to the current approach but this could > be taken care separately if you don't have time. I could add that to the serie in a separate patch so that it can be discussed and test separately ? > >> -- >> In the current status, this patch is only working on Arm and needs to >> be fixed on X86 (see #error on domain.c for missing get_page_from_gva). >> Signed-off-by: Bertrand Marquis <bertrand.marquis@xxxxxxx> >> --- >> xen/arch/arm/domain.c | 32 +++++++++------- >> xen/arch/x86/domain.c | 51 ++++++++++++++----------- >> xen/common/domain.c | 84 ++++++++++++++++++++++++++++++++++------- >> xen/include/xen/sched.h | 11 ++++-- >> 4 files changed, 124 insertions(+), 54 deletions(-) >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index 31169326b2..799b0e0103 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -278,33 +278,37 @@ static void ctxt_switch_to(struct vcpu *n) >> /* Update per-VCPU guest runstate shared memory area (if registered). */ >> static void update_runstate_area(struct vcpu *v) >> { >> - void __user *guest_handle = NULL; >> - struct vcpu_runstate_info runstate; >> + struct vcpu_runstate_info *runstate; >> - if ( guest_handle_is_null(runstate_guest(v)) ) >> + /* XXX why do we accept not to block here */ >> + if ( !spin_trylock(&v->runstate_guest_lock) ) >> return; >> - memcpy(&runstate, &v->runstate, sizeof(runstate)); >> + runstate = runstate_guest(v); >> + >> + if (runstate == NULL) >> + { >> + spin_unlock(&v->runstate_guest_lock); >> + return; >> + } >> if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> { >> - guest_handle = &v->runstate_guest.p->state_entry_time + 1; >> - guest_handle--; >> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, >> 1); >> + runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >> smp_wmb(); > > Because you set v->runstate.state_entry_time below, the placement of the > barrier seems a bit odd. > > I would suggest to update v->runstate.state_entry_time first and then update > runstate->state_entry_time. We do want the guest to know when we modify the runstate so we need to make sure the XEN_RUNSTATE_UPDATE is actually set in a visible way before we do the memcpy. That’s why the barrier is after. > >> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> } >> - __copy_to_guest(runstate_guest(v), &runstate, 1); >> + memcpy(runstate, &v->runstate, sizeof(v->runstate)); >> - if ( guest_handle ) >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> { >> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> smp_wmb(); > > You want to update runstate->state_entry_time after the barrier not before. Agree > >> static void _update_runstate_area(struct vcpu *v) >> { >> + /* XXX: this should be removed */ >> if ( !update_runstate_area(v) && is_pv_vcpu(v) && >> !(v->arch.flags & TF_kernel_mode) ) >> v->arch.pv.need_update_runstate_area = 1; >> diff --git a/xen/common/domain.c b/xen/common/domain.c >> index 7cc9526139..acc6f90ba3 100644 >> --- a/xen/common/domain.c >> +++ b/xen/common/domain.c >> @@ -161,6 +161,7 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int >> vcpu_id) >> v->dirty_cpu = VCPU_CPU_CLEAN; >> spin_lock_init(&v->virq_lock); >> + spin_lock_init(&v->runstate_guest_lock); >> tasklet_init(&v->continue_hypercall_tasklet, NULL, NULL); >> @@ -691,6 +692,66 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, >> struct domain **d) >> return 0; >> } >> +static void unmap_runstate_area(struct vcpu *v, unsigned int lock) > > NIT: There is an extra space after void > > Also, AFAICT, the lock is only taking two values. Please switch to a bool. Agree > >> +{ >> + mfn_t mfn; >> + >> + if ( ! runstate_guest(v) ) > > NIT: We don't usually put a space after !. > > But shouldn't this be checked within the lock? Agree > > >> + return; >> + >> + if (lock) > > NIT: if ( ... ) > Ack >> + spin_lock(&v->runstate_guest_lock); >> + >> + mfn = domain_page_map_to_mfn(runstate_guest(v)); >> + >> + unmap_domain_page_global((void *) >> + ((unsigned long)v->runstate_guest & >> + PAGE_MASK)); >> + >> + put_page_and_type(mfn_to_page(mfn)); >> + runstate_guest(v) = NULL; >> + >> + if (lock) > > Ditto. Ack > >> + spin_unlock(&v->runstate_guest_lock); >> +} >> + >> +static int map_runstate_area(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area *area) >> +{ >> + unsigned long offset = area->addr.p & ~PAGE_MASK; >> + void *mapping; >> + struct page_info *page; >> + size_t size = sizeof(struct vcpu_runstate_info); >> + >> + ASSERT(runstate_guest(v) == NULL); >> + >> + /* do not allow an area crossing 2 pages */ >> + if ( offset > (PAGE_SIZE - size) ) >> + return -EINVAL; > > This is a change in behavior for the guest. If we are going forward with > this, this will want a separate patch with its own explanation why this is > done. Ack I need to add support for an area crossing pages > >> + >> +#ifdef CONFIG_ARM >> + page = get_page_from_gva(v, area->addr.p, GV2M_WRITE); > > A guest is allowed to setup the runstate for a different vCPU than the > current one. This will lead to get_page_from_gva() to fail as the function > cannot yet work with a vCPU other than current. If the area is mapped per cpu but isn’t the aim of this to have a way to check other cpus status ? > > AFAICT, there is no restriction on when the runstate hypercall can be called. > So this can even be called before the vCPU is brought up. I understand the remark but it still feels very weird to allow an invalid address in an hypercall. Wouldn’t we have a lot of potential issues accepting an address that we cannot check ? > > I was going to suggest to use the current vCPU for translating the address. > However, it would be reasonable for an OS to use the same virtual address for > all the vCPUs assuming the page-tables are different per vCPU. > > Recent Linux are using a per-cpu area, so the virtual address should be > different for each vCPU. But I don't know how the other OSes works. Roger > should be able to help for FreeBSD at least. > > I have CCed Paul for the Windows drivers. > > If we decide to introduce some restriction then they should be explained in > the commit message and also documented in the public header (we have been > pretty bad at documenting change in the past!). Agree Cheers Bertrand
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |