
Re: [RFC PATCH 1/1] xen: Use a global mapping for runstate


  • To: Julien Grall <julien@xxxxxxx>
  • From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • Date: Fri, 29 May 2020 08:13:58 +0000
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, "Xia, Hongyan" <hongyxia@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, nd <nd@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 29 May 2020 08:14:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [RFC PATCH 1/1] xen: Use a global mapping for runstate

Hi Julien,

> On 28 May 2020, at 19:54, Julien Grall <julien@xxxxxxx> wrote:
> 
> Hi Bertrand,
> 
> Thank you for the patch.
> 
> On 28/05/2020 16:25, Bertrand Marquis wrote:
>> At the moment on Arm, a Linux guest running with KTPI enabled will
>> cause the following error when a context switch happens in user mode:
>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>> This patch modifies runstate handling to map the area given by the
>> guest inside Xen during the hypercall.
>> This removes the guest virtual to physical conversion during context
>> switches, which fixes the bug
> 
> It would be good to spell out that a virtual address is not stable, so
> relying on it is wrong.
> 
>> and improves performance by avoiding page-table
>> walks during context switches.
> 
> With the secret-free hypervisor work in mind, I would like to suggest
> mapping/unmapping the runstate during the context switch.
> 
> The cost should be minimal when there is a direct map (i.e. on Arm64 and
> x86), and this would still provide better performance on Arm32.
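 
For concreteness, that suggestion would amount to something like the sketch
below (rough and untested; runstate_guest_mfn and runstate_guest_off are
hypothetical fields remembering the translation done at registration time):

    static void update_runstate_area(struct vcpu *v)
    {
        struct vcpu_runstate_info *runstate;

        /* Map the registered page only for the duration of the update. */
        runstate = (struct vcpu_runstate_info *)
                   ((char *)map_domain_page(v->runstate_guest_mfn) +
                    v->runstate_guest_off);

        /* ... update the runstate as in the patch ... */

        unmap_domain_page(runstate);
    }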

Even with a minimal cost, this still adds some non-real-time behaviour to
the context switch.
But definitely, from the security point of view, as we have to map a page
from the guest, data that should not be there could become accessible in
Xen. There is a trade-off here:
- Xen can protect itself by mapping/unmapping;
- a guest which wants to secure its data should either not use the runstate
area or make sure there is nothing else in the page (see the sketch below).

That sounds like a thread-local-storage kind of problem, where we want data
from Xen to be quickly accessible from the guest and easy to modify from Xen.
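 
On the guest side, making sure nothing else shares the page is simple. A
minimal sketch (hypothetical guest code; HYPERVISOR_vcpu_op is the
Linux-style hypercall wrapper, the interface itself is the existing
VCPUOP_register_runstate_memory_area):

    /* Pad the runstate to a full page so that mapping it in Xen cannot
     * expose any unrelated guest data. */
    static union {
        struct vcpu_runstate_info info;
        char pad[PAGE_SIZE];
    } runstate_page __attribute__((__aligned__(PAGE_SIZE)));

    static void register_runstate(unsigned int vcpu)
    {
        struct vcpu_register_runstate_memory_area area = {
            .addr.v = &runstate_page.info,
        };

        HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area, vcpu, &area);
    }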

> 
> The change should be minimal compared to the current approach, but this
> could be taken care of separately if you don't have time.

I could add that to the series as a separate patch so that it can be
discussed and tested separately?

> 
>> --
>> In the current status, this patch is only working on Arm and needs to
>> be fixed on X86 (see #error on domain.c for missing get_page_from_gva).
>> Signed-off-by: Bertrand Marquis <bertrand.marquis@xxxxxxx>
>> ---
>>  xen/arch/arm/domain.c   | 32 +++++++++-------
>>  xen/arch/x86/domain.c   | 51 ++++++++++++++-----------
>>  xen/common/domain.c     | 84 ++++++++++++++++++++++++++++++++++-------
>>  xen/include/xen/sched.h | 11 ++++--
>>  4 files changed, 124 insertions(+), 54 deletions(-)
>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index 31169326b2..799b0e0103 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -278,33 +278,37 @@ static void ctxt_switch_to(struct vcpu *n)
>>  /* Update per-VCPU guest runstate shared memory area (if registered). */
>>  static void update_runstate_area(struct vcpu *v)
>>  {
>> -    void __user *guest_handle = NULL;
>> -    struct vcpu_runstate_info runstate;
>> +    struct vcpu_runstate_info *runstate;
>> 
>> -    if ( guest_handle_is_null(runstate_guest(v)) )
>> +    /* XXX why do we accept not to block here */
>> +    if ( !spin_trylock(&v->runstate_guest_lock) )
>>          return;
>> 
>> -    memcpy(&runstate, &v->runstate, sizeof(runstate));
>> +    runstate = runstate_guest(v);
>> +
>> +    if (runstate == NULL)
>> +    {
>> +        spin_unlock(&v->runstate_guest_lock);
>> +        return;
>> +    }
>>        if ( VM_ASSIST(v->domain, runstate_update_flag) )
>>      {
>> -        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>> -        guest_handle--;
>> -        runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>> -        __raw_copy_to_guest(guest_handle,
>> -                            (void *)(&runstate.state_entry_time + 1) - 1, 1);
>> +        runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
>>          smp_wmb();
> 
> Because you set v->runstate.state_entry_time below, the placement of the 
> barrier seems a bit odd.
> 
> I would suggest to update v->runstate.state_entry_time first and then update 
> runstate->state_entry_time.

We do want the guest to know when we modify the runstate, so we need to make
sure XEN_RUNSTATE_UPDATE is actually set in a visible way before we do the
memcpy.
That's why the barrier comes after setting the flag.

> 
>> +        v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>>      }
>> 
>> -    __copy_to_guest(runstate_guest(v), &runstate, 1);
>> +    memcpy(runstate, &v->runstate, sizeof(v->runstate));
>> 
>> -    if ( guest_handle )
>> +    if ( VM_ASSIST(v->domain, runstate_update_flag) )
>>      {
>> -        runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> +        runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>>          smp_wmb();
> 
> You want to update runstate->state_entry_time after the barrier not before.
Agree
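 
Putting both remarks together, the update sequence would look like this (a
sketch of the ordering only):

    /* Set the flag in both copies so the memcpy below does not clear it,
     * and make it visible to the guest before any field changes. */
    v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
    runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
    smp_wmb();

    memcpy(runstate, &v->runstate, sizeof(v->runstate));

    /* Make the new values visible before telling the guest the update
     * is finished. */
    smp_wmb();
    runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;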

> 
>>  static void _update_runstate_area(struct vcpu *v)
>>  {
>> +    /* XXX: this should be removed */
>>      if ( !update_runstate_area(v) && is_pv_vcpu(v) &&
>>           !(v->arch.flags & TF_kernel_mode) )
>>          v->arch.pv.need_update_runstate_area = 1;
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 7cc9526139..acc6f90ba3 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -161,6 +161,7 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
>>      v->dirty_cpu = VCPU_CPU_CLEAN;
>>        spin_lock_init(&v->virq_lock);
>> +    spin_lock_init(&v->runstate_guest_lock);
>>        tasklet_init(&v->continue_hypercall_tasklet, NULL, NULL);
>> 
>> @@ -691,6 +692,66 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, struct domain **d)
>>      return 0;
>>  }
>> 
>> +static void  unmap_runstate_area(struct vcpu *v, unsigned int lock)
> 
> NIT: There is an extra space after void
> 
> Also, AFAICT, the lock parameter only ever takes two values. Please switch
> to a bool.

Agree

> 
>> +{
>> +    mfn_t mfn;
>> +
>> +    if ( ! runstate_guest(v) )
> 
> NIT: We don't usually put a space after !.
> 
> But shouldn't this be checked within the lock?

Agree

> 
> 
>> +        return;
>> +
>> +    if (lock)
> 
> NIT: if ( ... )
> 

Ack

>> +        spin_lock(&v->runstate_guest_lock);
>> +
>> +    mfn = domain_page_map_to_mfn(runstate_guest(v));
>> +
>> +    unmap_domain_page_global((void *)
>> +                            ((unsigned long)v->runstate_guest &
>> +                             PAGE_MASK));
>> +
>> +    put_page_and_type(mfn_to_page(mfn));
>> +    runstate_guest(v) = NULL;
>> +
>> +    if (lock)
> 
> Ditto.

Ack

> 
>> +        spin_unlock(&v->runstate_guest_lock);
>> +}
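 
Folding in the three remarks above, the helper would end up looking
something like this (an untested sketch, keeping the patch's field names):

    static void unmap_runstate_area(struct vcpu *v, bool lock)
    {
        mfn_t mfn;

        if ( lock )
            spin_lock(&v->runstate_guest_lock);

        /* The NULL check is done under the lock, as suggested. */
        if ( runstate_guest(v) == NULL )
        {
            if ( lock )
                spin_unlock(&v->runstate_guest_lock);
            return;
        }

        mfn = domain_page_map_to_mfn(runstate_guest(v));

        unmap_domain_page_global((void *)
                                 ((unsigned long)runstate_guest(v) &
                                  PAGE_MASK));

        put_page_and_type(mfn_to_page(mfn));
        runstate_guest(v) = NULL;

        if ( lock )
            spin_unlock(&v->runstate_guest_lock);
    }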
>> +
>> +static int map_runstate_area(struct vcpu *v,
>> +                    struct vcpu_register_runstate_memory_area *area)
>> +{
>> +    unsigned long offset = area->addr.p & ~PAGE_MASK;
>> +    void *mapping;
>> +    struct page_info *page;
>> +    size_t size = sizeof(struct vcpu_runstate_info);
>> +
>> +    ASSERT(runstate_guest(v) == NULL);
>> +
>> +    /* do not allow an area crossing 2 pages */
>> +    if ( offset > (PAGE_SIZE - size) )
>> +        return -EINVAL;
> 
> This is a change in behavior for the guest. If we are going forward with
> this, it will want a separate patch with its own explanation of why this
> is done.

Ack. I need to add support for an area crossing two pages; the sketch below
shows the case the current check rejects.
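 
    /* Worked example with 4KiB pages: if sizeof(struct vcpu_runstate_info)
     * is 48 bytes (illustrative value), an offset of 4080 would put the
     * last 32 bytes on the next page; 4080 > 4096 - 48 == 4048, so the
     * registration is refused. */
    if ( offset > (PAGE_SIZE - size) )
        return -EINVAL;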

> 
>> +
>> +#ifdef CONFIG_ARM
>> +    page = get_page_from_gva(v, area->addr.p, GV2M_WRITE);
> 
> A guest is allowed to set up the runstate for a different vCPU than the
> current one. This will lead get_page_from_gva() to fail, as the function
> cannot yet work with a vCPU other than the current one.

The area is mapped per vCPU, but isn't the aim of this to have a way to
check other vCPUs' status?

> 
> AFAICT, there is no restriction on when the runstate hypercall can be called. 
> So this can even be called before the vCPU is brought up.

I understand the remark, but it still feels very weird to allow an invalid
address in a hypercall.
Wouldn't we have a lot of potential issues accepting an address that we
cannot check?


> 
> I was going to suggest using the current vCPU for translating the address.
> However, it would be reasonable for an OS to use the same virtual address
> for all the vCPUs, assuming the page-tables are different per vCPU.
> 
> Recent Linux kernels use a per-cpu area, so the virtual address should be
> different for each vCPU. But I don't know how the other OSes work. Roger
> should be able to help for FreeBSD at least.
> 
> I have CCed Paul for the Windows drivers.
> 
> If we decide to introduce some restrictions then they should be explained
> in the commit message and also documented in the public header (we have
> been pretty bad at documenting changes in the past!).

Agree

Cheers
Bertrand


 

