
Re: Proposal for Porting Xen to Armv8-R64 - DraftB


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Wei Chen <Wei.Chen@xxxxxxx>
  • Date: Fri, 22 Apr 2022 14:09:05 +0800
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Penny Zheng <Penny.Zheng@xxxxxxx>
  • Delivery-date: Fri, 22 Apr 2022 06:09:54 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Stefano,

On 2022/4/21 5:08, Stefano Stabellini wrote:
On Wed, 20 Apr 2022, Wei Chen wrote:
On Tue, 19 Apr 2022, Wei Chen wrote:
### 3.2. Xen Event Channel Support
     In the current RFC patches we haven't enabled event channel
     support, but I think it's a good opportunity to have some
     discussion in advance. On Armv8-R, all VMs are natively
     direct-mapped, because there is no stage 2 MMU translation. The
     current event channel implementation depends on some shared pages
     between Xen and the guest: `shared_info` and per-cpu `vcpu_info`.

     For `shared_info`, in the current implementation, Xen will
     allocate a page from the heap for `shared_info` to store the
     initial meta-data. When the guest is trying to set up
     `shared_info`, it will allocate a free gfn and use a hypercall to
     set up the P2M mapping between the gfn and `shared_info`.

     For a direct-mapped VM, this will break the direct-mapping
     concept. And on an MPU based system, like an Armv8-R system, this
     operation will be very unfriendly. Xen needs to pop the
     `shared_info` page from the Xen heap and insert it into the VM's
     P2M pages. If this page is in the middle of the Xen heap, this
     means Xen needs to split the current heap and use extra MPU
     regions. Also, for the P2M part, this page is unlikely to form a
     new contiguous memory region with the existing P2M pages, and Xen
     is likely to need another additional MPU region to set it up,
     which is obviously a waste of limited MPU regions. And this kind
     of dynamic behaviour is quite hard to imagine on an MPU system.

Yeah, it doesn't make any sense for MPU systems


     For `vcpu_info`, in the current implementation, Xen will store the
     `vcpu_info` meta-data for all vCPUs in `shared_info`. When the
     guest is trying to set up `vcpu_info`, it will allocate memory for
     `vcpu_info` on the guest side. Then the guest will use a hypercall
     to copy the meta-data from `shared_info` to the guest page. After
     that, both Xen's `vcpu_info` and the guest's `vcpu_info` point to
     the same page, allocated by the guest.

     This implementation has several benefits:
     1. There is no wasted memory. No extra memory will be allocated
        from the Xen heap.
     2. There is no P2M remap. This will not break the direct mapping,
        and is MPU-system friendly.
     So, on an Armv8-R system, we can still keep the current
     implementation for per-cpu `vcpu_info`.

     So, our proposal is: can we reuse the current implementation idea
     of `vcpu_info` for `shared_info`? We still allocate one page for
     `d->shared_info` at domain construction for holding some initial
     meta-data, using alloc_domheap_pages instead of
     alloc_xenheap_pages and share_xen_page_with_guest. And when the
     guest allocates a page for `shared_info` and uses a hypercall to
     set it up, we copy the initial data from `d->shared_info` to it.
     After the copy we can update `d->shared_info` to point to the
     guest-allocated `shared_info` page. In this case, we don't have to
     think about the fragmentation of the Xen heap and P2M and the
     extra MPU regions.
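The proposed flow can be sketched as a minimal C model. This is not real Xen code: `struct domain`, `map_shared_info`, and the flat-page representation are simplified stand-ins for the XENMAPSPACE_shared_info handling described above.

```c
/* Hypothetical sketch of the proposed shared_info setup on a
 * direct-mapped (MPU) system: instead of inserting a Xen-owned page
 * into the guest P2M, copy the initial meta-data into the
 * guest-allocated page and repoint d->shared_info at it. */
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

struct domain {
    uint8_t *shared_info;   /* initially a page allocated by Xen */
};

/* The guest passes the (direct-mapped) address of a page it owns. */
static void map_shared_info(struct domain *d, uint8_t *guest_page)
{
    /* Copy the initial meta-data Xen accumulated so far... */
    memcpy(guest_page, d->shared_info, PAGE_SIZE);
    /* ...then make the guest page the canonical shared_info. No P2M
     * remap is needed because the guest already owns and direct-maps
     * guest_page; the old Xen-side page could now be freed. */
    d->shared_info = guest_page;
}
```

The key point of the sketch is the last line: after the copy, Xen and the guest share the guest's own page, so the direct mapping is preserved and no extra MPU region is consumed.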

Yes, I think that would work.

Also I think it should be possible to get rid of the initial
d->shared_info allocation in Xen, given that d->shared_info is for the
benefit of the guest and the guest cannot access it until it makes the
XENMAPSPACE_shared_info hypercall.


While we were working on the event channel PoC for Xen on Armv8-R, we
found another issue after we dropped the d->shared_info allocation in
Xen. Both shared_info and vcpu_info are allocated by the guest at
runtime. That means the addresses of shared_info and vcpu_info are
random. For an MMU system, this is OK, because Xen has a full view of
system memory at runtime. But for an MPU system, the situation becomes
a little tricky. We have to set up extra MPU regions for remote
domains' shared_info and vcpu_info while handling event channel
hypercalls. That's because in the current Xen hypercall design, a
hypercall does not cause a vCPU context switch. When a hypercall traps
to EL2, it keeps the vCPU's P2M view. For an MMU system, we have
vttbr_el2 for the vCPU P2M view and ttbr_el2 for the Xen view, so in
EL2 Xen has full permissions to access any memory it wants. But for an
MPU system, we only have one EL2 MPU. Before entering the guest, Xen
will set up the vCPU's P2M view in the EL2 MPU. In this case, when the
system enters EL2 through a hypercall, the EL2 MPU still holds the
current vCPU's P2M view plus access permissions for Xen's essential
memory (code, data, heap). But the current EL2 MPU configuration does
not grant EL2 access to other domains' memory. For an event channel
hypercall, if we want to update the pending bitmap in a remote
domain's vcpu_info, this will cause a data abort in EL2. To solve this
data abort, we may have two methods:
1. Map the remote domain's whole memory, or just the pages for
    shared_info + vcpu_info, into the EL2 MPU temporarily for the
    hypercall to update pending bits or do other accesses.

    This method doesn't need a context switch for the EL2 MPU,
    but it has some disadvantages:
    1. We have to reserve MPU regions for hypercalls.
    2. Different hypercalls may need different reservations of
       MPU regions.
    3. We have to handle hypercalls one by one, both the existing
       ones and new ones in the future.

2. Switch to Xen's memory view in the EL2 MPU when trapping from EL1
    to EL2. In this case, Xen will have full memory access permissions
    to update pending bits in EL2. This only changes the EL2 MPU
    context; it does not need a vCPU context switch, because the
    trapped vCPU is used for the full flow of the hypercall. After the
    hypercall, before returning to EL1, the EL2 MPU will switch back
    to the scheduled vCPU's P2M view.
    This method needs an EL2 MPU context switch, but:
    1. We don't need to reserve MPU regions for Xen's memory view
       (Xen's memory view has been set up during initialization).
    2. We don't need to handle page mappings at the hypercall level.
    3. It applies to other EL1-to-EL2 traps too, like data aborts,
       IRQs, etc.
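Approach 2 can be illustrated with a small conceptual C model, where the single EL2 MPU is just a data structure and `mpu_load`/`handle_trap` are illustrative names, not real Xen or Armv8-R identifiers (real hardware would reprogram the MPU via PRBAR/PRLAR writes):

```c
/* Conceptual model of approach 2: on an EL1->EL2 trap, load Xen's
 * full memory view into the one EL2 MPU, run the hypercall, then
 * restore the scheduled vCPU's P2M view before returning to EL1. */
#include <stdint.h>

#define MAX_MPU_REGIONS 16

struct mpu_region { uint64_t base, limit; };

struct mpu_state {
    struct mpu_region regions[MAX_MPU_REGIONS];
    unsigned int nr_regions;
};

static struct mpu_state el2_mpu;    /* the single EL2 MPU */
static struct mpu_state xen_view;   /* set up once at init */

static void mpu_load(const struct mpu_state *view)
{
    /* Real hardware: a loop of PRBAR/PRLAR register writes. */
    el2_mpu = *view;
}

/* Trap path: enter with the vCPU's P2M view loaded, switch to Xen's
 * view so the handler can touch any domain's memory, switch back. */
static void handle_trap(const struct mpu_state *vcpu_p2m_view,
                        void (*hypercall)(void))
{
    mpu_load(&xen_view);        /* full access for Xen */
    hypercall();                /* e.g. update remote vcpu_info bits */
    mpu_load(vcpu_p2m_view);    /* restore guest view before eret */
}
```

The model shows why no MPU regions need to be reserved per hypercall: the whole Xen view is swapped in wholesale, at the cost of reprogramming the MPU on every trap.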


Both approach 1) and 2) are acceptable and in fact I think we'll
probably have to do a combination of both.

We don't need to do a full MPU context switch every time we enter Xen.
We can be flexible. Only when Xen needs to access another guest memory,
if the memory is not mappable using approach 1), Xen could do a full MPU
context switch. Basically, try 1) first, if it is not possible, do 2).

This also solves the problem of "other hypercalls". We can always do 2)
if we cannot do 1).

So do we need to do 1) at all? It really depends on performance data.
Not all hypercalls are made equal. Some are very rare and it is fine if
they are slow. Some hypercalls are actually on the hot path. The event
channels hypercalls are on the hot path so they need to be fast. It
makes sense to implement 1) just for event channels hypercalls if the
MPU context switch is slow.

Data would help a lot here to make a good decision. Specifically, how
much more expensive is an EL2 MPU context switch compared to add/remove
of an MPU region in nanosec or cpu cycles?


We will do it when we get a proper platform.


The other aspect is how many extra MPU regions we need for each guest
to implement 1). Do we need one extra MPU region for each domU? If so,
I don't think approach 1) is feasible unless we come up with a smart
memory allocation scheme for shared_info and vcpu_info. For instance,
if the shared_info and vcpu_info of all guests were part of the Xen
data or heap region, or one other special MPU region, then they could
become immediately accessible without the need for extra mappings when
switching to EL2.


Allocating shared_info and vcpu_info from Xen data or heap would cause
memory fragmentation. We would have to split the Xen data or heap,
populate the pages for shared_info and vcpu_info, and insert them into
the guest P2M. Because the Armv8-R MPU doesn't allow memory overlap,
this would cost at least 2 extra MPU regions: one page cannot exist in
a Xen MPU region and a guest P2M MPU region at the same time. And we
definitely don't want to make the entire Xen data and heap accessible
to EL1. This approach also does not solve the 100% direct mapping
problem. A special MPU region might have the same issues, unless we
make this special MPU region accessible from both EL1 and EL2 at
runtime (which is unsafe), and update the hypercall to use pages from
this special region for shared_info and vcpu_info (every guest can see
this region, so it's still 1:1 mapped).

For 1), the concern comes from our current rough PoC, where we used
extra MPU regions to map the whole memory of the remote domain, which
may consist of several memory blocks in the worst case. We have
thought about it further: we can reduce the mapping granularity to a
page. For example, when Xen wants to update shared_info or vcpu_info,
Xen must know its address, so we can just map that one page
temporarily. So I think reserving only one MPU region for runtime
mapping is feasible on most platforms.
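The single-reserved-region scheme could look roughly like this. All identifiers here are illustrative (not real Xen or Armv8-R names), and the MPU programming is modelled with a plain struct instead of PRBAR/PRLAR writes:

```c
/* Sketch of the refined approach 1: keep one EL2 MPU region reserved
 * at boot, and temporarily point it at the single page holding the
 * remote domain's vcpu_info while the event channel hypercall updates
 * the pending bits. */
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define PAGE_MASK (~(uint64_t)(PAGE_SIZE - 1))

/* The one MPU region reserved at boot for runtime mappings. */
static struct { uint64_t base, limit; bool valid; } runtime_region;

static void map_remote_page(uint64_t addr)
{
    runtime_region.base  = addr & PAGE_MASK;
    runtime_region.limit = (addr & PAGE_MASK) + PAGE_SIZE - 1;
    runtime_region.valid = true;    /* would program PRBAR/PRLAR here */
}

static void unmap_remote_page(void)
{
    runtime_region.valid = false;
}

/* Event channel send path: map only the page containing the remote
 * vcpu_info, set the pending bit, then unmap again. */
static void evtchn_set_pending(uint8_t *remote_vcpu_info,
                               unsigned int bit)
{
    map_remote_page((uint64_t)(uintptr_t)remote_vcpu_info);
    remote_vcpu_info[bit / 8] |= 1u << (bit % 8);
    unmap_remote_page();
}
```

Since Xen already knows the exact address it needs to touch, one page-granularity region suffices for this path; a hypercall touching variables on multiple pages would need either repeated remaps or a fallback to the full MPU context switch of approach 2.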

Actually I think that it would be great if we can do that. It looks like
the best way forward.


But the additional problem with this is that if the hypercall is
modifying multiple variables, Xen may need to do multiple mappings if
they are not on the same page (or within a proper MPU region range).

There are not that many hypercalls that require Xen to map multiple
pages, and those might be OK if they are slow.

Ok, I will update it in Draft-C.



 

