
RE: Proposal for Porting Xen to Armv8-R64 - DraftB



On Wed, 20 Apr 2022, Wei Chen wrote:
> > On Tue, 19 Apr 2022, Wei Chen wrote:
> > > > > ### 3.2. Xen Event Channel Support
> > > > >     In the current RFC patches we haven't enabled event channel
> > > > >     support, but I think it's a good opportunity to have some
> > > > >     discussion in advance.
> > > > >     On Armv8-R, all VMs are naturally direct-mapped, because there is
> > > > >     no stage 2 MMU translation. The current event channel
> > > > >     implementation depends on some pages shared between Xen and the
> > > > >     guest: `shared_info` and the per-cpu `vcpu_info`.
> > > > >
> > > > >     For `shared_info`, in the current implementation Xen allocates a
> > > > >     page from the heap for `shared_info` to store the initial
> > > > >     metadata. When the guest sets up `shared_info`, it picks a free
> > > > >     gfn and uses a hypercall to establish the P2M mapping between
> > > > >     that gfn and `shared_info`.
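
For reference, the MMU-path setup described above boils down to a single
XENMEM_add_to_physmap call from the guest. A minimal sketch, assuming the
standard Xen public headers and a HYPERVISOR_memory_op wrapper (header paths
and wrapper names vary between guest OSes):

    /*
     * Guest-side sketch: pick a free gfn and ask Xen to insert the
     * shared_info frame there via XENMEM_add_to_physmap.
     */
    #include <xen/xen.h>
    #include <xen/memory.h>

    static int map_shared_info(unsigned long free_gfn)
    {
        struct xen_add_to_physmap xatp = {
            .domid = DOMID_SELF,
            .space = XENMAPSPACE_shared_info,
            .idx   = 0,            /* shared_info is a single frame */
            .gpfn  = free_gfn,     /* guest-chosen frame to back it */
        };

        /* On success, Xen maps the shared_info frame at free_gfn in the P2M. */
        return HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
    }
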
> > > > >
> > > > >     For a direct-mapped VM, this breaks the direct-mapping concept.
> > > > >     And on an MPU-based system such as Armv8-R, this operation is
> > > > >     very unfriendly: Xen needs to pop the `shared_info` page from the
> > > > >     Xen heap and insert it into the VM's P2M pages. If this page is
> > > > >     in the middle of the Xen heap, Xen has to split the current heap
> > > > >     and use extra MPU regions. Also, on the P2M side, this page is
> > > > >     unlikely to form a new contiguous memory region with the existing
> > > > >     P2M pages, so Xen will likely need yet another MPU region to map
> > > > >     it, which is an obvious waste of the limited MPU regions. This
> > > > >     kind of dynamic behaviour is quite hard to imagine on an MPU
> > > > >     system.
> > > >
> > > > Yeah, it doesn't make any sense for MPU systems
> > > >
> > > >
> > > > >     For `vcpu_info`, in the current implementation Xen stores the
> > > > >     `vcpu_info` metadata for all vCPUs in `shared_info`. When the
> > > > >     guest sets up `vcpu_info`, it allocates memory for `vcpu_info` on
> > > > >     the guest side, then uses a hypercall to copy the metadata from
> > > > >     `shared_info` into that guest page. After that, both Xen's
> > > > >     `vcpu_info` pointer and the guest's `vcpu_info` refer to the same
> > > > >     guest-allocated page.
> > > > >
> > > > >     This implementation has several benefits:
> > > > >     1. No memory is wasted: no extra memory is allocated from the Xen
> > > > >        heap.
> > > > >     2. There is no P2M remapping, so it does not break direct-mapping
> > > > >        and is MPU-system friendly.
> > > > >     So, on an Armv8-R system, we can keep the current implementation
> > > > >     for the per-cpu `vcpu_info`.
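
For reference, a minimal sketch of the per-vCPU vcpu_info registration
described above, assuming the standard public header xen/vcpu.h and a
HYPERVISOR_vcpu_op wrapper (exact names again depend on the guest OS):

    /*
     * Guest-side sketch: the guest allocates the vcpu_info memory itself and
     * hands the frame number to Xen, so nothing comes from the Xen heap and
     * no P2M remapping is needed.
     */
    #include <xen/xen.h>
    #include <xen/vcpu.h>

    static int register_vcpu_info(unsigned int vcpu, unsigned long gfn,
                                  unsigned int offset)
    {
        struct vcpu_register_vcpu_info info = {
            .mfn    = gfn,      /* guest frame backing this vCPU's vcpu_info */
            .offset = offset,   /* offset of the vcpu_info struct in that page */
        };

        /* Xen copies the metadata out of shared_info and repoints its pointer. */
        return HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, vcpu, &info);
    }
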
> > > > >
> > > > >     So, our proposal is: can we reuse the current `vcpu_info`
> > > > >     implementation idea for `shared_info`? We would still allocate
> > > > >     one page for `d->shared_info` at domain construction to hold the
> > > > >     initial metadata, but using alloc_domheap_pages instead of
> > > > >     alloc_xenheap_pages and share_xen_page_with_guest. When the guest
> > > > >     allocates a page for `shared_info` and uses the hypercall to set
> > > > >     it up, we copy the initial data from `d->shared_info` into it,
> > > > >     and after the copy we update `d->shared_info` to point to the
> > > > >     guest-allocated `shared_info` page. In this case, we don't have
> > > > >     to worry about fragmentation of the Xen heap and P2M, nor about
> > > > >     the extra MPU regions.
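
A rough sketch of how the proposed hypervisor-side flow could look. This is
not existing Xen code: mpu_map_guest_page() and free_initial_shared_info()
are hypothetical placeholders for whatever the MPU port ends up using, and
error handling is largely omitted:

    /*
     * Hypothetical hypervisor-side sketch of the proposal above (not existing
     * Xen code).  d->shared_info starts as a domheap page holding the initial
     * metadata; when the guest registers its own page, Xen copies the data
     * over and repoints d->shared_info instead of inserting a Xen page into
     * the guest P2M.
     */
    static int handle_map_shared_info(struct domain *d, gfn_t gfn)
    {
        /* On a direct-mapped MPU guest, gfn == mfn, so this is the real frame. */
        void *guest_page = mpu_map_guest_page(d, gfn);   /* hypothetical helper */
        void *old = d->shared_info;

        if ( !guest_page )
            return -EINVAL;

        /* Copy the initial metadata from the domheap page into the guest page. */
        memcpy(guest_page, old, PAGE_SIZE);

        /* From now on Xen updates event-channel state directly in the guest page. */
        d->shared_info = guest_page;

        /* The initial page from alloc_domheap_pages() is no longer needed. */
        free_initial_shared_info(d, old);                /* hypothetical helper */

        return 0;
    }
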
> > > >
> > > > Yes, I think that would work.
> > > >
> > > > Also I think it should be possible to get rid of the initial
> > > > d->shared_info allocation in Xen, given that d->shared_info is for the
> > > > benefit of the guest and the guest cannot access it until it makes the
> > > > XENMAPSPACE_shared_info hypercall.
> > > >
> > >
> > > While working on the event channel PoC for Xen on Armv8-R, we found
> > > another issue after we dropped the d->shared_info allocation in Xen.
> > > Both shared_info and vcpu_info are now allocated by the guest at
> > > runtime, which means their addresses are essentially arbitrary. For an
> > > MMU system this is fine, because Xen has a full view of system memory
> > > at runtime. But for an MPU system the situation becomes a little
> > > tricky: we have to set up extra MPU regions for remote domains'
> > > shared_info and vcpu_info while handling an event channel hypercall.
> > > That's because, in the current Xen hypercall model, a hypercall does
> > > not cause a vCPU context switch. When a hypercall traps to EL2, the
> > > vCPU's P2M view is kept. On an MMU system we have vttbr_el2 for the
> > > vCPU P2M view and ttbr_el2 for the Xen view, so in EL2 Xen has full
> > > permission to access any memory it wants. But on an MPU system we only
> > > have one EL2 MPU. Before entering the guest, Xen sets up the vCPU P2M
> > > view in the EL2 MPU. So when the system enters EL2 through a hypercall,
> > > the EL2 MPU still holds the current vCPU's P2M view plus Xen's
> > > essential memory (code, data, heap) access permissions, but it does not
> > > have the permissions for EL2 to access another domain's memory. For an
> > > event channel hypercall, updating the pending bitmap in a remote
> > > domain's vcpu_info will therefore cause a data abort in EL2. To solve
> > > this data abort, we see two methods:
> > > 1. Temporarily map the remote domain's whole memory, or just the pages
> > >    holding shared_info + vcpu_info, in the EL2 MPU so the hypercall can
> > >    update the pending bits or perform other accesses (both methods are
> > >    sketched in the code after this list).
> > >
> > >    This method avoids an EL2 MPU context switch, but it has some
> > >    disadvantages:
> > >    1. We have to reserve MPU regions for hypercalls.
> > >    2. Different hypercalls may need different MPU region reservations.
> > >    3. We have to handle hypercalls one by one, both the existing ones
> > >       and new ones in the future.
> > >
> > > 2. Switch to Xen's memory view in the EL2 MPU when trapping from EL1 to
> > >    EL2. In this case, Xen has full memory access permissions to update
> > >    the pending bits in EL2. This only changes the EL2 MPU context and
> > >    does not require a vCPU context switch, because the trapped vCPU is
> > >    used for the full flow of the hypercall. After the hypercall, before
> > >    returning to EL1, the EL2 MPU is switched back to the scheduled
> > >    vCPU's P2M view.
> > >    This method needs an EL2 MPU context switch, but:
> > >    1. We don't need to reserve MPU regions for Xen's memory view
> > >       (Xen's memory view is set up at initialization).
> > >    2. We don't need to handle page mappings at the hypercall level.
> > >    3. It also applies to other EL1-to-EL2 traps, like data aborts,
> > >       IRQs, etc.
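
To make the two methods more concrete, here is a hypothetical sketch of the
primitives involved and of method 2 applied on the hypercall trap path. None
of these helpers exist in Xen today; the names are purely illustrative:

    /*
     * Hypothetical sketch only; none of these helpers exist in Xen today.
     * Method 1: temporarily map just the needed remote page in a spare EL2
     * MPU region.  Method 2: reprogram the EL2 MPU with Xen's full view on
     * trap entry and restore the vCPU's P2M view before returning to EL1.
     */
    void *mpu_map_remote_page(struct domain *rd, paddr_t pa);   /* method 1 */
    void mpu_unmap_remote_page(void *p);

    void mpu_switch_to_xen_view(void);                          /* method 2 */
    void mpu_restore_vcpu_view(struct vcpu *v);

    void handle_hypercall(struct cpu_user_regs *regs);  /* stands for the
                                                            existing dispatch */

    /* Method 2 applied on the hypercall trap path. */
    void do_trap_hypercall_mpu(struct cpu_user_regs *regs)
    {
        mpu_switch_to_xen_view();       /* Xen can now reach any domain's memory */

        handle_hypercall(regs);         /* normal hypercall handling, unchanged  */

        mpu_restore_vcpu_view(current); /* back to the vCPU's P2M view for EL1   */
    }
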
> > 
> > 
> > Both approaches 1) and 2) are acceptable and in fact I think we'll
> > probably have to do a combination of both.
> > 
> > We don't need to do a full MPU context switch every time we enter Xen.
> > We can be flexible. Only when Xen needs to access another guest memory,
> > if the memory is not mappable using approach 1), Xen could do a full MPU
> > context switch. Basically, try 1) first, if it is not possible, do 2).
> > 
> > This also solves the problem of "other hypercalls". We can always do 2)
> > if we cannot do 1).
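
A sketch of that combined policy, reusing the hypothetical helpers from the
previous sketch: try the cheap per-page mapping first, and fall back to a
full EL2 MPU view switch only when the mapping is not possible:

    /*
     * Sketch of the combined policy: approach 1) first, approach 2) as the
     * fallback.  All helpers are the hypothetical ones declared above.
     */
    static void *mpu_access_remote(struct domain *rd, paddr_t pa, bool *switched)
    {
        void *p = mpu_map_remote_page(rd, pa);   /* approach 1) */

        if ( p )
        {
            *switched = false;
            return p;
        }

        /* No spare MPU region, or the range is not mappable: approach 2). */
        mpu_switch_to_xen_view();
        *switched = true;

        return (void *)pa;   /* MPU systems have no translation, pa is usable */
    }
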
> > 
> > So do we need to do 1) at all? It really depends on performance data.
> > Not all hypercalls are made equal. Some are very rare and it is fine if
> > they are slow. Some hypercalls are actually on the hot path. The event
> > channels hypercalls are on the hot path so they need to be fast. It
> > makes sense to implement 1) just for event channels hypercalls if the
> > MPU context switch is slow.
> > 
> > Data would help a lot here to make a good decision. Specifically, how
> > much more expensive is an EL2 MPU context switch compared to add/remove
> > of an MPU region in nanosec or cpu cycles?
> > 
> 
> We will do it when we get a proper platform.
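
When a platform is available, one way to gather such numbers inside Xen is
to bracket the operation with reads of the generic timer counter. A minimal
sketch, assuming Xen's existing READ_SYSREG64/isb/ticks_to_ns helpers and
timing the hypothetical MPU helpers from the sketches above:

    /*
     * Measurement sketch: bracket the operation with generic timer reads and
     * convert the delta to nanoseconds.
     */
    static uint64_t time_op(void (*op)(void))
    {
        uint64_t t0, t1;

        isb();
        t0 = READ_SYSREG64(CNTPCT_EL0);
        op();                              /* e.g. mpu_switch_to_xen_view()   */
        isb();
        t1 = READ_SYSREG64(CNTPCT_EL0);

        return ticks_to_ns(t1 - t0);       /* counter ticks -> nanoseconds    */
    }
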
> 
> > 
> > The other aspect is how many extra MPU regions do we need for each guest
> > to implement 1). Do we need one extra MPU region for each domU? If so, I
> > don't think approach 1) is feasible unless we come up with a smart
> > memory allocation scheme for shared_info and vcpu_info. For instance, if
> > shared_info and vcpu_info of all guests were part of the Xen data or
> > heap region, or 1 other special MPU region, then they could become
> > immediately accessible without need for extra mappings when switching to
> > EL2.
> > 
> 
> Allocating shared_info and vcpu_info from the Xen data or heap region would
> cause memory fragmentation. We would have to split the Xen data or heap,
> carve out the pages for shared_info and vcpu_info, and insert them into the
> guest P2M. Because the Armv8-R MPU doesn't allow regions to overlap, this
> costs at least 2 extra MPU regions: a page cannot be in a Xen MPU region
> and in a guest P2M MPU region at the same time. And we definitely don't
> want to make the entire Xen data and heap accessible to EL1. This approach
> also does not solve the 100% direct-mapping problem. A special MPU region
> would have the same issues, unless we make that special region accessible
> from both EL1 and EL2 at runtime (which is unsafe) and update the
> hypercalls to use pages from this special region for shared_info and
> vcpu_info (every guest can see this region, so it is still a 1:1 mapping).
> 
> For 1), the concern came from our current rough PoC, where we used extra
> MPU regions to map the whole memory of the remote domain, which in the
> worst case may consist of several memory blocks. Having thought about it
> further, we can reduce the mapping granularity to a page. For example, when
> Xen wants to update shared_info or vcpu_info, it must already know the
> address, so we can just map that one page temporarily. So I think reserving
> only 1 MPU region for runtime mappings is feasible on most platforms.
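
A sketch of what retargeting that single reserved EL2 MPU region at one page
could look like. The register names follow the Armv8-R64 architecture
(PRSELR_EL2/PRBAR_EL2/PRLAR_EL2), but the region index and the
attribute/limit encoding here are simplified assumptions:

    /*
     * Illustrative only: retarget one reserved EL2 MPU region (index
     * XEN_MPU_RUNTIME_REGION is an assumption) at the single page holding the
     * remote domain's shared_info/vcpu_info.
     */
    #define XEN_MPU_RUNTIME_REGION  15UL   /* assumed reserved region index */

    static void mpu_retarget_runtime_region(paddr_t page_base)
    {
        /* Select the reserved region... */
        WRITE_SYSREG(XEN_MPU_RUNTIME_REGION, PRSELR_EL2);
        isb();

        /* ...then program base and inclusive limit to cover exactly one page
         * (PRLAR bit 0 is the region enable bit; attribute bits omitted). */
        WRITE_SYSREG(page_base, PRBAR_EL2);
        WRITE_SYSREG(((page_base + PAGE_SIZE - 1) & ~0x3fUL) | 1, PRLAR_EL2);
        isb();
    }
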

Actually I think that it would be great if we can do that. It looks like
the best way forward.


> But the additional problem with this is that if a hypercall modifies
> multiple variables, Xen may need to do multiple mappings if they are not
> in the same page (or within a suitable MPU region range).

There are not that many hypercalls that require Xen to map multiple
pages, and those might be OK if they are slow.



 

