Xen project Mailing List

RE: Proposal for Porting Xen to Armv8-R64 - DraftA

To: Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>

Date: Thu, 3 Mar 2022 01:35:50 +0000

Accept-language: en-US

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=RcsEOKu715kcgR7lt6oNR1CEjCILmSXWRmq1U6wvokQ=; b=bFc0FcRj885C8ktvpZFb2KcVS57dAq3qFhuk2cCWcQpsyQriNyTEltnle0m23YThuj50PzTBrXWrbMYVCELep8ty4t5+HBy6FZsV3UinDnghIhLna0USPcaUNnbgubKO6NFIsHZqO9F/TrVD/BAeEcFkC41psltMHLZhZbs0q6QWcV6HIklqG00NM7iO3H8mcZGcGdrvBMrGxKUFgNy9ZTIg3hGrdtuv0b/k4PDUZ8W8d3lTKsDU9jYZ/R9m1W9JDH0iD2Zk7EXngflT2UJ8IrZrPuANJGxCjzSsF36ztsCPx/3j8WxC71StTZ9xUcBVk9SZRIaCG8rNPRYfy0FH5A==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=SfjgsUR573a7ReE5gDJI5M84e4hQsotJL407fSh/54LmM1GCtW0aIxQ9dsMkwRfepDCs5fzt0DxUPwLNQjRJ6egIMtPlHVF4SjW9bM8AMLTVRgJM1Vr58LmLpp12/+0kEYI0xd4zIefe431xUcfvE6/V0dkw6te6Y9ySdWSxmesEb76j5xBZF2ScQ1223LNHeuAgV0pqGhfn8SQnuLNnUnqJRZZaTnq4Q/9JZQaVDh3y2/rMP987Uhks7rp3cXxx4yfC0Oz9qEEeR6LmTPUKgPfnIlquUPFh+cjCdpvNGDfbkHBPBsckLk81QoQC1oIpup7a2TGofwGoTjasTgK8yQ==

Authentication-results-original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Penny Zheng <Penny.Zheng@xxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>, nd <nd@xxxxxxx>

Delivery-date: Thu, 03 Mar 2022 01:36:23 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Nodisclaimer: true

Original-authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;

Thread-index: AdgpQxtXwh7LkfydTgiYk9bhMgU+ogAn0mUAABEK2UAAF1ylAACsO9YwAA5uZgAAI3g7sAAIzZgAAB9QTKA=

Thread-topic: Proposal for Porting Xen to Armv8-R64 - DraftA

Hi Julien, > -----Original Message----- > From: Julien Grall <julien@xxxxxxx> > Sent: 2022年3月2日 18:25 > To: Wei Chen <Wei.Chen@xxxxxxx>; Stefano Stabellini > <sstabellini@xxxxxxxxxx> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis > <Bertrand.Marquis@xxxxxxx>; Penny Zheng <Penny.Zheng@xxxxxxx>; Henry Wang > <Henry.Wang@xxxxxxx>; nd <nd@xxxxxxx> > Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA > > Hi Wei, > > On 02/03/2022 06:43, Wei Chen wrote: > >> -----Original Message----- > >> From: Julien Grall <julien@xxxxxxx> > >> Sent: 2022年3月1日 21:17 > >> To: Wei Chen <Wei.Chen@xxxxxxx>; Stefano Stabellini > >> <sstabellini@xxxxxxxxxx> > >> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis > >> <Bertrand.Marquis@xxxxxxx>; Penny Zheng <Penny.Zheng@xxxxxxx>; Henry > Wang > >> <Henry.Wang@xxxxxxx>; nd <nd@xxxxxxx> > >> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA > >> > >> On 01/03/2022 06:29, Wei Chen wrote: > >>> Hi Julien, > >> > >> Hi, > >> > >>>> -----Original Message----- > >>>> From: Julien Grall <julien@xxxxxxx> > >>>> Sent: 2022年2月26日 4:12 > >>>> To: Wei Chen <Wei.Chen@xxxxxxx>; Stefano Stabellini > >>>> <sstabellini@xxxxxxxxxx> > >>>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis > >>>> <Bertrand.Marquis@xxxxxxx>; Penny Zheng <Penny.Zheng@xxxxxxx>; Henry > >> Wang > >>>> <Henry.Wang@xxxxxxx>; nd <nd@xxxxxxx> > >>>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA > >>>> > >>>> Hi Wei, > >>>> > >>>> On 25/02/2022 10:48, Wei Chen wrote: > >>>>>>> Armv8-R64 can support max to 256 MPU regions. But that's > just > >>>>>> theoretical. > >>>>>>> So we don't want to define `pr_t mpu_regions[256]`, this is > a > >>>> memory > >>>>>> waste > >>>>>>> in most of time. So we decided to let the user specify > through > >> a > >>>>>> Kconfig > >>>>>>> option. `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` default > value > >> can > >>>> be > >>>>>> `32`, > >>>>>>> it's a typical implementation on Armv8-R64. Users will > >> recompile > >>>> Xen > >>>>>> when > >>>>>>> their platform changes. So when the MPU changes, > respecifying > >> the > >>>>>> MPU > >>>>>>> protection regions number will not cause additional > problems. > >>>>>> > >>>>>> I wonder if we could probe the number of MPU regions at runtime and > >>>>>> dynamically allocate the memory needed to store them in arch_vcpu. > >>>>>> > >>>>> > >>>>> We have considered to used a pr_t mpu_regions[0] in arch_vcpu. But > it > >>>> seems > >>>>> we will encounter some static allocated arch_vcpu problems and > sizeof > >>>> issue. > >>>> > >>>> Does it need to be embedded in arch_vcpu? If not, then we could > >> allocate > >>>> memory outside and add a pointer in arch_vcpu. > >>>> > >>> > >>> We had thought to use a pointer in arch_vcpu instead of embedding > >> mpu_regions > >>> into arch_vcpu. But we noticed that arch_vcpu has a > __cacheline_aligned > >>> attribute, this may be because of arch_vcpu will be used very > frequently > >>> in some critical path. So if we use the pointer for mpu_regions, may > >> cause > >>> some cache miss in these critical path, for example, in context_swtich. > >> > >> From my understanding, the idea behind ``cacheline_aligned`` is to > >> avoid the struct vcpu to be shared with other datastructure. Otherwise > >> you may end up to have two pCPUs to frequently write the same cacheline > >> which is not ideal. > >> > >> arch_vcpu should embbed anything that will be accessed often (e.g. > >> entry/exit) to certain point. For instance, not everything related to > >> the vGIC are embbed in the vCPU/Domain structure. > >> > >> I am a bit split regarding the mpu_regions. If they are mainly used in > >> the context_switch() then I would argue this is a premature > optimization > >> because the scheduling decision is probably going to take a lot more > >> time than the context switch itself. > > > > mpu_regions in arch_vcpu are used to save guest's EL1 MPU context. So, > yes, > > they are mainly used in context_switch. In terms of the number of > registers, > > it will save/restore more work than the original V8A. And on V8R we also > need > > to keep most of the original V8A save/restore work. So it will take > longer > > than the original V8A context_switch. And I think this is due to > architecture's > > difference. So it's impossible for us not to save/restore EL1 MPU region > > registers in context_switch. And we have done some optimization for EL1 > MPU > > save/restore: > > 1. Assembly code for EL1 MPU context_switch > > This discussion reminds me when KVM decided to rewrite their context > switch from assembly to C. The outcome was the compiler is able to do a > better job than us when it comes to optimizing. > > With a C version, we could also share the save/restore code with 32-bit > and it is easier to read/maintain. > > So I would suggest to run some numbers to check if it really worth > implementing the MPU save/restore in assembly. > It's interesting to hear KVM guys have similar discussion. Yes, if the gains of assembly code is not very obvious, then reusing the code for 32-bit would be more important. As our current platform (FVP) could not do very precise performance measurement. I want to keep current assembly code there, when we have a platform that can do such measurement we can have a thread to discuss it. > > 2. Use real MPU regions number instead of > CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS > > in context_switch. CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS is defined > the Max > > supported EL1 MPU regions for this Xen image. All platforms that > implement > > EL1 MPU regions in this range can work well with this Xen Image. But > if the > > implemented EL1 MPU region number exceeds > CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS, > > this Xen image could not work well on this platform. > > This sounds similar to the GICv3. The number of LRs depends on the > hardware. See how we dealt with it in gicv3_save_lrs(). > This is a good suggestion, we will check the GIC code. > > > >> > >> Note that for the P2M we already have that indirection because it is > >> embbed in the struct domain. > > > > It's different with V8A P2M case. In V8A context_switch we just need to > > save/restore VTTBR, we don't need to do P2M table walk. But on V8R, we > > need to access valid mpu_regions for save/restore. > > The save/restore for the P2M is a bit more complicated than simply > save/restore the VTTBR. But yes, I agree the code for the MPU will > likely be more complicated. > > > > >> > >> This raises one question, why is the MPUs regions will be per-vCPU > >> rather per domain? > >> > > > > Because there is a EL1 MPU component for each pCPU. We can't assume > guest > > to use the same EL1 MPU configuration for all vCPU. > > Ah. Sorry, I thought you were referring to whatever Xen will use to > prevent the guest accessing outside of its designated region. > NP : ) Thanks, Wei Chen > Cheers, > > -- > Julien Grall

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.