
Re: Limitations for Running Xen on KVM Arm64




> On 30. Oct 2025, at 14:41, haseeb.ashraf@xxxxxxxxxxx wrote:
> 
> Adding @julien@xxxxxxx and replying to the questions he asked over 
> #XenDevel:matrix.org.
> 
> can you add some details on why the implementation cannot be optimized in 
> KVM? Asking because I have never seen such an issue when running Xen on QEMU 
> (without nested virt enabled).
> AFAIK, when Xen is run on QEMU without virtualization, instructions are 
> emulated in QEMU, while with KVM instructions should ideally run directly 
> on hardware except in some special cases (those trapped by FGT/CGT), such as 
> this one, where KVM maintains shadow page tables for each VM. It traps these 
> instructions and emulates them with a callback such as handle_vmalls12e1is(). 
> The way this callback is implemented, it has to iterate over the whole 
> address space and clean up the page tables, which is a costly operation. 
> Regardless of this, it should still be optimized in Xen, as invalidating a 
> selective range would be much better than invalidating the whole 48-bit 
> address space.
> Some details about your platform and use case would be helpful. I am 
> interested to know whether you are using all the features for nested virt.
> I am using AWS G4. My use case is to run Xen as a guest hypervisor. Yes, most 
> of the features are enabled except VHE and those which are disabled by KVM.


Hello,

You mean Graviton4 (for others' reference, on a bare-metal instance)? 
Interesting to see people caring about nested virt there :) - and hopefully 
using it wasn’t too much of a pain for you to deal with.

> 
> ; switch to current VMID
> tlbi rvae1, guest_vaddr ; first invalidate stage-1 TLB by guest VA for 
> current VMID
> tlbi ripas2e1, guest_paddr ; then invalidate stage-2 TLB by IPA range for 
> current VMID
> dsb ish
> isb
> ; switch back the VMID
>     • This is where I am not quite sure, and I was hoping that someone with 
> Arm expertise could sign off on this so that I can work on its implementation 
> in Xen. This would be an optimization not only for virtualized hardware but 
> also in general for Xen on arm64 machines.
> 

Note that for TLBI RIPAS2E1 / TLBIP RIPAS2E1 the documentation says:

> The invalidation is not required to apply to caching structures that combine 
> stage 1 and stage 2 translation table entries.

so the stage-2-by-IPA invalidate on its own is not guaranteed to remove 
combined stage-1+stage-2 TLB entries for that IPA.
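
If you want the invalidation to also cover those combined entries, the usual 
sequence (a sketch only, with the operand encoding details and your VMID 
switch omitted; it is roughly what KVM itself does for a stage-2 unmap) is to 
follow the stage-2-by-IPA op with a stage-1 invalidate for the same VMID:

; with VTTBR_EL2 already holding the VMID of the guest in question
tlbi ipas2e1is, x0   ; x0[35:0] = IPA[47:12]; drop stage-2 entries for this IPA
dsb ish              ; complete the stage-2 invalidate first
tlbi vmalle1is       ; then drop stage-1 and combined entries for this VMID
dsb ish
isb

A ranged form (TLBI RIPAS2E1IS, where FEAT_TLBIRANGE is implemented) can 
replace the per-IPA op, but the trailing vmalle1is is still what takes care 
of the combined entries - and either way this remains far cheaper than 
invalidating the whole 48-bit space.
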
>     • The second place in Xen where this is problematic is when multiple 
> vCPUs of the same domain are juggled on a single pCPU: TLBs are invalidated 
> every time a different vCPU runs on a pCPU. I do not know how this can be 
> optimized. Any support on this is appreciated.


One way to handle this is to make every TLB invalidate within the VM a 
broadcast invalidate (HCR_EL2.FB is what you’re looking for) and then forgo 
that per-switch TLB maintenance, as it is no longer necessary. This should not 
have a practical performance impact.
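
For reference, a minimal sketch of the hypervisor side (illustrative only, not 
a patch against the tree; HCR_EL2.FB is bit 9, and the READ_SYSREG / 
WRITE_SYSREG style plus the helper name are assumptions on my part):

/*
 * Force broadcast: TLB maintenance executed by the guest at EL1 (TLBI
 * VMALLE1, TLBI VAE1, ..., IC IALLU) is upgraded to Inner Shareable
 * broadcasts, so a pCPU cannot keep entries the guest has already
 * invalidated while running on another pCPU.
 */
#define HCR_FB          (1UL << 9)

static void enable_forced_broadcast(void)
{
    register_t hcr = READ_SYSREG(HCR_EL2);

    WRITE_SYSREG(hcr | HCR_FB, HCR_EL2);
    isb();
}

In practice this would rather be folded into the per-vCPU HCR_EL2 value than 
written to the live register, but with that guarantee in place the per-switch 
flush in p2m_restore_state() (the one your diff below comments out) should 
stop being needed for correctness rather than being just an experiment.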

Thank you,
-Mohamed
> 
> 
> diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
> index 7642dbc7c5..e96ff92314 100644
> --- a/xen/arch/arm/mmu/p2m.c
> +++ b/xen/arch/arm/mmu/p2m.c
> @@ -247,7 +247,7 @@ void p2m_restore_state(struct vcpu *n)
>       * when running multiple vCPU of the same domain on a single pCPU.
>       */
>      if ( *last_vcpu_ran != INVALID_VCPU_ID && *last_vcpu_ran != n->vcpu_id )
> -        flush_guest_tlb_local();
> +        ; // flush_guest_tlb_local();
>       *last_vcpu_ran = n->vcpu_id;
>  } 
> 
> Thanks & Regards,
> Haseeb Ashraf





 

