
Re: Limitations for Running Xen on KVM Arm64




> On 30. Oct 2025, at 14:41, haseeb.ashraf@xxxxxxxxxxx wrote:
> 
> Adding @julien@xxxxxxx and replying to the questions he asked over 
> #XenDevel:matrix.org.
> 
> can you add some details on why the implementation cannot be optimized in 
> KVM? Asking because I have never seen such an issue when running Xen on QEMU 
> (without nested virt enabled).
> AFAIK, when Xen is run on QEMU without virtualization, instructions are 
> emulated in QEMU, while with KVM instructions should ideally run directly 
> on hardware except in some special cases (those trapped by FGT/CGT), such as 
> this one, where KVM maintains shadow page tables for each VM. It traps these 
> instructions and emulates them with a callback such as handle_vmalls12e1is(). 
> The way this callback is implemented, it has to iterate over the whole 
> address space and clean up the page tables, which is a costly operation. 
> Regardless of this, it should still be optimized in Xen, as invalidating a 
> selective range would be much better than invalidating the whole 48-bit 
> address space.
> Some details about your platform and use case would be helpful. I am 
> interested to know whether you are using all the features for nested virt.
> I am using AWS G4. My use case is to run Xen as a guest hypervisor. Yes, most 
> of the features are enabled except VHE and those which are disabled by KVM.


Hello,

You mean Graviton4 (for others' reference, on a bare-metal instance)? 
Interesting to see people caring about nested virt there :) - and hopefully 
using it wasn’t too much of a pain for you to deal with.

> 
> ; switch to current VMID
> tlbi rvae1, guest_vaddr ; first invalidate stage-1 TLB by guest VA for 
> current VMID
> tlbi ripas2e1, guest_paddr ; then invalidate stage-2 TLB by IPA range for 
> current VMID
> dsb ish
> isb
> ; switch back the VMID
>     • This is where I am not quite sure, and I was hoping that someone with 
> Arm expertise could sign off on this so that I can work on its implementation 
> in Xen. This would be an optimization not only for virtualized hardware but 
> also in general for Xen on arm64 machines.
> 

Note that for TLBI RIPAS2E1 / TLBIP RIPAS2E1 the documentation says:

> The invalidation is not required to apply to caching structures that combine 
> stage 1 and stage 2 translation table entries.

so the stage-2-by-IPA invalidate on its own is not guaranteed to remove 
combined stage-1+stage-2 TLB entries for that IPA.
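
If you want the invalidation to also cover those combined entries, the usual 
sequence (a sketch only, with the operand encoding details and your VMID 
switch omitted; it is roughly what KVM itself does for a stage-2 unmap) is to 
follow the stage-2-by-IPA op with a stage-1 invalidate for the same VMID:

; with VTTBR_EL2 already holding the VMID of the guest in question
tlbi ipas2e1is, x0   ; x0[35:0] = IPA[47:12]; drop stage-2 entries for this IPA
dsb ish              ; complete the stage-2 invalidate first
tlbi vmalle1is       ; then drop stage-1 and combined entries for this VMID
dsb ish
isb

A ranged form (TLBI RIPAS2E1IS, where FEAT_TLBIRANGE is implemented) can 
replace the per-IPA op, but the trailing vmalle1is is still what takes care 
of the combined entries - and either way this remains far cheaper than 
invalidating the whole 48-bit space.
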
>     • The second place in Xen where this is problematic is when multiple 
> vCPUs of the same domain are juggled on a single pCPU: TLBs are invalidated 
> every time a different vCPU runs on a pCPU. I do not know how this can be 
> optimized. Any support on this is appreciated.


One way to handle this is to make every TLB invalidate within the VM a 
broadcast invalidate (HCR_EL2.FB is what you’re looking for) and then forgo 
that per-switch TLB maintenance, as it is no longer necessary. This should not 
have a practical performance impact.
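
For reference, a minimal sketch of the hypervisor side (illustrative only, not 
a patch against the tree; HCR_EL2.FB is bit 9, and the READ_SYSREG / 
WRITE_SYSREG style plus the helper name are assumptions on my part):

/*
 * Force broadcast: TLB maintenance executed by the guest at EL1 (TLBI
 * VMALLE1, TLBI VAE1, ..., IC IALLU) is upgraded to Inner Shareable
 * broadcasts, so a pCPU cannot keep entries the guest has already
 * invalidated while running on another pCPU.
 */
#define HCR_FB          (1UL << 9)

static void enable_forced_broadcast(void)
{
    register_t hcr = READ_SYSREG(HCR_EL2);

    WRITE_SYSREG(hcr | HCR_FB, HCR_EL2);
    isb();
}

In practice this would rather be folded into the per-vCPU HCR_EL2 value than 
written to the live register, but with that guarantee in place the per-switch 
flush in p2m_restore_state() (the one your diff below comments out) should 
stop being needed for correctness rather than being just an experiment.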

Thank you,
-Mohamed
> 
> 
> diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
> index 7642dbc7c5..e96ff92314 100644
> --- a/xen/arch/arm/mmu/p2m.c
> +++ b/xen/arch/arm/mmu/p2m.c
> @@ -247,7 +247,7 @@ void p2m_restore_state(struct vcpu *n)
>       * when running multiple vCPU of the same domain on a single pCPU.
>       */
>      if ( *last_vcpu_ran != INVALID_VCPU_ID && *last_vcpu_ran != n->vcpu_id )
> -        flush_guest_tlb_local();
> +        ; // flush_guest_tlb_local();
>       *last_vcpu_ran = n->vcpu_id;
>  } 
> 
> Thanks & Regards,
> Haseeb Ashraf





 

