
Re: Limitations for Running Xen on KVM Arm64




> On 31. Oct 2025, at 10:18, Julien Grall <julien@xxxxxxx> wrote:
> 
> 
> 
> On 31/10/2025 00:20, Mohamed Mediouni wrote:
>>> On 31. Oct 2025, at 00:55, Julien Grall <julien@xxxxxxx> wrote:
>>> 
>>> Hi Mohamed,
>>> 
>>> On 30/10/2025 18:33, Mohamed Mediouni wrote:
>>>>> On 30. Oct 2025, at 14:41, haseeb.ashraf@xxxxxxxxxxx wrote:
>>>>> 
>>>>> Adding @julien@xxxxxxx and replying to his questions he asked over 
>>>>> #XenDevel:matrix.org.
>>>>> 
>>>>> Can you add some details on why the implementation cannot be optimized 
>>>>> in KVM? Asking because I have never seen such an issue when running Xen 
>>>>> on QEMU (without nested virt enabled).
>>>>> AFAIK when Xen is run on QEMU without virtualization, the instructions 
>>>>> are emulated in QEMU, while with KVM they should ideally run directly on 
>>>>> hardware except in some special cases (those trapped by FGT/CGT), such 
>>>>> as this one, where KVM maintains shadow page tables for each VM. It 
>>>>> traps these instructions and emulates them with a callback such as 
>>>>> handle_vmalls12e1is(). The way this callback is implemented, it has to 
>>>>> iterate over the whole address space and clean up the page tables, which 
>>>>> is a costly operation. Regardless of this, it should still be optimized 
>>>>> in Xen, as invalidating a selective range would be much better than 
>>>>> invalidating the whole 48-bit address space.
>>>>> Some details about your platform and use case would be helpful. I am 
>>>>> interested to know whether you are using all the features for nested virt.
>>>>> I am using AWS G4. My use case is to run Xen as a guest hypervisor. Yes, 
>>>>> most of the features are enabled except VHE and those which are disabled 
>>>>> by KVM.
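
A hedged illustration of the cost described earlier for
handle_vmalls12e1is(): without range information, emulating the guest's
"invalidate everything" against the shadow stage-2 has nothing better to go
on than the whole IPA space. The names below (unmap_shadow_s2_range(),
IPA_SPACE_SIZE, the emulate_* wrappers) are made up for the sketch and are
not KVM's or Xen's actual code.

#include <stdint.h>

#define IPA_SPACE_SIZE  (1ULL << 48)   /* assumed 48-bit IPA space */

/* Assumed helper: tears down shadow stage-2 entries in [base, base+size). */
void unmap_shadow_s2_range(uint64_t base, uint64_t size);

/* Emulating a guest vmalls12e1is: must cover the entire IPA space. */
static void emulate_vmalls12e1is(void)
{
    unmap_shadow_s2_range(0, IPA_SPACE_SIZE);
}

/* A range-scoped invalidate only touches the region the guest named. */
static void emulate_range_invalidate(uint64_t ipa, uint64_t size)
{
    unmap_shadow_s2_range(ipa, size);
}
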
>>>> Hello,
>>>> You mean Graviton4 (for reference to others, from a bare metal instance)? 
>>>> Interesting to see people caring about nested virt there :) - and 
>>>> hopefully using it wasn’t too much of a pain for you to deal with.
>>>>> 
>>>>> ; switch to current VMID
>>>>> tlbi rvae1, guest_vaddr     ; first invalidate stage-1 TLB by guest VA, current VMID
>>>>> tlbi ripas2e1, guest_paddr  ; then invalidate stage-2 TLB by IPA range, current VMID
>>>>> dsb ish
>>>>> isb
>>>>> ; switch back the VMID
>>>>>     • This is where I am not quite sure, and I was hoping that someone 
>>>>> with Arm expertise could sign off on this so that I can work on its 
>>>>> implementation in Xen. This would be an optimization not only for 
>>>>> virtualized hardware but also in general for Xen on arm64 machines.
>>>>> 
>>>> Note that the documentation says
>>>>> The invalidation is not required to apply to caching structures that 
>>>>> combine stage 1 and stage 2 translation table entries.
>>>> for TLBIP RIPAS2E1
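
As a hedged sketch only, the sequence proposed above could look roughly
like the C/inline-assembly below. This is not Xen code: the helper name is
made up, the caller is assumed to have already switched VTTBR_EL2 to the
target guest's VMID, and rva_op/ripa_op are assumed to already carry the
encoded range operands (BaseADDR, TG, SCALE, NUM, TTL) defined by the Arm
ARM. The range TLBI mnemonics require FEAT_TLBIRANGE (Armv8.4), and the
barrier placement simply mirrors the proposal; given the note just quoted
about combined stage-1+2 caching structures, the stage-1 part may also
need to be broader than a single VA range.

#include <stdint.h>

/* Hedged sketch, not Xen code; see the caveats above. */
static inline void flush_guest_range_current_vmid(uint64_t rva_op,
                                                  uint64_t ripa_op)
{
    asm volatile(
        "tlbi rvae1, %0\n"      /* stage-1 TLB, VA range, current VMID  */
        "tlbi ripas2e1, %1\n"   /* stage-2 TLB, IPA range, current VMID */
        "dsb ish\n"
        "isb\n"
        :
        : "r" (rva_op), "r" (ripa_op)
        : "memory");
}
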
>>>>>     • The second place in Xen where this is problematic is when multiple 
>>>>> vCPUs of the same domain are juggled on a single pCPU: TLBs are 
>>>>> invalidated every time a different vCPU runs on a pCPU. I do not know 
>>>>> how this can be optimized. Any support on this is appreciated.
>>>> One way to handle this is to make every TLB invalidate within the VM a 
>>>> broadcast TLB invalidate (HCR_EL2.FB is what you’re looking for) and then 
>>>> forego that TLB maintenance, as it’s no longer necessary. This should not 
>>>> have a practical performance impact.
>>> 
>>> To confirm my understanding, you are suggesting to rely on the L2 guest to 
>>> send the TLB flush. Did I understand correctly? If so, wouldn't this open 
>>> a security hole, because a misbehaving guest may never send the TLB flush?
>>> 
>> Hello,
>> HCR_EL2.FB can be used to make every TLB invalidate the guest issues (which 
>> is a stage-1 invalidate) a broadcast TLB invalidate.
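
For context, HCR_EL2.FB ("Force Broadcast", bit 9) makes the TLB and
instruction-cache maintenance the guest performs at EL1 behave as the
broadcast (Inner Shareable) variants. A minimal sketch of setting it, with
illustrative accessor and macro names that are not Xen's actual
definitions; this must of course execute at EL2:

#include <stdint.h>

#define HCR_FB  (1ULL << 9)    /* HCR_EL2.FB: force broadcast */

static inline uint64_t read_hcr_el2(void)
{
    uint64_t v;
    asm volatile("mrs %0, hcr_el2" : "=r" (v));
    return v;
}

static inline void write_hcr_el2(uint64_t v)
{
    asm volatile("msr hcr_el2, %0\n\tisb" : : "r" (v));
}

static void enable_force_broadcast(void)
{
    write_hcr_el2(read_hcr_el2() | HCR_FB);
}
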
> 
> Xen already sets HCR_EL2.FB. But I believe this only solves the problem 
> where the vCPU is moved to another pCPU. It doesn't solve the problem where 
> two vCPUs from the same VM are sharing the same pCPU.
> 
> Per the Arm Arm, each CPU has its own private TLBs. So we have to flush 
> between vCPUs of the same domain to avoid translations from vCPU 1 "leaking" 
> to vCPU 2 (they may have conflicting page-tables).
Hm… it depends on whether the VM uses CnP or not (and whether the HW supports 
it)… (Linux does…)
> KVM has similar logic, see "last_vcpu_ran" and "__kvm_flush_cpu_context()". 
> That said... they are using "vmalle1" whereas we are using "vmalls12e1". So 
> maybe we can relax it. Not sure if this would make any difference for 
> performance though.
vmalle1 avoids the problem here (because it only invalidates stage-1 
translations). 
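
For reference, the KVM pattern mentioned above boils down to recording, per
physical CPU, which vCPU ran there last and doing a local, stage-1-only
flush when a different vCPU of the same VM is about to run. A hedged sketch
of that pattern follows; the structure and helper names are illustrative
and are not KVM's or Xen's actual code.

#include <stdint.h>

#define NR_PCPUS  64            /* assumed number of physical CPUs */

struct vm {
    /* vCPU index that last ran on each pCPU; assumed initialised to -1. */
    int last_vcpu_ran[NR_PCPUS];
};

/* Local, current-VMID, stage-1-only TLB invalidate (vmalle1). */
static inline void local_flush_stage1_tlb(void)
{
    asm volatile("tlbi vmalle1\n\tdsb nsh\n\tisb" ::: "memory");
}

/* Called when scheduling vCPU 'vcpu_idx' onto pCPU 'cpu', after the
 * VM's VMID (VTTBR_EL2) has been installed. */
static void vcpu_load(struct vm *vm, int vcpu_idx, int cpu)
{
    if (vm->last_vcpu_ran[cpu] != vcpu_idx) {
        /*
         * A different vCPU of this VM ran here last: flush so its
         * stage-1 translations cannot leak into this vCPU (their
         * page-tables may conflict).
         */
        local_flush_stage1_tlb();
        vm->last_vcpu_ran[cpu] = vcpu_idx;
    }
}
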
> Cheers,
> 
> -- 
> Julien Grall
> 
> 




 

