Re: [RFC PATCH 00/19] GICv4 Support for Xen
Hi Bertrand,

On Tue, Feb 3, 2026 at 12:02 PM Bertrand Marquis <Bertrand.Marquis@xxxxxxx> wrote:
>
> Hi Mykyta,
>
> We have a number of series from you which have not been merged yet, and
> reviewing them all in parallel might be challenging.
>
> Would you mind giving us a status and maybe priorities on them?
>
> I could list the following series:
> - GICv4
> - CPU Hotplug on arm
> - PCI enumeration on arm
> - IPMMU for pci on arm
> - dom0less for pci passthrough on arm
> - SR-IOV for pvh
> - SMMU for pci on arm
> - MSI injection on arm
> - suspend to ram on arm
>
> There might be others; feel free to complete the list.
>
> On GICv4...
>
> > On 2 Feb 2026, at 17:14, Mykyta Poturai <Mykyta_Poturai@xxxxxxxx> wrote:
> >
> > This series introduces GICv4 direct LPI injection for Xen.
> >
> > Direct LPI injection relies on the GIC tracking the mapping between
> > physical and virtual CPUs. Each VCPU requires a VPE that is created and
> > registered with the GIC via the `VMAPP` ITS command. The GIC is then
> > informed of the current VPE-to-PCPU placement by programming `VPENDBASER`
> > and `VPROPBASER` in the appropriate redistributor. LPIs are associated
> > with VPEs through the `VMAPTI` ITS command, after which the GIC handles
> > delivery without trapping into the hypervisor for each interrupt.
> >
> > When a VPE is not scheduled but has pending interrupts, the GIC raises a
> > per-VPE doorbell LPI. Doorbells are owned by the hypervisor and prompt
> > rescheduling so the VPE can drain its pending LPIs.
> >
> > Because GICv4 lacks a native doorbell invalidation mechanism, this series
> > includes a helper that invalidates doorbell LPIs via synthetic "proxy"
> > devices, following the approach used until GICv4.1.
> >
> > All of this work is mostly based on the work of Penny Zheng
> > <penny.zheng@xxxxxxx> and Luca Fancellu <luca.fancellu@xxxxxxx>, and also
> > on Linux patches by Marc Zyngier.
> >
> > Some patches are still a little rough and need some styling fixes and
> > more testing, as all of them had to be carved line by line from a giant
> > ~4000-line patch. This RFC is mostly intended to get a general idea of
> > whether the proposed approach is suitable and OK with everyone. There is
> > also still an open question of how to handle Signed-off-by lines for
> > Penny and Luca, since they have not indicated their preference yet.
>
> I would like to ask how much performance benefit you could get with this.
> Adding GICv4 support adds a lot of code which will have to be maintained
> and tested, and there should be a good improvement to justify it.
>
> Did you do some benchmarks? What are the results?
>
> At the time we started to work on this at Arm, we came to the conclusion
> that the complexity in Xen compared to the benefit did not justify it,
> which is why this work was stopped in favor of other features that we
> thought would be more beneficial to Xen (like PCI passthrough or SMMUv3).

I have been asked to run benchmarks for this series, so here is a short
update from my side.

Test setup:
- AWS c7g bare metal
- Linux bare-metal reference and Xen dom0 runs
- fio random-read workloads on an NVMe-backed EBS volume (gp3, 160G, 80k IOPS)
- Main workloads:
  - 4k, iodepth=1
  - 16k, iodepth=1
  - 4k, iodepth=4
  - 4k, iodepth=1, numjobs=4
- 5 repetitions per configuration, looking mainly at median values
- Main Xen comparison was done with the default scheduler (credit2),
  direct LPIs OFF vs ON

Summary:
- With credit2, enabling direct LPIs gave a small but repeatable IOPS
  improvement across all tested workloads, roughly in the 0.8-1.1% range.
- Mean completion latency also improved consistently.
- The clearest gain was in tail latency. In the 4k randread, iodepth=1,
  numjobs=4 case, p99.9 improved by about 41% and p99.99 by about 34% with
  direct LPIs enabled.
- In this setup, switching from credit2 to null did not materially change
  median throughput, so the observed improvement appears to come primarily
  from the interrupt delivery path rather than from the scheduler choice.

A few caveats:
- This was a low-contention setup with only dom0 using 8 CPUs, so it did not
  exercise heavy VCPU migration or scheduler pressure.
- I also tried an artificially constrained NVMe host queue depth
  configuration, but I am treating that only as a stress/control case and
  not as the main result.

A full benchmark report is available here:
https://github.com/xakep-amatop/giv4-benchmark/blob/main/report.pdf
The same repository also contains the raw benchmark result archives used for
the analysis.

So, based on these measurements, there does appear to be a measurable benefit
from direct LPI injection, with the strongest effect showing up in tail
latency rather than in median throughput.

If you need any additional benchmark results or specific test cases, please
let me know.

Best regards,
Mykola

>
> Cheers
> Bertrand
>
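P.S. The exact fio job files are in the linked repository; for a quick idea
of the workload shape, the 4k randread, iodepth=1 case corresponds roughly
to a job file like the one below. The device path and runtime here are
illustrative placeholders, not the exact values used in the runs.

```ini
; Illustrative fio job for the 4k randread, iodepth=1 workload.
; /dev/nvme1n1 and runtime are placeholders; see the benchmark
; repository above for the actual job files.
[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=1
numjobs=1
time_based=1
runtime=60

[ebs-randread]
filename=/dev/nvme1n1
```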