Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
On 09/05/2025 at 23:13, Stefano Stabellini wrote:
> On Fri, 9 May 2025, Roger Pau Monné wrote:
>> On Thu, May 08, 2025 at 04:25:28PM -0700, Stefano Stabellini wrote:
>>> On Thu, 8 May 2025, Roger Pau Monné wrote:
>>>> On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
>>>>> On Tue, 6 May 2025, Roger Pau Monné wrote:
>>>>>> On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
>>>>>>> On Mon, 5 May 2025, Roger Pau Monné wrote:
>>>>>>>> On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
>>>>>>>>> On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
>>>>>>>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
>>>>>>>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
>>>>>>>>>>>> From: Xenia Ragiadakou <Xenia.Ragiadakou@xxxxxxx>
>>>>>>>>>>>>
>>>>>>>>>>>> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
>>>>>>>>>>>> addresses to firmware or co-processors not behind an IOMMU.
>>>>>>>>>>>
>>>>>>>>>>> I definitely don't understand the firmware part: it's subject to the
>>>>>>>>>>> same transparent P2M translations as the rest of the VM; it's just
>>>>>>>>>>> another piece of software running there.
>>>>>>>>>>>
>>>>>>>>>>> "Co-processors not behind an IOMMU" is also interesting; a more
>>>>>>>>>>> concrete scenario might be nice, yet I realize you may be limited in
>>>>>>>>>>> what you're allowed to say.
>>>>>>>>>>
>>>>>>>>>> Sure. On AMD x86 platforms there is a co-processor called the PSP
>>>>>>>>>> running TEE firmware. The PSP is not behind an IOMMU. Dom0
>>>>>>>>>> occasionally needs to pass addresses to it. See drivers/tee/amdtee/
>>>>>>>>>> and include/linux/psp-tee.h in Linux.
>>>>>>>>>
>>>>>>>>> We had (have?) a similar issue with amdgpu (for integrated graphics) -
>>>>>>>>> it uses the PSP for loading its firmware. With PV dom0 there is a
>>>>>>>>> workaround, as dom0 kinda knows the MFN.
>>>>>>>>> I haven't tried PVH dom0 on such a system yet, but I expect trouble
>>>>>>>>> (BTW, hw1, aka the zen2 gitlab runner, has amdgpu, and it's the one I
>>>>>>>>> used for debugging this issue).
>>>>>>>>
>>>>>>>> That's ugly, and problematic when used in conjunction with AMD-SEV.
>>>>>>>>
>>>>>>>> I wonder if Xen could emulate/mediate some parts of the PSP for dom0
>>>>>>>> to use, while allowing Xen to be the sole owner of the device. Having
>>>>>>>> both Xen and dom0 use it (for different purposes) seems like asking
>>>>>>>> for trouble. But I also have no idea how complex the PSP interface
>>>>>>>> is, nor whether it would be feasible to emulate the
>>>>>>>> interfaces/registers needed for firmware loading.
>>>>>>>
>>>>>>> Let me take a step back from the PSP for a moment. I am not opposed to
>>>>>>> a PSP mediator in Xen, but I want to emphasize that the issue is more
>>>>>>> general and extends well beyond the PSP.
>>>>>>>
>>>>>>> In my years working in embedded systems, I have consistently seen
>>>>>>> cases where Dom0 needs to communicate with something that does not go
>>>>>>> through the IOMMU. This could be due to special firmware on a
>>>>>>> co-processor, a hardware erratum that prevents proper IOMMU usage, or
>>>>>>> a high-bandwidth device that technically supports the IOMMU but
>>>>>>> performs poorly unless the IOMMU is disabled. All of these are
>>>>>>> real-world examples that I have seen personally.
>>>>>>
>>>>>> I wouldn't be surprised; classic PV dom0 avoided those issues because
>>>>>> it was dealing directly with host addresses (MFNs), but that's not the
>>>>>> case with PVH dom0.
>>>>>
>>>>> Yeah
>>>>>
>>>>>>> In my opinion, we definitely need a solution like this patch for Dom0
>>>>>>> PVH to function correctly in all scenarios.
>>>>>>
>>>>>> I'm not opposed to having such an interface available for PVH hardware
>>>>>> domains. I find it ugly, but I don't see much other way to deal with
>>>>>> those kinds of "devices".
>>>>>> Xen mediating accesses for each one of them is unlikely to be doable.
>>>>>>
>>>>>> How do you hook this exchange interface into Linux to differentiate
>>>>>> which drivers need to use MFNs when interacting with the hardware?
>>>>>
>>>>> In the specific case we have at hand, the driver is in Linux userspace
>>>>> and is specially written for our use case. It is not generic, so we
>>>>> don't have this problem. But your question is valid.
>>>>
>>>> Oh, so you then have some kind of ioctl interface that does the memory
>>>> exchange and bouncing inside the kernel on behalf of the user-space
>>>> side, I would think?
>>>
>>> I am not sure... Xenia might know more than me here.
>>
>> One further question I have regarding this approach: have you
>> considered just populating an empty p2m space with contiguous physical
>> memory instead of exchanging an existing area? That would increase
>> dom0 memory usage, but would prevent superpage shattering in the p2m.
>> You could use a dom0_mem=X,max:X+Y command line option, where Y
>> would be your extra room for swiotlb-xen bouncing usage.
>>
>> The XENMEM_increase_reservation documentation notes that the hypercall
>> already returns the base MFN of the allocated page (see the comment in
>> the xen_memory_reservation struct declaration).
>
> XENMEM_exchange is the way it has traditionally been implemented in
> Linux swiotlb-xen (it has been that way for years). But your idea is
> good.
>
> Another, more drastic, idea would be to attempt to map Dom0 PVH memory
> 1:1 at domain creation time, like we do on Arm. If not all of it, then
> as much as possible. That would resolve the problem very efficiently.
> We could communicate to Dom0 PVH the range that is 1:1 in one of the
> initial data structures, and that would be the end of it.

Could that be done by introducing a "fake" reserved region in advance
(IVMD?)?
Such regions are usually mapped 1:1 into the domain, in addition to being
coherent on the IOMMU side (so it doesn't break in case the PSP becomes
IOMMU-aware).

Teddy | Vates
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech