Xen project Mailing List

Re: [Xen-devel] [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Tue, 5 Feb 2019 15:01:08 +0100

Cc: Juergen Gross <jgross@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 05 Feb 2019 14:03:06 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Feb 05, 2019 at 05:49:07AM -0700, Jan Beulich wrote: > >>> On 05.02.19 at 12:15, <roger.pau@xxxxxxxxxx> wrote: > > On Tue, Feb 05, 2019 at 03:47:49AM -0700, Jan Beulich wrote: > >> >>> On 30.01.19 at 11:36, <roger.pau@xxxxxxxxxx> wrote: > >> > Reserved memory ranges below 1MB on a PVH dom0 are added to the HAP > >> > page tables, but due to this being done before setting up the IOMMU > >> > the non RAM regions in those areas are not added to the IOMMU page > >> > tables. Fix this by making sure any reserved regions below 1MB are > >> > added to the IOMMU page tables. > >> > >> So what was the reason again that we call iommu_hwdom_init() after > >> pvh_setup_p2m()? Am I mis-remembering there having been a patch > >> to flip their order? > > > > Yes - IIRC I found (broken) hardware that requires the iommu page > > tables to also contain certain RAM regions or else you get page faults > > or a complete system freeze when the iommu is enabled. > > > > It could be argued that even with this workaround such hardware is > > still likely to be broken anyway, because RAM regions are not identity > > mapped. From the emails I can find I was only able to observe this > > behavior with pre-Haswell hardware. > > > > Maybe it would be easier to just enable the iommu before populating > > the RAM regions in the p2m? That would simplify the code here. > > When, hence my ordering question. > > >> > --- a/xen/drivers/passthrough/x86/iommu.c > >> > +++ b/xen/drivers/passthrough/x86/iommu.c > >> > @@ -151,12 +151,7 @@ static bool __hwdom_init hwdom_iommu_map(const > >> > struct domain *d, > >> > * inclusive mapping additionally maps in every pfn up to 4GB > >> > except those > >> > * that fall in unusable ranges for PV Dom0. > >> > */ > >> > - if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) || > >> > - /* > >> > - * Ignore any address below 1MB, that's already identity > >> > mapped by the > >> > - * Dom0 builder for HVM. > >> > - */ > >> > - (!d->domain_id && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ) > >> > >> There was a domain ID check here, and the comment explicitly said > >> Dom0. > >> > >> > + if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) ) > >> > return false; > >> > > >> > switch ( type = page_get_ram_type(mfn) ) > >> > @@ -245,7 +240,12 @@ void __hwdom_init arch_iommu_hwdom_init(struct > >> > domain *d) > >> > if ( !hwdom_iommu_map(d, pfn, max_pfn) ) > >> > continue; > >> > > >> > - if ( paging_mode_translate(d) ) > >> > + /* > >> > + * Don't add any address below 1MB to the HAP page tables, > >> > that's > >> > + * already done by the domain builder. Add addresses below 1MB > >> > to the > >> > + * IOMMU page tables only. > >> > + */ > >> > + if ( paging_mode_translate(d) && pfn >= PFN_DOWN(MB(1)) ) > >> > >> Nothing like this here. Did you determine that in the late hwdom > >> case things work without that extra precaution (i.e. the removed > >> check was really pointless)? If so, mentioning this would be helpful > >> (at the very least to be sure this was intentional). > > > > We don't currently have support for a pvh late-hwdom AFAIK, and > > whether this check is necessary or not depends on how such pvh > > late-hwdom is built, explicitly how the low 1MB is handled. > > Well, till now I've been assuming that the late hwdom (in the PV case) > would be built using the normal tool stack logic. I would then extend > this to PVH, and expect Xen to take care of the delta between what > the tool stack does and what the hardware domain needs. Well, I think that non-trivial changes would need to be performed to the toolstack in order to create a pvh late-hwdom. For once the physmap of a pvh hwdom needs to match the native one, and there's no logic in the toolstack at all to do this. My point is that making such adjustment here for a pvh late-hwdom is likely a red herring (or maybe not even needed or wrong), and there's a lot more work to do in order to be able to create a pvh late-hwdom. > > Maybe it's better to just forget about the pre-haswell workarounds and > > enable the iommu before populating the p2m, that would certainly > > simply the code here by removing the low 1MB special casing. > > Are you convinced that those workarounds are attributable to the > CPU family, and that hence with Haswell and newer they're gone > altogether? Not sure, I guess it's more likely part of the chipset rather the CPU itself? But since chipsets are usually paired with CPU families, it's quite likely the bogus chipset was only used in conjunction with pre-Haswell CPUs. Anyway, I'm happy to change the order so that the iommu is enabled before the p2m is populated and then drop this workaround from the iommu code. Would you be fine with such a change? Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.