[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables



On Tue, Feb 05, 2019 at 05:49:07AM -0700, Jan Beulich wrote:
> >>> On 05.02.19 at 12:15, <roger.pau@xxxxxxxxxx> wrote:
> > On Tue, Feb 05, 2019 at 03:47:49AM -0700, Jan Beulich wrote:
> >> >>> On 30.01.19 at 11:36, <roger.pau@xxxxxxxxxx> wrote:
> >> > Reserved memory ranges below 1MB on a PVH dom0 are added to the HAP
> >> > page tables, but due to this being done before setting up the IOMMU
> >> > the non RAM regions in those areas are not added to the IOMMU page
> >> > tables. Fix this by making sure any reserved regions below 1MB are
> >> > added to the IOMMU page tables.
> >> 
> >> So what was the reason again that we call iommu_hwdom_init() after
> >> pvh_setup_p2m()? Am I mis-remembering there having been a patch
> >> to flip their order?
> > 
> > Yes - IIRC I found (broken) hardware that requires the iommu page
> > tables to also contain certain RAM regions or else you get page faults
> > or a complete system freeze when the iommu is enabled.
> > 
> > It could be argued that even with this workaround such hardware is
> > still likely to be broken anyway, because RAM regions are not identity
> > mapped. From the emails I can find I was only able to observe this
> > behavior with pre-Haswell hardware.
> > 
> > Maybe it would be easier to just enable the iommu before populating
> > the RAM regions in the p2m? That would simplify the code here.
> 
> When, hence my ordering question.
> 
> >> > --- a/xen/drivers/passthrough/x86/iommu.c
> >> > +++ b/xen/drivers/passthrough/x86/iommu.c
> >> > @@ -151,12 +151,7 @@ static bool __hwdom_init hwdom_iommu_map(const 
> >> > struct domain *d,
> >> >       * inclusive mapping additionally maps in every pfn up to 4GB 
> >> > except those
> >> >       * that fall in unusable ranges for PV Dom0.
> >> >       */
> >> > -    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) ||
> >> > -         /*
> >> > -          * Ignore any address below 1MB, that's already identity 
> >> > mapped by the
> >> > -          * Dom0 builder for HVM.
> >> > -          */
> >> > -         (!d->domain_id && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) )
> >> 
> >> There was a domain ID check here, and the comment explicitly said
> >> Dom0.
> >> 
> >> > +    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
> >> >          return false;
> >> >  
> >> >      switch ( type = page_get_ram_type(mfn) )
> >> > @@ -245,7 +240,12 @@ void __hwdom_init arch_iommu_hwdom_init(struct 
> >> > domain *d)
> >> >          if ( !hwdom_iommu_map(d, pfn, max_pfn) )
> >> >              continue;
> >> >  
> >> > -        if ( paging_mode_translate(d) )
> >> > +        /*
> >> > +         * Don't add any address below 1MB to the HAP page tables, 
> >> > that's
> >> > +         * already done by the domain builder. Add addresses below 1MB 
> >> > to the
> >> > +         * IOMMU page tables only.
> >> > +         */
> >> > +        if ( paging_mode_translate(d) && pfn >= PFN_DOWN(MB(1)) )
> >> 
> >> Nothing like this here. Did you determine that in the late hwdom
> >> case things work without that extra precaution (i.e. the removed
> >> check was really pointless)? If so, mentioning this would be helpful
> >> (at the very least to be sure this was intentional).
> > 
> > We don't currently have support for a pvh late-hwdom AFAIK, and
> > whether this check is necessary or not depends on how such pvh
> > late-hwdom is built, explicitly how the low 1MB is handled.
> 
> Well, till now I've been assuming that the late hwdom (in the PV case)
> would be built using the normal tool stack logic. I would then extend
> this to PVH, and expect Xen to take care of the delta between what
> the tool stack does and what the hardware domain needs.

Well, I think that non-trivial changes would need to be performed to
the toolstack in order to create a pvh late-hwdom. For once the
physmap of a pvh hwdom needs to match the native one, and there's no
logic in the toolstack at all to do this.

My point is that making such adjustment here for a pvh late-hwdom is
likely a red herring (or maybe not even needed or wrong), and there's
a lot more work to do in order to be able to create a pvh
late-hwdom.

> > Maybe it's better to just forget about the pre-haswell workarounds and
> > enable the iommu before populating the p2m, that would certainly
> > simply the code here by removing the low 1MB special casing.
> 
> Are you convinced that those workarounds are attributable to the
> CPU family, and that hence with Haswell and newer they're gone
> altogether?

Not sure, I guess it's more likely part of the chipset rather the CPU
itself? But since chipsets are usually paired with CPU families, it's
quite likely the bogus chipset was only used in conjunction with
pre-Haswell CPUs.

Anyway, I'm happy to change the order so that the iommu is enabled
before the p2m is populated and then drop this workaround from the
iommu code. Would you be fine with such a change?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.