[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges



On Wed, Aug 08, 2018 at 07:15:54AM -0600, Jan Beulich wrote:
> >>> On 08.08.18 at 12:07, <roger.pau@xxxxxxxxxx> wrote:
> > Several people have reported hardware issues (malfunctioning USB
> > controllers) due to iommu page faults on Intel hardware. Those faults
> > are caused by missing RMRR (VTd) entries in the ACPI tables. Those can
> > be worked around on VTd hardware by manually adding RMRR entries on
> > the command line, this is however limited to Intel hardware and quite
> > cumbersome to do.
> > 
> > In order to solve those issues add a new dom0-iommu=reserved option
> > that identity maps all regions marked as reserved in the memory map.
> 
> Considering that report we've had (yesterday? earlier today?), don't
> we need to go further and use the union of RMRRs and reserved
> regions? Iirc they had a case where an RMRR was not in a reserved
> region ...

AFAICT (and I could be reading the code wrong) RMRR regions not on
reserved regions are still correctly mapped to the guest.

> > --- a/docs/misc/xen-command-line.markdown
> > +++ b/docs/misc/xen-command-line.markdown
> > @@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon accesses 
> > to that port.
> >  >> Enable IOMMU debugging code (implies `verbose`).
> >  
> >  ### dom0-iommu
> > -> `= List of [ none | strict | relaxed | inclusive ]`
> > +> `= List of [ none | strict | relaxed | inclusive | reserved ]`
> >  
> >  * `none`: disables DMA remapping for Dom0.
> >  
> > @@ -1233,6 +1233,15 @@ meaning:
> >    option is only applicable to a PV Dom0 and is enabled by default on Intel
> >    hardware.
> >  
> > +* `reserved`: sets up DMA remapping for all the reserved regions in the  
> > memory
> > +  map for Dom0. Use this to work around firmware issues providing incorrect
> > +  RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses
> > +  for Dom0, all memory regions marked as reserved in the memory map that 
> > don't
> > +  overlap with any MMIO region from emulated devices will be identity 
> > mapped.
> > +  This option maps a subset of the memory that would be mapped when using 
> > the
> > +  `inclusive` option. This option is available to a PVH Dom0 and is 
> > enabled by
> > +  default on Intel hardware.
> 
> The sub-options so far were quite clear in their meanings, but
> "dom0-iommu=reserved" might mean all sorts of things to me, but quite
> certainly not "map all reserved regions". "map-reserved" perhaps?

Then we should also have 'map-inclusive' for symmetry IMO.

> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -404,6 +404,11 @@ static const struct hvm_mmcfg *vpci_mmcfg_find(const 
> > struct domain *d,
> >      return NULL;
> >  }
> >  
> > +bool vpci_mmcfg_address(const struct domain *d, paddr_t addr)
> > +{
> > +    return vpci_mmcfg_find(d, addr);
> > +}
> 
> I think the function name should have an "is_" somewhere, to make
> clear address is a noun here and not a verb.
> 
> > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > @@ -256,6 +256,9 @@ static void __hwdom_init amd_iommu_hwdom_init(struct 
> > domain *d)
> >      /* Inclusive IOMMU mappings are disabled by default on AMD hardware. */
> >      iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? false
> >                                                        : 
> > iommu_dom0_inclusive;
> > +    /* Reserved IOMMU mappings are disabled by default on AMD hardware. */
> > +    iommu_dom0_reserved = iommu_dom0_reserved == -1 ? false
> > +                                                    : iommu_dom0_reserved;
> 
> Especially seeing these two together now, I think an if() each would
> be easier to read (not the least because of being shorter). To me,
> the main purpose of the conditional operator is to allow shortening
> simple if/else pairs, rather than lengthening simple if()-s.
> 
> > @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d)
> >  {
> >  }
> >  
> > +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned 
> > long pfn,
> > +                                         unsigned long max_pfn)
> > +{
> > +    unsigned int i;
> > +
> > +    /*
> > +     * Ignore any address below 1MB, that's already identity mapped by the
> > +     * domain builder for HVM.
> > +     */
> > +    if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||
> 
> Careful again here with the distinction between Dom0 and hwdom:
> The domain builder you refer to is - aiui - the in-Xen one, i.e. the
> one _only_ dealing with Dom0.

So this should instead be:

if ( (is_control_domain(d) && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||

> > +    /*
> > +     * If dom0-strict mode is enabled or the guest type is PVH/HVM then 
> > exclude
> > +     * conventional RAM and let the common code map dom0's pages.
> > +     */
> > +    if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> > +         (iommu_dom0_strict || is_hvm_domain(d)) )
> > +        return false;
> > +    if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> > +         !iommu_dom0_reserved && !iommu_dom0_inclusive )
> > +        return false;
> > +    if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
> > +         !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> > +         !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> > +         (!iommu_dom0_inclusive || pfn > max_pfn) )
> > +        return false;
> 
> As page_is_ram_type() is, especially on systems with many E820
> entries, not really cheap, I think at least a minimum amount of
> optimization is on order here - after all you do this for every
> single page of the system. Hence minimally the result of the first
> CONVENTIONAL and RESERVED queries should be latched and
> re-used (or "else" be used suitably). Ideally an approach would
> be used which involved just a single iteration through the E820
> map, but I realize this may be more than is feasible here.

This would be quite better if I could just fetch the type, then I
could add a switch (type) { ... and it would be better IMO.

> Furthermore I'm unconvinced the !page_is_ram_type() uses
> are really what you want: The function returning false means
> "don't know", not "no". Perhaps the function needs to be made
> return a tristate (yes, no, or part of it).

Right, that's why I have three different !page_is_ram_type checks in
the last branch of the if, so that I can make sure it's not one of the
previous types and also account for holes.

> > +    /* Check that it doesn't overlap with the LAPIC */
> > +    if ( has_vlapic(d) )
> > +    {
> > +        const struct vcpu *v;
> > +
> > +        for_each_vcpu(d, v)
> > +            if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
> > +                return false;
> > +    }
> 
> I don't, btw, recall any code adjusting the IOMMU mappings in
> case the domain relocates its LAPIC. If you do the check above,
> wouldn't that other side too need taking care of?

Yes. I can add something later but this is already an issue if the
guest for example relocates the LAPIC over a RAM region.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.