| 
    
 [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0
 On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote:
> On 14.02.2025 10:29, Roger Pau Monne wrote:
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap.
> >  
> >  ### dom0
> >      = List of [ pv | pvh, shadow=<bool>, verbose=<bool>,
> > -                cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86)
> > +                cpuid-faulting=<bool>, msr-relaxed=<bool>,
> > +                pf-fixup=<bool> ] (x86)
> >  
> >      = List of [ sve=<integer> ] (Arm64)
> >  
> > @@ -883,6 +884,19 @@ Controls for how dom0 is constructed on x86 systems.
> >  
> >      If using this option is necessary to fix an issue, please report a bug.
> >  
> > +*   The `pf-fixup` boolean is only applicable when using a PVH dom0 and
> > +    defaults to false.
> > +
> > +    When running dom0 in PVH mode the dom0 kernel has no way to map MMIO
> > +    regions into the p2m, such mode relies on Xen dom0 builder populating
> > +    the p2m with all MMIO regions that dom0 should access.  However Xen
> > +    doesn't have a complete picture of the host memory map, due to not
> > +    being able to process ACPI dynamic tables.
> > +
> > +    The `pf-fixup` option allows Xen to attempt to add missing MMIO regions
> > +    to the p2m in response to page-faults generated by dom0 trying to 
> > access
> > +    unpopulated entries in the p2m.
> 
> I wonder if this is to implementation focused for a command line option doc.
> In particular the multiple uses of "p2m" are standing out in this regard.
Hm, let me try to change p2m with 'dom0 physical memory map' or similar.
> > --- a/xen/arch/x86/dom0_build.c
> > +++ b/xen/arch/x86/dom0_build.c
> > @@ -286,6 +286,10 @@ int __init parse_arch_dom0_param(const char *s, const 
> > char *e)
> >          opt_dom0_cpuid_faulting = val;
> >      else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 )
> >          opt_dom0_msr_relaxed = val;
> > +#ifdef CONFIG_HVM
> > +    else if ( (val = parse_boolean("pf-fixup", s, e)) >= 0 )
> > +        opt_dom0_pf_fixup = val;
> > +#endif
> >      else
> >          return -EINVAL;
> 
> I fear the scope of these sub-options is getting increasingly confusing.
> opt_dom0_msr_relaxed is what its name says - specific to Dom0.
> opt_dom0_cpuid_faulting, otoh, is a control domain option (i.e. also
> applicable to a [hypothetical?] late ctrldom). Now you add an option
> that's applicable to the hardware domain, i.e. also coverting late-hwdom.
It's kind of a mixed bag, but ATM I would leave it as-is because it's
likely easier for users to find the options if they are grouped
together.  WE can always add more fine grained options if there's a
desired for them.
> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -338,8 +338,38 @@ static int hvmemul_do_io(
> >          if ( !s )
> >          {
> >              if ( is_mmio && is_hardware_domain(currd) )
> > -                gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size 
> > %u\n",
> > -                         dir ? "read" : "write", addr, size);
> > +            {
> > +                /*
> > +                 * PVH dom0 is likely missing MMIO mappings on the p2m, 
> > due to
> > +                 * the incomplete information Xen has about the memory 
> > layout.
> > +                 *
> > +                 * Either print a message to note dom0 attempted to access 
> > an
> > +                 * unpopulated GPA, or try to fixup the p2m by creating an
> > +                 * identity mapping for the faulting GPA.
> > +                 */
> > +                if ( opt_dom0_pf_fixup )
> > +                {
> > +                    int inner_rc = hvm_hwdom_fixup_p2m(addr);
> 
> Why not use rc, as we do elsewhere in the function?
hvm_hwdom_fixup_p2m() returns an errno, while rc in this context
contains X86EMUL_ values.  I could indeed re-use rc, it just felt
wrong to mix different error address spaces on the same variable.
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -13,6 +13,7 @@
> >  #include <xen/lib.h>
> >  #include <xen/trace.h>
> >  #include <xen/sched.h>
> > +#include <xen/iocap.h>
> >  #include <xen/irq.h>
> >  #include <xen/softirq.h>
> >  #include <xen/domain.h>
> > @@ -5458,6 +5459,36 @@ int hvm_copy_context_and_params(struct domain *dst, 
> > struct domain *src)
> >      return rc;
> >  }
> >  
> > +bool __ro_after_init opt_dom0_pf_fixup;
> > +int hvm_hwdom_fixup_p2m(paddr_t addr)
> 
> The placement here looks odd to me. Why not as static function in emulate.c?
> Or alternatively why not as p2m_hwdom_fixup() in mm/p2m.c?
I don't have a strong opinion, if you are fine with it a static
function in emulate.c might be the best then.
> > +{
> > +    unsigned long gfn = paddr_to_pfn(addr);
> > +    struct domain *currd = current->domain;
> > +    p2m_type_t type;
> > +    mfn_t mfn;
> > +    int rc;
> > +
> > +    ASSERT(is_hardware_domain(currd));
> > +    ASSERT(!altp2m_active(currd));
> > +
> > +    /*
> > +     * Fixups are only applied for MMIO holes, and rely on the hardware 
> > domain
> > +     * having identity mappings for non RAM regions (gfn == mfn).
> > +     */
> > +    if ( !iomem_access_permitted(currd, gfn, gfn) ||
> > +         !is_memory_hole(_mfn(gfn), _mfn(gfn)) )
> > +        return -EPERM;
> > +
> > +    mfn = get_gfn(currd, gfn, &type);
> > +    if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) )
> > +        rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST;
> 
> I understand this is to cover the case where two vCPU-s access the same GFN
> at about the same time. However, the "success" log message at the call site
> being debug-only means we may be silently hiding bugs in release builds, if
> e.g. we get here despite the GFN having had an identity mapping already for
> ages.
Possibly, but what would be your suggestion to fix this?  I will think
about it, but I can't immediately see a solution that's not simply to
make the message printed by the caller to be gprintk() instead of
gdprintk() so catch such bugs.  Would you agree to that?
Thanks, Roger.
 
 
  | 
  
![]()  | 
            
         Lists.xenproject.org is hosted with RackSpace, monitoring our  |