|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0
On 14.02.2025 10:29, Roger Pau Monne wrote:
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap.
>
> ### dom0
> = List of [ pv | pvh, shadow=<bool>, verbose=<bool>,
> - cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86)
> + cpuid-faulting=<bool>, msr-relaxed=<bool>,
> + pf-fixup=<bool> ] (x86)
>
> = List of [ sve=<integer> ] (Arm64)
>
> @@ -883,6 +884,19 @@ Controls for how dom0 is constructed on x86 systems.
>
> If using this option is necessary to fix an issue, please report a bug.
>
> +* The `pf-fixup` boolean is only applicable when using a PVH dom0 and
> + defaults to false.
> +
> + When running dom0 in PVH mode the dom0 kernel has no way to map MMIO
> + regions into the p2m, such mode relies on Xen dom0 builder populating
> + the p2m with all MMIO regions that dom0 should access. However Xen
> + doesn't have a complete picture of the host memory map, due to not
> + being able to process ACPI dynamic tables.
> +
> + The `pf-fixup` option allows Xen to attempt to add missing MMIO regions
> + to the p2m in response to page-faults generated by dom0 trying to access
> + unpopulated entries in the p2m.
I wonder if this is to implementation focused for a command line option doc.
In particular the multiple uses of "p2m" are standing out in this regard.
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -286,6 +286,10 @@ int __init parse_arch_dom0_param(const char *s, const
> char *e)
> opt_dom0_cpuid_faulting = val;
> else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 )
> opt_dom0_msr_relaxed = val;
> +#ifdef CONFIG_HVM
> + else if ( (val = parse_boolean("pf-fixup", s, e)) >= 0 )
> + opt_dom0_pf_fixup = val;
> +#endif
> else
> return -EINVAL;
I fear the scope of these sub-options is getting increasingly confusing.
opt_dom0_msr_relaxed is what its name says - specific to Dom0.
opt_dom0_cpuid_faulting, otoh, is a control domain option (i.e. also
applicable to a [hypothetical?] late ctrldom). Now you add an option
that's applicable to the hardware domain, i.e. also coverting late-hwdom.
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -338,8 +338,38 @@ static int hvmemul_do_io(
> if ( !s )
> {
> if ( is_mmio && is_hardware_domain(currd) )
> - gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size
> %u\n",
> - dir ? "read" : "write", addr, size);
> + {
> + /*
> + * PVH dom0 is likely missing MMIO mappings on the p2m, due
> to
> + * the incomplete information Xen has about the memory
> layout.
> + *
> + * Either print a message to note dom0 attempted to access an
> + * unpopulated GPA, or try to fixup the p2m by creating an
> + * identity mapping for the faulting GPA.
> + */
> + if ( opt_dom0_pf_fixup )
> + {
> + int inner_rc = hvm_hwdom_fixup_p2m(addr);
Why not use rc, as we do elsewhere in the function?
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -13,6 +13,7 @@
> #include <xen/lib.h>
> #include <xen/trace.h>
> #include <xen/sched.h>
> +#include <xen/iocap.h>
> #include <xen/irq.h>
> #include <xen/softirq.h>
> #include <xen/domain.h>
> @@ -5458,6 +5459,36 @@ int hvm_copy_context_and_params(struct domain *dst,
> struct domain *src)
> return rc;
> }
>
> +bool __ro_after_init opt_dom0_pf_fixup;
> +int hvm_hwdom_fixup_p2m(paddr_t addr)
The placement here looks odd to me. Why not as static function in emulate.c?
Or alternatively why not as p2m_hwdom_fixup() in mm/p2m.c?
> +{
> + unsigned long gfn = paddr_to_pfn(addr);
> + struct domain *currd = current->domain;
> + p2m_type_t type;
> + mfn_t mfn;
> + int rc;
> +
> + ASSERT(is_hardware_domain(currd));
> + ASSERT(!altp2m_active(currd));
> +
> + /*
> + * Fixups are only applied for MMIO holes, and rely on the hardware
> domain
> + * having identity mappings for non RAM regions (gfn == mfn).
> + */
> + if ( !iomem_access_permitted(currd, gfn, gfn) ||
> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) )
> + return -EPERM;
> +
> + mfn = get_gfn(currd, gfn, &type);
> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) )
> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST;
I understand this is to cover the case where two vCPU-s access the same GFN
at about the same time. However, the "success" log message at the call site
being debug-only means we may be silently hiding bugs in release builds, if
e.g. we get here despite the GFN having had an identity mapping already for
ages.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |