
Re: [PATCH v3 7/8] pdx: introduce a new compression algorithm based on region offsets



On Tue, Jul 29, 2025 at 04:16:15PM +0200, Jan Beulich wrote:
> On 24.07.2025 13:04, Roger Pau Monne wrote:
> > --- a/xen/common/pdx.c
> > +++ b/xen/common/pdx.c
> > @@ -24,6 +24,7 @@
> >  #include <xen/param.h>
> >  #include <xen/pfn.h>
> >  #include <xen/sections.h>
> > +#include <xen/sort.h>
> >  
> >  /**
> >   * Maximum (non-inclusive) usable pdx. Must be
> > @@ -40,6 +41,12 @@ bool __mfn_valid(unsigned long mfn)
> >  
> >  #ifdef CONFIG_PDX_MASK_COMPRESSION
> >      invalid |= mfn & pfn_hole_mask;
> > +#elif defined(CONFIG_PDX_OFFSET_COMPRESSION)
> > +{
> > +    unsigned long base = pfn_bases[PFN_TBL_IDX(mfn)];
> > +
> > +    invalid |= mfn < base || mfn >= base + pdx_region_size;
> 
> Leveraging wrapping, this could be simplified to
> 
>     invalid |= mfn - base >= pdx_region_size;
> 
> I think. Considering it's a frequently used path, doing so may be worthwhile.

I don't think that would work for all cases, take the following
example:

PFN compression using lookup table shift 18 and region size 0x40000
 range   0 [0000000280000, 00000002bffff] PFN IDX  10 : 0000000280000

If you pass mfn 0 to mfn_valid() with your proposed adjustment, the
lookup table slot for mfn 0 isn't covered by any range and hence holds
~0UL, so the result of the subtraction would be:

0 - ~0UL == 1

That wouldn't satisfy the >= condition, and hence mfn 0 would be
reported as a valid mfn.  I think we need to keep both sides of the
check.
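
To illustrate (standalone snippet, not the Xen code; assuming the
unpopulated lookup table slot holds ~0UL, as in the subtraction above):

#include <stdio.h>

int main(void)
{
    unsigned long pdx_region_size = 0x40000UL;
    unsigned long base = ~0UL; /* slot for mfn 0 isn't covered by any range */
    unsigned long mfn = 0;

    /* One-sided check: 0 - ~0UL wraps to 1, which is < 0x40000 -> "valid". */
    printf("one-sided: %s\n",
           mfn - base >= pdx_region_size ? "invalid" : "valid");

    /* Two-sided check: mfn < base already flags the MFN as invalid. */
    printf("two-sided: %s\n",
           mfn < base || mfn >= base + pdx_region_size ? "invalid" : "valid");

    return 0;
}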

> > @@ -75,6 +82,13 @@ void set_pdx_range(unsigned long smfn, unsigned long emfn)
> >  # error "Missing architecture maximum number of RAM ranges"
> >  #endif
> >  
> > +/* Some functions should be init when not using PDX mask compression. */
> > +#ifndef CONFIG_PDX_MASK_COMPRESSION
> > +# define __init_or_pdx_mask __init
> > +#else
> > +# define __init_or_pdx_mask
> > +#endif
> 
> Considering this is local to the CU, "pdx" in the name isn't very meaningful.
> Instead it being a compression type may be, so how about __init_or_mask_compr
> or some such? (If we gain further compression types, this may be getting
> unwieldy anyway.)

I'm fine with your proposal.  The current naming was derived from the
CONFIG_ define.
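
FWIW, with the suggested __init_or_mask_compr name the block would
become:

/* Some functions should be __init when not using PDX mask compression. */
#ifndef CONFIG_PDX_MASK_COMPRESSION
# define __init_or_mask_compr __init
#else
# define __init_or_mask_compr
#endif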

> > +static void __init cf_check swp_node(void *a, void *b, size_t size)
> > +{
> > +    SWAP(a, b);
> 
> This doesn't look right - you swap a and b, not what they point to.
> 
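
Indeed, the callback should swap what the pointers reference, not the
pointers themselves; something along these lines (untested sketch,
assuming the elements being sorted are the struct pfn_range entries):

static void __init cf_check swp_node(void *a, void *b, size_t size)
{
    SWAP(*(struct pfn_range *)a, *(struct pfn_range *)b);
}
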
> > +static bool __init pfn_offset_sanitize_ranges(void)
> > +{
> > +    unsigned int i = 0;
> > +
> > +    if ( nr_ranges == 1 )
> > +    {
> > +        ASSERT(PFN_TBL_IDX(ranges[0].base) ==
> > +               PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1));
> 
> I think this points out a naming issue in patch 2: "base" and "size" look
> as if these were address / byte count, when they're PFN / page count.
> Which caught my eye because of the values being passed to something that
> actually wants a PFN (and hence looked wrong at the first glance).

The struct name is pfn_range; I could rename the fields to base_pfn
and pages or similar, but my impression was that the struct name is
enough of a hint that these are PFN ranges.  Do you have any
alternative proposal for how to name them?
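
For reference, the rename I have in mind would look something like:

struct pfn_range {
    unsigned long base_pfn; /* first PFN covered by the range */
    unsigned long pages;    /* number of pages in the range */
};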

> > +        return true;
> > +    }
> > +
> > +    while ( i + 1 < nr_ranges )
> > +    {
> > +        /*
> > +         * Ensure ranges [start, end] use the same offset table index.  Should
> > +         * be guaranteed by the logic that calculates the pfn shift.
> > +         */
> > +        if ( PFN_TBL_IDX(ranges[i].base) !=
> > +             PFN_TBL_IDX(ranges[i].base + ranges[i].size - 1) ||
> > +             PFN_TBL_IDX(ranges[i + 1].base) !=
> > +             PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )
> 
> It feels a little odd to re-do the 2nd half of the checking here on the next
> iteration, when the table wouldn't have changed when ...
> 
> > +        {
> > +            ASSERT_UNREACHABLE();
> > +            return false;
> > +        }
> > +
> > +        if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
> > +        {
> > +            i++;
> > +            continue;
> 
> ... taking this path. Could I talk you into moving the latter half ...
> 
> > +        }
> 
> ... here? If that's needed at all, as ...
> 
> > +        /* Merge ranges with the same table index. */
> > +        ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
> > +                         ranges[i].base;
> 
> ... the new range should also fulfill the requirement. Just that the last
> such range then wouldn't be checked, unless ...
> 
> > +        memmove(&ranges[i + 1], &ranges[i + 2],
> > +                (nr_ranges - (i + 2)) * sizeof(ranges[0]));
> > +        nr_ranges--;
> > +    }
> 
> ... that checking was put past the loop. Which then would allow to remove
> the special casing of nr_ranges == 1 at the top of the function: You'd
> uniformly check the ranges[nr_ranges - 1] here.       

What about doing it like:

static bool __init pfn_offset_sanitize_ranges(void)
{
    unsigned int i = 0;

    if ( PFN_TBL_IDX(ranges[0].base) !=
         PFN_TBL_IDX(ranges[0].base + ranges[0].size - 1) )
    {
        ASSERT_UNREACHABLE();
        return false;
    }

    while ( i + 1 < nr_ranges )
    {
        /*
         * Ensure ranges [start, end] use the same offset table index.  Should
         * be guaranteed by the logic that calculates the pfn shift.
         */
        if ( PFN_TBL_IDX(ranges[i + 1].base) !=
             PFN_TBL_IDX(ranges[i + 1].base + ranges[i + 1].size - 1) )
        {
            ASSERT_UNREACHABLE();
            return false;
        }

        if ( PFN_TBL_IDX(ranges[i].base) != PFN_TBL_IDX(ranges[i + 1].base) )
        {
            i++;
            continue;
        }

        /* Merge ranges with the same table index. */
        ranges[i].size = ranges[i + 1].base + ranges[i + 1].size -
                         ranges[i].base;
        memmove(&ranges[i + 1], &ranges[i + 2],
                (nr_ranges - (i + 2)) * sizeof(ranges[0]));
        nr_ranges--;
    }

    return true;
}

I've pulled the index 0 check ahead of the loop, which then also
covers the case where nr_ranges == 1.  There's also no duplicate
checking of the ranges, since the range at i + 1 will always be one
that hasn't been checked yet: either the array has been shifted as a
result of a range merge, or the index has been advanced.

> > +        ranges[i].size = start + ranges[i].size - ranges[i].base;
> > +
> > +        /*
> > +         * Only merge overlapped regions now, leave adjacent regions separated.
> > +         * They would be merged later if both use the same index into the
> > +         * lookup table.
> > +         */
> > +        if ( !i || ranges[i].base >= (ranges[i - 1].base + ranges[i - 1].size) )
> > +        {
> > +            /*
> > +             * We might parse the region at position 0 more than once, as for
> > +             * index 0 we don't attempt to merge to keep this simple.
> > +             */
> > +            mask |= pdx_region_mask(ranges[i].base, ranges[i].size);
> > +            continue;
> 
> In how far is it relevant here that slot 0 may be looked at more than once?

Not really, just that someone might wonder about the possible
inefficiency of this.  I will remove the comment then.

> > +        }
> > +
> > +        ranges[i - 1].size = ranges[i].base + ranges[i].size -
> > +                             ranges[i - 1].base;
> > +
> > +        if ( i + 1 < nr_ranges )
> > +            memmove(&ranges[i], &ranges[i + 1],
> > +                    (nr_ranges - (i + 1)) * sizeof(ranges[0]));
> > +        else /* last range */
> > +            mask |= pdx_region_mask(ranges[i].base, ranges[i].size);
> > +        nr_ranges--;
> > +        i--;
> > +    }
> > +
> > +    /*
> > +     * Populate a mask with the non-equal bits of the different ranges, do this
> > +     * to calculate the maximum PFN shift to use as the lookup table index.
> > +     */
> > +    for ( i = 0; i < nr_ranges; i++ )
> > +        for ( unsigned int j = 0; j < nr_ranges; j++ )
> > +            idx_mask |= (ranges[i].base & ~mask) ^ (ranges[j].base & ~mask);
> > +
> > +    if ( !idx_mask )
> > +        /* Single region case. */
> > +        pfn_index_shift = flsl(mask);
> > +    else if ( flsl(idx_mask) - ffsl(idx_mask) < CONFIG_PDX_OFFSET_TBL_ORDER )
> > +        /* The changed mask fits in the table index width. */
> > +        pfn_index_shift = ffsl(idx_mask) - 1;
> > +    else
> > +        /* Changed mask is wider than array size, use most significant bits. */
> > +        pfn_index_shift = flsl(idx_mask) - CONFIG_PDX_OFFSET_TBL_ORDER;
> 
> Perhaps emit a log message here to indicate that bigger PDX_OFFSET_TBL_ORDER
> may yield better results?

What about:

printk(XENLOG_DEBUG
       "PFN compression table index truncated, requires order %u\n",
       flsl(idx_mask) - ffsl(idx_mask) + 1);
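
As an illustration of what the message would report (standalone sketch
with a made up idx_mask; my_ffsl()/my_flsl() approximate Xen's 1-based
helpers and assume a 64-bit unsigned long):

#include <stdio.h>

static unsigned int my_ffsl(unsigned long x) { return x ? __builtin_ctzl(x) + 1 : 0; }
static unsigned int my_flsl(unsigned long x) { return x ? 64 - __builtin_clzl(x) : 0; }

int main(void)
{
    /* Hypothetical case: PFN bits 19 and 27 differ between range bases. */
    unsigned long idx_mask = (1UL << 19) | (1UL << 27);

    /* 9 index bits would be needed to cover all differing bits. */
    printf("required order: %u\n", my_flsl(idx_mask) - my_ffsl(idx_mask) + 1);

    return 0;
}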

> > +    /*
> > +     * Ensure correctness of the calculated values, plus merge ranges if they
> > +     * use the same lookup table index.
> > +     */
> > +    if ( !pfn_offset_sanitize_ranges() )
> > +    {
> > +        printk(XENLOG_DEBUG "PFN compression is invalid, disabling\n");
> > +        pfn_pdx_compression_reset();
> > +        return false;
> > +    }
> > +
> > +    /*
> > +     * Find the maximum PFN range size after having merged ranges with same
> > +     * index.  The rounded up region size will be the base for the PDX region
> > +     * size and shift.
> > +     */
> > +    for ( i = 0; i < nr_ranges; i++ )
> > +        size = max(size, ranges[i].size);
> > +
> > +    /* pdx_init_mask() already takes MAX_ORDER into account. */
> > +    mask = PFN_DOWN(pdx_init_mask(size << PAGE_SHIFT));
> 
> We're in arch-neutral code here, so I think you need to cast size to paddr_t
> before shifting.

Indeed, size is unsigned long here.
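
The shift then wants to be done on a paddr_t, i.e. something like:

mask = PFN_DOWN(pdx_init_mask((paddr_t)size << PAGE_SHIFT));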

Thanks, Roger.
