[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/dom0: improve PVH initrd and metadata placement


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 3 Mar 2020 11:29:09 +0100
  • Authentication-results: esa4.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none; spf=None smtp.pra=roger.pau@xxxxxxxxxx; spf=Pass smtp.mailfrom=roger.pau@xxxxxxxxxx; spf=None smtp.helo=postmaster@xxxxxxxxxxxxxxx
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Tue, 03 Mar 2020 10:29:32 +0000
  • Ironport-sdr: P4z52YXRbKordU4wZPZeR0MuJAfTVPUCgsVyzQv40BKLpnNUGg73x3OoFAssrZuFnimCg/m+1w CBwYXinhkU6Zu4gH5n7SzDAUH4P1x6hs/fYnWNZ9TvbqGXsnNO1F4Z8y0KF2rcksm6FNBmLGbJ jmwzdeDc6uwf+g7vrrSOrVjSnTopatZLNFhUga5d/6iOAjoyOzTc2nUTdUlONzrJ9sr4KETmTT Gt5wC4mUeWNvudxty7W2b13no7/LmZ+rIenHJgISsMk8XOkZxTugSzq2zt6NwT49KFWgZ8nRYB voc=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Mar 03, 2020 at 10:14:50AM +0100, Jan Beulich wrote:
> On 02.03.2020 16:55, Roger Pau Monne wrote:
> > Don't assume there's going to be enough space at the tail of the
> > loaded kernel and instead try to find a suitable memory area where the
> > initrd and metadata can be loaded.
> > 
> > Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> >  xen/arch/x86/hvm/dom0_build.c | 51 ++++++++++++++++++++++++++++++++++-
> >  1 file changed, 50 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> > index eded87eaf5..a03bf2e663 100644
> > --- a/xen/arch/x86/hvm/dom0_build.c
> > +++ b/xen/arch/x86/hvm/dom0_build.c
> > @@ -490,6 +490,44 @@ static int __init pvh_populate_p2m(struct domain *d)
> >  #undef MB1_PAGES
> >  }
> >  
> > +static paddr_t find_memory(const struct domain *d, const struct elf_binary 
> > *elf,
> > +                           size_t size)
> > +{
> > +    paddr_t kernel_start = (paddr_t)elf->dest_base;
> > +    paddr_t kernel_end = (paddr_t)(elf->dest_base + elf->dest_size);
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < d->arch.nr_e820; i++ )
> > +    {
> > +        paddr_t start, end;
> > +
> > +        if ( d->arch.e820[i].addr < MB(1) &&
> > +             d->arch.e820[i].addr + d->arch.e820[i].size < MB(1) )
> 
> I guess you mean <= here,

Hm, right, or else I would have to - 1.

> and I also guess the left side of the
> && could be dropped altogether (as redundant - E820 entries
> aren't supposed to wrap, or if they did we'd have bigger
> problems). Also perhaps ...
> 
> > +            continue;
> > +
> > +        start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
> > +        end = d->arch.e820[i].addr + d->arch.e820[i].size;
> 
> ... calculate "end" earlier and use it in the if() above?

Right.

> 
> As to the aligning to a 1Mb boundary - why are you doing this?

I'm not sure placing the initrd and metadata below 1MB is sensible,
even if a region big enough was found. In pvh_populate_p2m we copy the
data on the memory regions < 1MB, I'm not sure BDA/EBDA is marked as
reserved in the memory map always.

> I guess whatever the reason warrants a comment, the more that
> further down you only align to page boundaries.
> 
> > +        /* Deal with the kernel being loaded in the region. */
> > +        if ( kernel_start <= start && kernel_end >= start )
> 
> While it doesn't matter much, I think it would look better to
> use > on the right side of the && here ...
> 
> > +            /* Truncate the start of the region */
> > +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> > +        else if ( kernel_start <= end && kernel_end >= end )
> 
> and < on the left side of the && here. Furthermore - can this
> really be "else if()" here, i.e. doesn't it need to be plain
> "if()"?

I don't think so, as the region where the kernel has been loaded must
always be a single RAM region. Ie: [kernel_start, kernel_end) is
always going to be a subset of a RAM region.

So either the kernel region doesn't overlap, or overlaps with the
head, tail or splits the region into two.

> > +            /* Truncate the end of the region */
> > +            end = kernel_start;
> > +        /* Pick the biggest of the split regions */
> > +        else if ( kernel_start - start > end - kernel_end )
> 
> I don't think the logic above guarantees e.g. kernel_start > start
> (i.e. the subtraction to not wrap). More generally I don't think
> it follows that there are two split regions at this point. At the
> very least I think this whole block wants to be wrapped in a
> check that there's any overlap between kernel and the given region
> in the first place.

Yes, that's indeed missing.

> 
> > +            end = kernel_start;
> > +        else
> > +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> > +
> > +        if ( end - start >= size )
> > +            return start;
> 
> Are all blocks E820_RAM at this point in time? Otherwise there's
> a type check missing. (Even if all are of this type now, adding
> a type check may still be a good idea to be more future proof.)

Will add the check.

> > @@ -546,7 +584,18 @@ static int __init pvh_load_kernel(struct domain *d, 
> > const module_t *image,
> >          return rc;
> >      }
> >  
> > -    last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
> > +    last_addr = find_memory(d, &elf, sizeof(start_info) +
> > +                            initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
> > +                                     sizeof(mod)
> > +                                   : 0 +
> > +                            cmdline ? ROUNDUP(strlen(cmdline) + 1,
> > +                                              elf_64bit(&elf) ? 8 : 4)
> > +                                    : 0);
> 
> I guess you mean
> 
>     last_addr = find_memory(d, &elf, sizeof(start_info) +
>                             (initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
>                                       sizeof(mod)
>                                     : 0) +
>                             (cmdline ? ROUNDUP(strlen(cmdline) + 1,
>                                                elf_64bit(&elf) ? 8 : 4)
>                                      : 0));
> 
> ?

Uh, yes, sorry.

> 
> Also, do both regions need to be adjacent? If not, wouldn't it be
> better to find slots for them one by one?

That's going to be much more complicated, as you would have to account
for previous regions while searching for empty spaces. If we want to
go that route we would have to use a rangeset or some such in order to
keep track of used areas.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.