[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 15/30] xen/x86: populate PVHv2 Dom0 physical memory map



>>> On 04.10.16 at 11:12, <roger.pau@xxxxxxxxxx> wrote:
> On Fri, Sep 30, 2016 at 09:52:56AM -0600, Jan Beulich wrote:
>> >>> On 27.09.16 at 17:57, <roger.pau@xxxxxxxxxx> wrote:
>> > @@ -43,6 +44,11 @@ static long __initdata dom0_nrpages;
>> >  static long __initdata dom0_min_nrpages;
>> >  static long __initdata dom0_max_nrpages = LONG_MAX;
>> >  
>> > +/* Size of the VM86 TSS for virtual 8086 mode to use. */
>> > +#define HVM_VM86_TSS_SIZE   128
>> > +
>> > +static unsigned int __initdata hvm_mem_stats[MAX_ORDER + 1];
>> 
>> This is for your debugging only I suppose?
> 
> Probably, I wasn't sure if it was relevant so I left it here. Would it make 
> sense to only print this for debug builds of the hypervisor? Or better to 
> just remove it?

Statistics in debug builds often aren't really meaningful, so my order
of preference would be to remove it, then to keep it for all cases but
default it to be silent in non-debug builds (with a command line option
to enable).

>> > @@ -336,7 +343,8 @@ static unsigned long __init compute_dom0_nr_pages(
>> >          avail -= dom0_paging_pages(d, nr_pages);
>> >      }
>> >  
>> > -    if ( (parms->p2m_base == UNSET_ADDR) && (dom0_nrpages <= 0) &&
>> > +    if ( is_pv_domain(d) &&
>> > +         (parms->p2m_base == UNSET_ADDR) && (dom0_nrpages <= 0) &&
>> 
>> Perhaps better to simply force parms->p2m_base to UNSET_ADDR
>> earlier on?
> 
> p2m_base is already unconditionally set to UNSET_ADDR for PVHv2 domains, 
> hence the added is_pv_domain check in order to make sure that PVHv2 guests 
> don't get into that branch, which AFAICT is only relevant to PV guests.

This reads contradictory: If it's set to UNSET_ADDR, why the extra
check?

>> > @@ -1657,15 +1679,238 @@ out:
>> >      return rc;
>> >  }
>> >  
>> > +/* Populate an HVM memory range using the biggest possible order. */
>> > +static void __init hvm_populate_memory_range(struct domain *d, uint64_t 
>> > start,
>> > +                                             uint64_t size)
>> > +{
>> > +    static unsigned int __initdata memflags = MEMF_no_dma|MEMF_exact_node;
>> > +    unsigned int order;
>> > +    struct page_info *page;
>> > +    int rc;
>> > +
>> > +    ASSERT(IS_ALIGNED(size, PAGE_SIZE) && IS_ALIGNED(start, PAGE_SIZE));
>> > +
>> > +    order = MAX_ORDER;
>> > +    while ( size != 0 )
>> > +    {
>> > +        order = min(get_order_from_bytes_floor(size), order);
>> > +        page = alloc_domheap_pages(d, order, memflags);
>> > +        if ( page == NULL )
>> > +        {
>> > +            if ( order == 0 && memflags )
>> > +            {
>> > +                /* Try again without any memflags. */
>> > +                memflags = 0;
>> > +                order = MAX_ORDER;
>> > +                continue;
>> > +            }
>> > +            if ( order == 0 )
>> > +                panic("Unable to allocate memory with order 0!\n");
>> > +            order--;
>> > +            continue;
>> > +        }
>> 
>> Is it not possible to utilize alloc_chunk() here?
> 
> Not really, alloc_chunk will do a ceil calculation of the number of needed 
> pages, which means we end up allocating bigger chunks than needed and then 
> free them. I prefer this approach, since we always request the exact memory 
> that's required, so there's no need to free leftover pages.

Hmm, in that case at least some of the shared logic would be nice to
get abstracted out.

>> > +        if ( (size & 0xffffffff) == 0 )
>> > +            process_pending_softirqs();
>> 
>> That's 4Gb at a time - isn't that a little too much?
> 
> Hm, it's the same that's used in pvh_add_mem_mapping AFAICT. I could reduce 
> it to 0xfffffff, but I'm also wondering if it makes sense to just call it on 
> each iteration, since we are possibly mapping regions with big orders here.

Iteration could is all that matters here really; the size of the mapping
wouldn't normally (as long as its one of the hardware supported sizes).
Doing the check on every iteration may be a little much (you may want
to check whether there's noticeable extra overhead), but doing the
check on like every 64 iterations may limit overhead enough to be
acceptable without making this more sophisticated.

>> > +static int __init hvm_setup_p2m(struct domain *d)
>> > +{
>> > +    struct vcpu *saved_current, *v = d->vcpu[0];
>> > +    unsigned long nr_pages;
>> > +    int i, rc, preempted;
>> > +
>> > +    printk("** Preparing memory map **\n");
>> 
>> Debugging leftover again?
> 
> Should this be merged with the next label, so that it reads as "Preparing 
> and populating the memory map"?

No, I'd rather see the other(s) gone too.

>> > +    /*
>> > +     * Subtract one page for the EPT identity page table and two pages
>> > +     * for the MADT replacement.
>> > +     */
>> > +    nr_pages = compute_dom0_nr_pages(d, NULL, 0) - 3;
>> 
>> How do you know the MADT replacement requires two pages? Isn't
>> that CPU-count dependent? And doesn't the partial page used for
>> the TSS also need accounting for here?
> 
> Yes, it's CPU-count dependent. This is just an approximation, since we only 
> support up to 256 CPUs on HVM guests, and each Processor Local APIC entry is 
> 
> 8 bytes, this means that the CPU-related data is going to use up to 2048 
> bytes of data, which still leaves plenty of space for the IO APIC and the 
> Interrupt Source Override entries. We requests two pages in case the 
> original MADT crosses a page boundary. FWIW, I could also fetch the original 
> MADT size earlier and use that as the upper bound here for memory 
> reservation.

That wouldn't help in case someone wants more vCPU-s than there
are pCPU-s. And baking in another assumption of there being <=
128 vCPU-s when there's already work being done to eliminate that
limit is likely not too good an idea.

> In the RFC series we also spoke about placing the MADT in a different 
> position that the native one, which would mean that we would end up stealing 
> some space from a RAM region in order to place it, so that we wouldn't have 
> to do this accounting.

Putting the new MADT at the same address as the old one won't work
anyway, again because possibly vCPU-s > pCPU-s.

>> > +    if ( cpu_has_vmx && paging_mode_hap(d) && !vmx_unrestricted_guest(v) )
>> > +    {
>> > +        /*
>> > +         * Since Dom0 cannot be migrated, we will only setup the
>> > +         * unrestricted guest helpers if they are needed by the current
>> > +         * hardware we are running on.
>> > +         */
>> > +        rc = hvm_setup_vmx_unrestricted_guest(d);
>> 
>> Calling a function of this name inside an if() checking for
>> !vmx_unrestricted_guest() is, well, odd.
> 
> Yes, that's right. What about calling it hvm_setup_vmx_realmode_helpers?

Sounds quite a bit more reasonable.

>> > +    process_pending_softirqs();
>> 
>> Why, outside of any loop?
> 
> It's the same that's done in construct_dom0_pv, so I though that it was a 
> good idea to drain any pending softirqs before starting domain build.

Perhaps in that case it should be pulled out of there into the
wrapper?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.