
Re: [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot



On Mon, Feb 3, 2020 at 4:37 PM David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
>
> On Mon, 2020-02-03 at 14:00 +0000, Julien Grall wrote:
> > Hi David,
> >
> > On 01/02/2020 00:33, David Woodhouse wrote:
> > > From: David Woodhouse <dwmw@xxxxxxxxxxxx>
> >
> > I am a bit concerned with this change, particularly the consequence this
> > has for the page-tables. There is an assumption that intermediate
> > page-tables allocated via the boot allocator will never be freed.
> >
> > On x86, a call to vunmap() will not free page-tables, but a subsequent
> > call to vmap() may free them depending on the mapping size. So we would
> > call free_domheap_pages() rather than init_heap_pages().
> >
> > I am not entirely sure what the full consequence is, but I think this
> > calls for investigation and for a summary to be written down in the
> > commit message.
>
> This isn't just about page tables, right? It's about *any* allocation
> given out by the boot allocator, being freed with free_heap_pages() ?
>
> Given the amount of code that has conditionals in both alloc and free
> paths along the lines of…
>
>   if (system_state > SYS_STATE_boot)
>       use xenheap
>   else
>       use boot allocator
>
> … I'm not sure I'd really trust the assumption that such a thing never
> happens; that no pages are ever allocated from the boot allocator and
> then freed into the heap.
>
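
Concretely, one of those call sites looks something like the sketch below.
The wrapper name here is made up purely for illustration, and the exact
boot-allocator call differs between callers and Xen versions:

    /* Illustrative sketch only: the wrapper name is invented, and the
     * exact boot-allocator call varies from one call site to another. */
    static void *alloc_page_any_time(void)
    {
        if ( system_state > SYS_STATE_boot )
            return alloc_xenheap_pages(0, 0);                  /* heap allocator */
        else
            return mfn_to_virt(mfn_x(alloc_boot_pages(1, 1))); /* boot allocator */
    }
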
> In fact it does work fine except for some esoteric corner cases,
> because init_heap_pages() is mostly just a trivial loop over
> free_heap_pages().
>
> The corner cases are if you call free_heap_pages() on boot-allocated
> memory which matches one or more of the following criteria:
>
>  • Includes MFN #0,
>
>  • Includes the first page the heap has seen on a given node, so
>    init_node_heap() has to be called, or
>
>  • Is a high-order allocation that crosses from one node to another.
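
For reference, once you take those special cases out, init_heap_pages()
really is just the trivial loop over free_heap_pages() described above --
paraphrased from xen/common/page_alloc.c, not the exact code:

    /* Paraphrase of init_heap_pages() with the MFN-0 and per-node
     * initialisation special cases stripped out. */
    static void init_heap_pages(struct page_info *pg, unsigned long nr_pages)
    {
        unsigned long i;

        for ( i = 0; i < nr_pages; i++ )
            free_heap_pages(pg + i, 0, false);   /* order 0; scrub flag simplified */
    }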

I was asked to forward a message relating to MFN 0 and allocations
crossing zones from a private discussion on the security list:

8<---

> I am having difficulty seeing how invalidating MFN0 would solve the issue
> here. The zone number for a specific page is calculated from the most
> significant bit position set in its MFN. As a result, each successive zone
> contains an order of magnitude more pages. You would need to invalidate
> the first or last MFN in each zone.

Because (unless Jan and I are reading the code wrong):

* Chunks can only be merged such that they end up on order-boundaries.
* Chunks can only be merged if they are the same order.
* Zone boundaries are on order boundaries.

So say you're freeing mfn 0x100, and mfn 0xff is free.  In that loop,
(1 << order) & mfn will always be 0, so it will only ever look "forward"
for things to merge, not backwards.

Suppose on the other hand, that you're freeing mfn 0x101, and 0x98
through 0x100 are free.  The loop will look "backwards" and merge with
0x100; but then it will look "forwards" again.

Now suppose you've merged 0x100-0x1ff, and the order moves up to size
0x100.  Now the mask (1 << order) becomes 0x100, and 0x100 & 0x100 is
non-zero; so it can't merge forwards with 0x200-0x2ff (which would cross
zones); instead it looks backwards to 0x0-0xff.

We don't think it's possible for chunks to be merged across zones unless
one can (say) start at 0xff and merge all the way back to 0x0; and that
can't happen if 0x0 is never on the free list.

That's the idea anyway.  That would explain why we've never seen it on
x86 -- due to the way the architecture is, mfn 0 is never on the free list.

--->8
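
For anyone who wants to check that reasoning against the source, the merge
loop in free_heap_pages() (xen/common/page_alloc.c) is roughly the
following.  It is heavily trimmed: buddy_is_free() is just a stand-in for
the real checks on mfn_valid(), page state, chunk order and node, and the
removal of the merged buddy from its free list is omitted:

    /* Trimmed sketch of the merge loop in free_heap_pages().
     * buddy_is_free() is a stand-in for the real validity checks. */
    while ( order < MAX_ORDER )
    {
        unsigned long mask = 1UL << order;

        if ( page_to_mfn(pg) & mask )
        {
            /* Our buddy is the predecessor block: try to merge backwards. */
            if ( !buddy_is_free(pg - mask, order) )
                break;
            pg -= mask;
        }
        else
        {
            /* Our buddy is the successor block: try to merge forwards. */
            if ( !buddy_is_free(pg + mask, order) )
                break;
        }

        order++;
    }

Since page_to_zone() is essentially flsl() of the MFN, zone boundaries sit
exactly on those power-of-two boundaries, which is why the only chunk that
could ever straddle one is a chunk reaching all the way down to mfn 0.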

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

