
Re: [PATCH 05/12] xen: introduce reserve_heap_pages


On 12/05/2020 02:10, Stefano Stabellini wrote:
On Thu, 30 Apr 2020, Julien Grall wrote:
On 30/04/2020 18:00, Stefano Stabellini wrote:
On Thu, 30 Apr 2020, Julien Grall wrote:
+    pg = maddr_to_page(start);
+    node = phys_to_nid(start);
+    zone = page_to_zone(pg);
+    page_list_del(pg, &heap(node, zone, order));
+    __alloc_heap_pages(pg, order, memflags, d);

I agree with Julien in not seeing how this can be safe / correct.
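The hunk unlinks `pg` from `heap(node, zone, order)` unconditionally. That is only consistent if `pg` is currently the head of a free chunk of exactly `order` pages on that list. The following is a toy model, not Xen's page_alloc.c. The `free_chunk` type and `is_free_chunk_head()` are hypothetical names. It sketches the check the hunk would at least need:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy free-list entry: a free buddy of 2^order pages starting at head_mfn.
 * Hypothetical model, not Xen's struct page_info / page_list. */
struct free_chunk {
    uint64_t head_mfn;
    unsigned int order;
};

/*
 * Return true only if [start_mfn, start_mfn + 2^order) is exactly one free
 * chunk on the free list, i.e. the only case where unlinking start_mfn from
 * the order-`order` list leaves the free lists consistent.
 */
static bool is_free_chunk_head(const struct free_chunk *free_list, size_t n,
                               uint64_t start_mfn, unsigned int order)
{
    assert(order < 64);
    /* Buddy chunks of order k are always aligned to 2^k pages. */
    if ( start_mfn & ((UINT64_C(1) << order) - 1) )
        return false;
    for ( size_t i = 0; i < n; i++ )
        if ( free_list[i].head_mfn == start_mfn && free_list[i].order == order )
            return true;
    return false;
}
```

In this model, a requested range that sits inside a larger free buddy would have to be split first. The hunk does not do that.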

I haven't seen any issues so far in my testing -- I imagine it is
because there aren't many memory allocations after setup_mm() and before
create_domUs()  (which on ARM is called just before
domain_unpause_by_systemcontroller at the end of start_xen.)

I am not sure why you exclude setup_mm(). Any memory allocated (boot
allocator, xenheap) can clash with your regions. The main memory allocations
are for the frametable and dom0. I would say you were lucky not to hit

Maybe it is because Xen typically allocates memory top-down? So if I
chose a high range then I would see a failure? But I have been mostly
testing with ranges close to the beginning of RAM (as opposed to
ranges close to the end of RAM.)

I haven't looked at the details of the implementation, but you can try to
specify dom0 addresses for your domU. You should see a failure.
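An explicit overlap check between the requested domU ranges and memory already in use (e.g. dom0's) would turn this case into a clear error rather than relying on the allocator. The following is a minimal sketch. It assumes each range is known as a half-open physical interval and that `start + size` does not overflow. `phys_range` and `ranges_overlap()` are hypothetical, not existing Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open physical address range [start, start + size). Hypothetical type. */
struct phys_range {
    uint64_t start;
    uint64_t size;
};

/* True if the two ranges share at least one byte (assumes no overflow). */
static bool ranges_overlap(struct phys_range a, struct phys_range b)
{
    assert(a.size && b.size);
    return a.start < b.start + b.size && b.start < a.start + a.size;
}
```

Adjacent ranges (one ending exactly where the other starts) are correctly reported as non-overlapping.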

I managed to reproduce a failure by choosing the top address range. On
Xilinx ZynqMP the memory is:

   reg = <0x0 0x0 0x0 0x7ff00000 0x8 0x0 0x0 0x80000000>;

And I chose:

   fdt set /chosen/domU0 direct-map <0x0 0x10000000 0x10000000 0x8 0x70000000 

Resulting in:

(XEN) *** LOADING DOMU cpus=1 memory=80000KB ***
(XEN) Loading d1 kernel from boot module @ 0000000007200000
(XEN) Loading ramdisk from boot module @ 0000000008200000
(XEN) direct_map start=0x00000010000000 size=0x00000010000000
(XEN) direct_map start=0x00000870000000 size=0x00000010000000
(XEN) Data Abort Trap. Syndrome=0x5
(XEN) Walking Hypervisor VA 0x2403480018 on CPU0 via TTBR 0x0000000000f05000
(XEN) 0TH[0x0] = 0x0000000000f08f7f
(XEN) 1ST[0x90] = 0x0000000000000000
(XEN) CPU0: Unexpected Trap: Data Abort


(XEN) Xen call trace:
(XEN)    [<000000000021a65c>] page_alloc.c#alloc_pages_from_buddy+0x15c/0x5d0 
(XEN)    [<000000000021b43c>] reserve_domheap_pages+0xc4/0x148 (LR)
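Decoding the two properties shows why this is the "top of memory" case. The memory node is assumed to use #address-cells = 2 and #size-cells = 2, which matches the 8-cell reg above. Under that assumption, the second bank is [0x800000000, 0x880000000). The second direct-map range in the log, 0x870000000 + 0x10000000, ends exactly at 0x880000000. The helper names are hypothetical. The cells are taken as host-order values as written in the DTS source; a real FDT blob is big-endian and would need byte swapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine `n` 32-bit cells (most significant first) into one 64-bit value. */
static uint64_t fdt_cells_to_u64(const uint32_t *cells, unsigned int n)
{
    uint64_t v = 0;
    assert(n >= 1 && n <= 2);
    for ( unsigned int i = 0; i < n; i++ )
        v = (v << 32) | cells[i];
    return v;
}

/*
 * Decode a `reg`-style property into (start, size) pairs.
 * Returns the number of banks written to start[] / size[].
 */
static size_t decode_reg(const uint32_t *cells, size_t ncells,
                         unsigned int addr_cells, unsigned int size_cells,
                         uint64_t *start, uint64_t *size, size_t max_banks)
{
    size_t stride = addr_cells + size_cells, nbanks = 0;
    for ( size_t i = 0; i + stride <= ncells && nbanks < max_banks; i += stride )
    {
        start[nbanks] = fdt_cells_to_u64(&cells[i], addr_cells);
        size[nbanks] = fdt_cells_to_u64(&cells[i + addr_cells], size_cells);
        nbanks++;
    }
    return nbanks;
}
```

This is consistent with the guess that Xen's own allocations come from the top of RAM, although the decode alone does not prove it.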

This isn't what I was expecting. If there is any failure, I would expect an error message, not a data abort. However...

Anything other than the very top of memory works.

... I am very confused by this. Are you suggesting that with your series you can allocate the same range for Dom0 and a DomU without any trouble?

- in construct_domU, add the range to xenheap and reserve it with

I am afraid you can't give the regions to the allocator and then allocate
them. The allocator is free to use any page for its own purpose or exclude

AFAICT, the allocator doesn't have a list of pages in use. It only keeps track
of free pages. So we can make the content of struct page_info look like it
was allocated by the allocator.

We would need to be careful when giving a page back to the allocator, as the
page would need to be initialized (see [1]). This may not be a concern for you,
as the domain may never be destroyed, but it will be from a correctness PoV.
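The idea of making pages "look allocated" and of initialising them on release can be sketched in a toy model. Everything here is hypothetical and not Xen's real page_info layout: `toy_page_info`, the `TOY_PG_*` bits and both helpers. Pages that bypassed the heap are tagged with an owner and a "needs init" bit. On release, that bit tells the caller the page must go through heap initialisation (in Xen terms, something along the lines of init_heap_pages()) instead of being merged straight into the free lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for Xen's struct page_info. */
struct toy_page_info {
    unsigned long flags;   /* TOY_PG_* bits below */
    const void *owner;     /* owning domain, NULL when free */
};

#define TOY_PG_INUSE     (1UL << 0) /* page looks allocated to its owner */
#define TOY_PG_NEED_INIT (1UL << 1) /* never went through heap initialisation */

/* Make pages that bypassed the allocator look as if it had handed them out. */
static void toy_claim_pages(struct toy_page_info *pg, size_t nr, const void *d)
{
    for ( size_t i = 0; i < nr; i++ )
    {
        assert(!(pg[i].flags & TOY_PG_INUSE)); /* must not already be in use */
        pg[i].flags = TOY_PG_INUSE | TOY_PG_NEED_INIT;
        pg[i].owner = d;
    }
}

/*
 * Release pages. Returns how many still carried TOY_PG_NEED_INIT, i.e. how
 * many must be initialised before the heap may treat them as free.
 */
static size_t toy_release_pages(struct toy_page_info *pg, size_t nr)
{
    size_t need_init = 0;
    for ( size_t i = 0; i < nr; i++ )
    {
        if ( pg[i].flags & TOY_PG_NEED_INIT )
            need_init++;
        pg[i].flags = 0;
        pg[i].owner = NULL;
    }
    return need_init;
}
```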

For LiveUpdate, the original Xen will carve out space to be used by the boot
allocator in the new Xen. But I think this is not necessary in your

It should be sufficient to exclude the page from the boot allocators (as
we do for other modules).
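Excluding the region from the boot allocator amounts to punching a hole in the list of free boot regions before anything allocates from them. Xen has its own code for this. The following is only a standalone model. It assumes free regions are kept as half-open MFN intervals. `boot_region` and `toy_boot_region_zap()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Free boot-allocator region [s, e) in MFNs. Hypothetical model. */
struct boot_region {
    uint64_t s, e;
};

/*
 * Remove [s, e) from every region in r[0..*n). A region strictly containing
 * the hole is split in two, appending the upper part to r[].
 * Returns 0 on success, -1 if a split would exceed `cap` entries.
 */
static int toy_boot_region_zap(struct boot_region *r, size_t *n, size_t cap,
                               uint64_t s, uint64_t e)
{
    assert(s < e);
    for ( size_t i = 0; i < *n; i++ )
    {
        if ( e <= r[i].s || s >= r[i].e )
            continue;                          /* no overlap */
        if ( s > r[i].s && e < r[i].e )        /* hole strictly inside: split */
        {
            if ( *n == cap )
                return -1;
            r[*n].s = e;
            r[*n].e = r[i].e;
            (*n)++;
            r[i].e = s;
        }
        else if ( s <= r[i].s && e >= r[i].e ) /* hole covers the region */
            r[i].e = r[i].s;                   /* leave it empty */
        else if ( s <= r[i].s )                /* hole covers the head */
            r[i].s = e;
        else                                   /* hole covers the tail */
            r[i].e = s;
    }
    return 0;
}
```

The example MFNs in the test correspond to the ZynqMP banks and direct-map ranges above, assuming 4K pages.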

One potential issue that can arise is that there is no easy way today to
differentiate between pages allocated and pages not yet initialized. To make
the code robust, we need to prevent a page from being used in two places. So
for LiveUpdate we are marking them with a special value, which is used
afterwards to check we are effectively using a reserved page.
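The "special value" idea can be sketched as a per-page state that must still read "reserved" at the moment a range is claimed. This is not the actual LiveUpdate implementation. The state enum, the marker constant 0x5e5e and `toy_claim_reserved()` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state; a real implementation would encode this in page_info. */
enum toy_page_state {
    TOY_UNINIT = 0,          /* not yet handed to any allocator */
    TOY_RESERVED = 0x5e5e,   /* hypothetical "reserved, not yet claimed" marker */
    TOY_CLAIMED,             /* reserved page now in use by its owner */
    TOY_HEAP,                /* owned by the heap allocator */
};

/*
 * Claim [first, first + nr) only if every page still carries the reserved
 * marker. Nothing is modified on failure, so a page initialised by or given
 * to the heap, or already claimed, can never end up used in two places.
 */
static bool toy_claim_reserved(enum toy_page_state *st, size_t first, size_t nr)
{
    for ( size_t i = first; i < first + nr; i++ )
        if ( st[i] != TOY_RESERVED )
            return false;
    for ( size_t i = first; i < first + nr; i++ )
        st[i] = TOY_CLAIMED;
    return true;
}
```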

I hope this helps.

Thanks for writing all of this down but I haven't understood some of it.

For the sake of this discussion let's say that we managed to "reserve"
the range early enough like we do for other modules, as you wrote.

At the point where we want to call reserve_heap_pages() we would call
init_heap_pages() just before it. We are still relatively early at boot
so there aren't any concurrent memory operations. Why doesn't this work?

Because init_heap_pages() may exclude some pages (for instance MFN 0 is carved
out) or use pages for its internal structure (see init_node_heap()). So you
can't expect to be able to allocate the exact same region by
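The failure mode can be shown with a toy model. The init step drops MFN 0, the example given above. As a result, a later request for exactly the same range cannot be satisfied, however early it runs. `toy_heap`, `toy_init_heap_pages()` and `toy_alloc_exact()` are hypothetical and far simpler than Xen's buddy allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 16

/* Toy heap: free[mfn] is true if the allocator considers the page free. */
struct toy_heap {
    bool free[TOY_NR_PAGES];
};

/* Hand [first, first + nr) to the heap, carving out MFN 0. */
static void toy_init_heap_pages(struct toy_heap *h, size_t first, size_t nr)
{
    assert(first + nr <= TOY_NR_PAGES);
    for ( size_t mfn = first; mfn < first + nr; mfn++ )
        h->free[mfn] = (mfn != 0);
}

/* Allocate exactly [first, first + nr), or nothing at all. */
static bool toy_alloc_exact(struct toy_heap *h, size_t first, size_t nr)
{
    assert(first + nr <= TOY_NR_PAGES);
    for ( size_t mfn = first; mfn < first + nr; mfn++ )
        if ( !h->free[mfn] )
            return false;
    for ( size_t mfn = first; mfn < first + nr; mfn++ )
        h->free[mfn] = false;
    return true;
}
```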

But it can't possibly use any of the pages it is trying to add to the
heap, right?
Yes, it can; there are already multiple examples in the buddy allocator.

We have reserved a certain range; we tell init_heap_pages() to add the
range to the heap; init_node_heap() gets called and ends up calling
xmalloc(). There is no way xmalloc() can use any memory from that
particular range because it is not in the heap yet. That should be safe.

If you look carefully at the code, you will notice:

    else if ( *use_tail && nr >= needed &&
              arch_mfn_in_directmap(mfn + nr) &&
              (!xenheap_bits ||
               !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
    {
        /* Use the last `needed` pages of the range for this node's metadata. */
        _heap[node] = mfn_to_virt(mfn + nr - needed);
        avail[node] = mfn_to_virt(mfn + nr - 1) +
                      PAGE_SIZE - sizeof(**avail) * NR_ZONES;
    }

This is one of the conditions where the allocator will use a few pages from the region for itself.
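Plugging numbers into the `*use_tail` branch shows which pages it takes. When it fires, the node's metadata occupies [mfn + nr - needed, mfn + nr), the tail of the very range the caller then expects to allocate. The sketch below reproduces only that arithmetic. It assumes 4K pages (PAGE_SHIFT = 12), treats arch_mfn_in_directmap() as always true, and passes `needed` in as a parameter. `toy_tail_carve()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4K pages */

/*
 * Mirror of the use_tail condition above, with arch_mfn_in_directmap()
 * assumed true. On success, stores the first MFN the node metadata uses.
 */
static bool toy_tail_carve(uint64_t mfn, uint64_t nr, uint64_t needed,
                           unsigned int xenheap_bits, bool use_tail,
                           uint64_t *first_used)
{
    if ( !(use_tail && nr >= needed &&
           (!xenheap_bits ||
            !((mfn + nr - 1) >> (xenheap_bits - TOY_PAGE_SHIFT)))) )
        return false;
    *first_used = mfn + nr - needed;  /* corresponds to _heap[node] */
    return true;
}
```

For the 0x10000000-0x20000000 region used earlier (MFNs 0x10000-0x1ffff), a single metadata page would be MFN 0x1ffff, the last page of the reserved range.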

The init_node_heap code is a bit hard to follow but I went through it
and couldn't spot anything that could cause any issues (MFN 0 aside
which is a bit special). Am I missing something?
Aside from what I wrote above, as soon as you give a page to an allocator, you waive the right to decide what the page is used for. The allocator is free to use the page for bookkeeping or even carve out the page because it can't deal with it.

So I don't really see how giving a region to the allocator and then expecting to get the same region back on the very next call is ever going to be safe.


Julien Grall


