[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: ASSERT in rangeset_remove_range



(+ Jan, Andrew, Wei for the common code)

On 08/11/2021 22:45, Stefano Stabellini wrote:
Hi Oleksandr, Julien,

Hi,

I discovered a bug caused by the recent changes to introduce extended
regions in make_hypervisor_node (more logs appended):


(XEN) d1 BANK[0] 0x00000040000000-0x0000007e800000 (1000MB)
(XEN) d1 BANK[1] 0x00000200000000-0x00000200000000 (0MB)
(XEN) DEBUG find_unallocated_memory 994 s=40000000 e=7e7fffff
(XEN) DEBUG find_unallocated_memory 994 s=200000000 e=1ffffffff
(XEN) Assertion 's <= e' failed at rangeset.c:189


When a bank of memory is zero in size, then rangeset_remove_range is
called with end < start, triggering an ASSERT in rangeset_remove_range.

One solution is to avoid creating 0 size banks. The following change
does that:


diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 49b4eb2b13..3efe542d0f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -459,9 +459,12 @@ static void __init allocate_memory(struct domain *d, 
struct kernel_info *kinfo)
          goto fail;
bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);
-    if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
+    if ( bank_size > 0 )
+    {
+        if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
                                 bank_size) )
-        goto fail;
+            goto fail;
+    }

I would move the size check in allocate_bank_memory().

if ( kinfo->unassigned_mem )
          goto fail;



We have a couple of other options too:

- remove the ASSERT in rangeset_remove_range
There is an argument that we should simply return error
fromrangeset_remove_range, rather than a full assert.

To be honest, this is a developper mistake to call with end < start. If we were going to return an error then we would completely hide (even in developper) it because we would fallback to not exposing extended regions.

So I am not sure switch from ASSERT() to a plain check is a good idea. Jan, Andrew, Wei, what do you think?

That said, this option would not be sufficient to fix your problem as extended regions will not work.


- add a if (end > start) check before calling rangeset_remove_range
We could check that end > start before calling rangeset_remove_range at
all the call sites in domain_build.c. There are 5 call sites at the
moment.

I think we only want to add (end > start) where we expect the region size to be 0. AFAICT, the only other potential place where this can happens is ``find_memory_holes()`` (I vaguely recall a discussion in the past where some of the "reg" property would have size == 0).


Any other ideas or suggestions?

My preference goes with your initial sugestion (so long the check is moved to allocate_bank_memory()).

[...]

(XEN) Assertion 's <= e' failed at rangeset.c:189
(XEN) ----[ Xen-4.16-rc  arm64  debug=y  Not tainted ]----
(XEN) Xen call trace:
(XEN)    [<0000000000220e6c>] rangeset_remove_range+0xbc/0x2bc (PC)
(XEN)    [<00000000002cd508>] domain_build.c#make_hypervisor_node+0x258/0x7f4 
(LR)
(XEN)    [<00000000002cf2a8>] domain_build.c#construct_domU+0x9cc/0xa8c

Vanilla staging doesn't call make_hypervisor_node() from construct_domU. So what are you using?

(XEN)    [<00000000002d0440>] create_domUs+0xe8/0x224
(XEN)    [<00000000002d4988>] start_xen+0xafc/0xbf0
(XEN)    [<00000000002001a0>] arm64/head.o#primary_switched+0xc/0x1c
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 's <= e' failed at rangeset.c:189
(XEN) ****************************************


Cheers,

--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.