
Re: [Xen-devel] CONFIG_SCRUB_DEBUG=y + arm64 + livepatch = Xen BUG at page_alloc.c:738



On 09/13/2017 11:32 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 12, 2017 at 09:19:23PM -0400, Boris Ostrovsky wrote:
>>
>> On 09/12/2017 08:01 PM, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Sep 11, 2017 at 08:45:02PM -0400, Boris Ostrovsky wrote:
>>>>
>>>> On 09/11/2017 07:55 PM, Konrad Rzeszutek Wilk wrote:
>>>>> Hey,
>>>>>
>>>>> I've only been able to reproduce this on ARM64 (trying right now ARM32
>>>>> as well), and not on x86.
>>>>>
>>>>> If I compile Xen without CONFIG_SCRUB_DEBUG it works great. But if I
>>>>> enable it and try to load a livepatch it blows up in page_alloc.c:738
>>>>>
>>>>> This is with origin/staging (d0291f3391)
>>>> Can you still reproduce this if you revert 307c3be?
>>> Sadly yes - it still crashes. I didn't capture the serial output.
>>>
>>> I honestly think the issue is that on ARM64 the "sleep" loop does not
>>> wake up as often as on x86 (CC-ing Dariof who I believe observed this
>>> with Credit2 and the wakeup.. something) - maybe he remembers the
>>> details. Anyhow my theory is that the pages are not scrubbed at all
>>> when they go in the idle loop as once it goes to sleep - it stays there.
>>
>> There are no (well, should not be any) timing dependencies in how/whether
>> pages are scrubbed. If a page doesn't get scrubbed because someone didn't
>> wake up then it should be scrubbed in alloc_heap_pages(). So in this case
>> the page is thought to be clean (_PGC_need_scrub is not set), but it is not.
>>
>> Have you tried running a guest (or two), rebooting in a loop?
> No. I just cold-booted it and tried to livepatch.
>> Another thing to try is to set need_scrub to true in free_heap_pages().
> Magic!
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index dbad1e1ca0..9303eb4517 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1308,6 +1308,7 @@ static void free_heap_pages(
>      ASSERT(node >= 0);
>  
>      spin_lock(&heap_lock);
> +    need_scrub = true;
>  
>      for ( i = 0; i < (1 << order); i++ )
>      {
>
> Fixes it ! :-)


Well, that's not a fix. It does, however, rule out the case where something
in ARM-specific code (which I haven't tested) accidentally clears
_PGC_need_scrub.
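
To make the invariant under discussion concrete, here is a minimal
standalone sketch in C. It is not the Xen code: the model_* names, the tiny
page size and the pattern byte are all assumptions made up for illustration.
It only models the rule that a page freed with need_scrub set is scrubbed
lazily at allocation time, while a page without the flag is trusted to hold
the scrub pattern already, which is what a debug build then verifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE  64    /* tiny page, for illustration only */
#define MODEL_SCRUB_BYTE 0xc2  /* pattern byte assumed for this sketch */

/* Hypothetical stand-in for a page and its per-page flag. */
struct model_page {
    bool need_scrub;                 /* plays the role of _PGC_need_scrub */
    uint8_t data[MODEL_PAGE_SIZE];
};

/* Overwrite the page with the scrub pattern and mark it clean. */
static void model_scrub(struct model_page *pg)
{
    memset(pg->data, MODEL_SCRUB_BYTE, sizeof(pg->data));
    pg->need_scrub = false;
}

/* Free path: the caller states whether the page may hold stale data. */
static void model_free(struct model_page *pg, bool need_scrub)
{
    pg->need_scrub = need_scrub;
}

/* Debug check: a page considered clean must contain only the pattern. */
static bool model_page_is_clean(const struct model_page *pg)
{
    for (size_t i = 0; i < sizeof(pg->data); i++)
        if (pg->data[i] != MODEL_SCRUB_BYTE)
            return false;
    return true;
}

/*
 * Alloc path: scrub lazily if the flag is set, then run the debug check.
 * Returns false in the situation where a debug build would hit the BUG.
 */
static bool model_alloc(struct model_page *pg)
{
    assert(pg != NULL);
    if (pg->need_scrub)
        model_scrub(pg);
    return model_page_is_clean(pg);
}
```

In this model, a dirty page freed with need_scrub == true always passes the
check in model_alloc(), whereas a dirty page freed with need_scrub == false
fails it. Forcing the flag on every free therefore hides a page that was
wrongly believed to be clean rather than explaining how it got dirty.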

OK, I think I know what the problem is. You are using
CONFIG_SEPARATE_XENHEAP, aren't you?


-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

