[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] xen/arm: stack check instrumentation


  • To: Julien Grall <julien@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
  • Date: Tue, 30 Jul 2024 16:12:07 -0400
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=xen.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=fc33GMnABJISCtJcK0V6+XTh3zwvOnqOkgapOFYlwIo=; b=J02dY8lxHq0CiqDlX9V0wlhNqVpGb6+dyeXbEbZ5E4kZoE4joguzyua67FgtCLkzXWlWMCl7/ulw2SIp2s5XX+ht54X6zqQTYweajJZqjhee0qBEkmumBhm1W87a7D3OFSGYVrEplRhl/7UBk3ieJbi2YrI1S9j9wBsuhng+gzjpiF1IWD5G4qaRR4NFEsc1xU4BL6tozOqb268Bw8L4HFba9pzI/ww3liZ3nWz6TCYOabc7cmmJnJkuxG8QEpay6kixeiaU23c4dMujdS5gtBkA1wDFH4ktiWvsJkhe4S/ejl5eY/wj08G3dtVW8gB6c7ZRUgjNhuEFqbwRyfUlBA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=R97Rpde0bUQylQbPiBJDWnNxXGjp9ul8tvSIG4xqKOyjJRyqT70ixG2/VVCbf0SJa9O/kORveiUDT0rOPQc3xAREZsfELOQfAiLTK71wz9plRpy72qxBv7v2lVjSFYCTQNU0qoM5rsYMvyin3ik3N6Xiy9CJkEJbBfkGMVpMGP/c75GwyKdQCMrA+udZyJe1xuE7b/Ep9xZuv7755mi3OuCcRj6605k7QQtvlQRJjRhBlRL8z0fHheVDmG0pCeW1SqUrtD1BVxA6537xMraqji6FKQg1LTqpMUcxUol/sHwxVEHc6n102bmyiL9ZVn3UDBzG/GcSSAi8ZOYAFygcag==
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, "Volodymyr Babchuk" <Volodymyr_Babchuk@xxxxxxxx>
  • Delivery-date: Tue, 30 Jul 2024 20:12:40 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 7/30/24 14:07, Julien Grall wrote:
> Hi,
> 
> On 30/07/2024 18:50, Stewart Hildebrand wrote:
>> On 7/29/24 14:36, Julien Grall wrote:
>>> Hi Stewart,
>>>
>>> On 29/07/2024 15:24, Stewart Hildebrand wrote:
>>>> +DEFINE_PER_CPU(unsigned int, stack_check_nesting);
>>>> +DEFINE_PER_CPU(unsigned char *, stack_base);
>>>
>>> I think this could be "const unsigned char *" as the stack should not be 
>>> modified directly.
>>
>> Every time there's a vcpu context switch we will have a new stack.
> 
> I am not sure I follow. "const unsigned char *" should still allow you to 
> update stack_base. It will just prevent anyone to try to write to modify the 
> stack via stack_base.

Ah, of course. You're right. I'll change it to "const unsigned char *".

> 
>>
>>>
>>>> +
>>>> +void __attribute__((no_instrument_function)) stack_set(unsigned char 
>>>> *base)
>>>> +{
>>>> +    this_cpu(stack_base) = base;
>>>> +}
>>>> +
>>>> +void __init __attribute__((no_instrument_function)) stack_check_init(void)
>>>> +{
>>>> +    this_cpu(stack_check_nesting) = 0;
>>>> +    stack_set(init_data.stack);
>>>> +}
>>>> +
>>>> +__attribute__((no_instrument_function))
>>>> +void __cyg_profile_func_enter(void *this_fn, void *call_site)
>>>> +{
>>>> +    unsigned char *sp;
>>>> +
>>>> +    if ( get_per_cpu_offset() == INVALID_PER_CPU_OFFSET )
>>>> +        return;
>>>> +
>>>> +    asm volatile ("mov %0, sp" : "=r" (sp) );
>>>> +
>>>> +    if ( sp < this_cpu(stack_base) ||
>>>> +         sp > (this_cpu(stack_base) + STACK_SIZE) )
>>>
>>> The top of the stack is used to store struct cpu_info. So you want to 
>>> substract its size (see arch_vcpu_create()).
>>
>> Will do.
>>
>>>
>>>> +    {
>>>> +        if ( this_cpu(stack_check_nesting) )
>>>> +            return;
>>>> +
>>>> +        this_cpu(stack_check_nesting)++;
>>>
>>> Looking at the use, it only seems to be used as "print/panic once". So 
>>> maybe use a bool to avoid any overflow.
>>
>> It will only ever be incremented once. I'll still change it to a bool,
>> this should make it more obvious.
>>
>>>
>>>> +        printk("CPU %d stack pointer out of bounds (sp %#lx, stack base 
>>>> %#lx)\n",
>>>> +               smp_processor_id(), (uintptr_t)sp,
>>>> +               (uintptr_t)this_cpu(stack_base));
>>>> +        BUG();
>>>
>>> I would consider to call panic().
>>
>> panic() alone doesn't show the stack trace / call trace.
> 
> Ah good point. But TBH, I have never really understood why panic() didn't 
> return a call stack. There are a few places where I found beneficial when 
> debugging.
> 
> Anyway, I guess this could be handled separately.
> 
>>
>>> But is it safe to call any of this if we blew the stack?
>>
>> Nope, it sure isn't!
>>
>>> IOW, should we have a buffer?
>>
>> Yes. After some experimentation, I found that this printk and a WARN
>> (similar to BUG, but resumes execution and allows me to collect these
>> metrics) uses approximately 1632 bytes of stack. Assuming BUG uses a
>> similar amount of stack as WARN, and adding in a comfortable margin for
>> error, I'll add a 4096 byte buffer (i.e. invoke the print/BUG with 4096
>> bytes remaining on the stack).
> 
> AFAICT, the stack on Arm is 32KB. So we 1/8 of the stack as a buffer. Do you 
> know the current stack use in a normal setup (e.g. boot a guest)?

In my particular test case simply booting a dom0, it uses about 14k of
stack. Of course this could vary with booting dom0less domUs, complexity
of device tree parsing, etc.

> 
> Anyway, so long the feature is not enabled in production, then it might be ok 
> to steal 4KB. We could increase the stack if we see any issue.
> 
> Cheers,
> 




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.