[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3] x86/mm: Short circuit damage from "fishy" ref/typecount failure



On 29.01.2021 17:17, Andrew Cooper wrote:
> On 29/01/2021 11:29, Jan Beulich wrote:
>> On 25.01.2021 18:59, Andrew Cooper wrote:
>>> On 20/01/2021 08:06, Jan Beulich wrote:
>>>> Also, as far as "impossible" here goes - the constructs all
>>>> anyway exist only to deal with what we consider impossible.
>>>> The question therefore really is of almost exclusively
>>>> theoretical nature, and hence something like a counter
>>>> possibly overflowing imo needs to be accounted for as
>>>> theoretically possible, albeit impossible with today's
>>>> computers and realistic timing assumptions. If a counter
>>>> overflow occurred, it definitely wouldn't be because of a
>>>> bug in Xen, but because of abnormal behavior elsewhere.
>>>> Hence I remain unconvinced it is appropriate to deal with
>>>> the situation by BUG().
>>> I'm not sure how to be any clearer.
>>>
>>> I am literally not changing the current behaviour.  Xen *will* hit a
>>> BUG() if any of these domain_crash() paths are taken.
>>>
>>> If you do not believe me, then please go and actually check what happens
>>> when simulating a ref-acquisition failure.
>> So I've now also played the same game on the ioreq path (see
>> debugging patch below, and again with some non-"//temp"
>> changes actually improving overall behavior in that "impossible"
>> case). No BUG()s hit, no leaks (thanks to the extra changes),
>> no other anomalies observed.
>>
>> Hence I'm afraid it is now really up to you to point out the
>> specific BUG()s (and additional context as necessary) that you
>> either believe could be hit, or that you have observed being hit.
> 
> The refcounting logic was taken verbatim from ioreq, with the only
> difference being an order greater than 0.  The logic is also identical
> to the vlapic logic.
> 
> And the reason *why* it bugs is obvious - the cleanup logic
> unconditionally put()'s refs it never took to begin with, and hits
> underflow bugchecks.

In current staging, neither vmx_alloc_vlapic_mapping() nor
hvm_alloc_ioreq_mfn() put any refs they couldn't get. Hence
my failed attempt to repro your claimed misbehavior.

> For specifics, a simulated regular ref failure:
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 1051d86a20..314d258e31 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -171,9 +171,14 @@ static int vmtrace_alloc_buffer(struct vcpu *v)
>      v->vmtrace.buf = pg;
>  
>      for ( i = 0; i < d->vmtrace_frames; i++ )
> +    {
> +        if ( i == 0 )
> +            return -ENOMEM;
> +
>          /* Domain can't know about this page yet - something fishy
> going on. */
>          if ( !get_page_and_type(&pg[i], d, PGT_writable_page) )
>              BUG();
> +    }

No bad put here. This should have BUG() replaced just like
the other two instances mentioned above have it.

> and the simulated typeref failure:
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index db845ccc81..bd810157f4 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -172,8 +172,16 @@ static int vmtrace_alloc_buffer(struct vcpu *v)
>  
>      for ( i = 0; i < d->vmtrace_frames; i++ )
>      {
> +        get_page(&pg[i], d);
> +
>          if ( i == 0 )
> +        {
> +            put_page(&pg[i]);

This of course if wrong is the get_page() failed. Assuming it
succeeds I don't see what's wrong here.

>              return -ENOMEM;
> +        }
> +
> +        get_page_type(&pg[i], PGT_writable_page);
> +        continue;
>  
>          /* Domain can't know about this page yet - something fishy
> going on. */
>          if ( !get_page_and_type(&pg[i], d, PGT_writable_page) )
> 
> both yield:
> 
> (XEN) Xen BUG at /local/xen.git/xen/include/xen/mm.h:610
> (XEN) RIP:    e008:[<ffff82d04020423e>]
> common/domain.c#vmtrace_free_buffer+0x57/0xc1
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04020423e>] R
> common/domain.c#vmtrace_free_buffer+0x57/0xc1
> (XEN)    [<ffff82d040205497>] F vcpu_create+0x245/0x32b
> (XEN)    [<ffff82d04023ae5b>] F do_domctl+0xb48/0x1964
> (XEN)    [<ffff82d04030c6b2>] F pv_hypercall+0x2e4/0x53d
> (XEN)    [<ffff82d04039045d>] F lstar_enter+0x12d/0x140

So maybe vmtrace_free_buffer() is broken?

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.