[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v8 08/16] xen/domain: Add vmtrace_size domain creation parameter



On 01.02.2021 15:22, Andrew Cooper wrote:
> On 01/02/2021 13:18, Jan Beulich wrote:
>> On 30.01.2021 03:58, Andrew Cooper wrote:
>>> +static int vmtrace_alloc_buffer(struct vcpu *v)
>>> +{
>>> +    struct domain *d = v->domain;
>>> +    struct page_info *pg;
>>> +    unsigned int i;
>>> +
>>> +    if ( !d->vmtrace_size )
>>> +        return 0;
>>> +
>>> +    pg = alloc_domheap_pages(d, get_order_from_bytes(d->vmtrace_size),
>>> +                             MEMF_no_refcount);
>>> +    if ( !pg )
>>> +        return -ENOMEM;
>>> +
>>> +    /*
>>> +     * Getting the reference counting correct here is hard.
>>> +     *
>>> +     * All pages are now on the domlist.  They, or subranges within, will 
>>> be
>> "domlist" is too imprecise, as there's no list with this name. It's
>> extra_page_list in this case (see also below).
>>
>>> +     * freed when their reference count drops to zero, which may any time
>>> +     * between now and the domain teardown path.
>>> +     */
>>> +
>>> +    for ( i = 0; i < (d->vmtrace_size >> PAGE_SHIFT); i++ )
>>> +        if ( unlikely(!get_page_and_type(&pg[i], d, PGT_writable_page)) )
>>> +            goto refcnt_err;
>>> +
>>> +    /*
>>> +     * We must only let vmtrace_free_buffer() take any action in the 
>>> success
>>> +     * case when we've taken all the refs it intends to drop.
>>> +     */
>>> +    v->vmtrace.pg = pg;
>>> +
>>> +    return 0;
>>> +
>>> + refcnt_err:
>>> +    /*
>>> +     * In the failure case, we must drop all the acquired typerefs thus 
>>> far,
>>> +     * skip vmtrace_free_buffer(), and leave domain_relinquish_resources() 
>>> to
>>> +     * drop the alloc refs on any remaining pages - some pages could 
>>> already
>>> +     * have been freed behind our backs.
>>> +     */
>>> +    while ( i-- )
>>> +        put_page_and_type(&pg[i]);
>>> +
>>> +    return -ENODATA;
>>> +}
>> As said in reply on the other thread, PGC_extra pages don't get
>> freed automatically. I too initially thought they would, but
>> (re-)learned otherwise when trying to repro your claims on that
>> other thread. For all pages you've managed to get the writable
>> ref, freeing is easily done by prefixing the loop body above by
>> put_page_alloc_ref(). For all other pages best you can do (I
>> think; see the debugging patches I had sent on that other
>> thread) is to try get_page() - if it succeeds, calling
>> put_page_alloc_ref() is allowed. Otherwise we can only leak the
>> respective page (unless going to further extents with trying to
>> recover from the "impossible"), or assume the failure here was
>> because it did get freed already.
> 
> Right - I'm going to insist on breaking apart orthogonal issues.
> 
> This refcounting issue isn't introduced by this series - this series
> uses an established pattern, in which we've found a corner case.
> 
> The corner case is theoretical, not practical - it is not possible for a
> malicious PV domain to take 2^43 refs on any of the pages in this
> allocation.  Doing so would require an hours-long SMI, or equivalent,
> and even then all malicious activity would be paused after 1s for the
> time calibration rendezvous which would livelock the system until the
> watchdog kicked in.

Actually an overflow is only one of the possible reasons here.
Another, which may be more "practical", is that another entity
has already managed to free the page (by dropping its alloc-ref,
and of course implying it did guess at the MFN).

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.