[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Ubuntu 16.04.1 LTS kernel 4.4.0-57 over-allocation and xen-access fail



On 10/01/17 15:02, Razvan Cojocaru wrote:
> On 01/10/2017 04:13 PM, Andrew Cooper wrote:
>> On 10/01/17 09:06, Razvan Cojocaru wrote:
>>> On 01/09/2017 02:54 PM, Andrew Cooper wrote:
>>>> On 09/01/17 11:36, Razvan Cojocaru wrote:
>>>>> Hello,
>>>>>
>>>>> We've come across a weird phenomenon: an Ubuntu 16.04.1 LTS HVM guest
>>>>> running kernel 4.4.0 installed via XenCenter in XenServer Dundee seems
>>>>> to eat up all the RAM it can:
>>>>>
>>>>> (XEN) [  394.379760] d1v1 Over-allocation for domain 1: 524545 > 524544
>>>>>
>>>>> This leads to a problem with xen-access, specifically libxc which does
>>>>> this in xc_vm_event_enable() (this is Xen 4.6):
>>>>>
>>>>> ring_page = xc_map_foreign_batch(xch, domain_id, PROT_READ | PROT_WRITE,
>>>>>                                  &mmap_pfn, 1);
>>>>>
>>>>> if ( mmap_pfn & XEN_DOMCTL_PFINFO_XTAB )
>>>>> {
>>>>>     /* Map failed, populate ring page */
>>>>>     rc1 = xc_domain_populate_physmap_exact(xch, domain_id, 1, 0, 0,
>>>>>                                                &ring_pfn);
>>>>>     if ( rc1 != 0 )
>>>>>     {
>>>>>         PERROR("Failed to populate ring pfn\n");
>>>>>         goto out;
>>>>>     }
>>>>>
>>>>> The first time everything works fine, xen-access can map the ring page.
>>>>> But most of the time the second time fails in the
>>>>> xc_domain_populate_physmap_exact() call, and again this is dumped in the
>>>>> Xen log (once for each failed attempt):
>>>>>
>>>>> (XEN) [  395.952188] d0v3 Over-allocation for domain 1: 524545 > 524544
>>>> Thinking further about this, what happens if you avoid removing the page
>>>> on exit?
>>>>
>>>> The first populate succeeds, and if you leave the page populated, the
>>>> second time you come around the loop, it should not be of type XTAB, and
>>>> the map should succeed.
>>> Sorry for the late reply, had to put out another fire yesterday.
>>>
>>> I've taken your recommendation to roughly mean this:
>>>
>>> diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
>>> index ba9690a..805564b 100644
>>> --- a/xen/common/vm_event.c
>>> +++ b/xen/common/vm_event.c
>>> @@ -100,8 +100,11 @@ static int vm_event_enable(
>>>      return 0;
>>>
>>>   err:
>>> +    /*
>>>      destroy_ring_for_helper(&ved->ring_page,
>>>                              ved->ring_pg_struct);
>>> +    */
>>> +    ved->ring_page = NULL;
>>>      vm_event_ring_unlock(ved);
>>>
>>>      return rc;
>>> @@ -229,9 +232,12 @@ static int vm_event_disable(struct domain *d,
>>> struct vm_event_domain *ved)
>>>              }
>>>          }
>>>
>>> +        /*
>>>          destroy_ring_for_helper(&ved->ring_page,
>>>                                  ved->ring_pg_struct);
>>> +       */
>>>
>>> +        ved->ring_page = NULL;
>>>          vm_event_cleanup_domain(d);
>>>
>>>          vm_event_ring_unlock(ved);
>>>
>>> but this unfortunately still fails to map the page the second time. Do
>>> you mean to simply no longer munmap() the ring page from libxc / the
>>> client application?
>> Neither.
>>
>> First of all, I notice that this is probably buggy:
>>
>>     ring_pfn = pfn;
>>     mmap_pfn = pfn;
>>     rc1 = xc_get_pfn_type_batch(xch, domain_id, 1, &mmap_pfn);
>>     if ( rc1 || mmap_pfn & XEN_DOMCTL_PFINFO_XTAB )
>>     {
>>         /* Page not in the physmap, try to populate it */
>>         rc1 = xc_domain_populate_physmap_exact(xch, domain_id, 1, 0, 0,
>>                                               &ring_pfn);
>>         if ( rc1 != 0 )
>>         {
>>             PERROR("Failed to populate ring pfn\n");
>>             goto out;
>>         }
>>     }
>>
>> A failure of xc_get_pfn_type_batch() is not a suggestion that population
>> might work.
>>
>>
>> What I meant was taking out this call:
>>
>>     /* Remove the ring_pfn from the guest's physmap */
>>     rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0,
>> &ring_pfn);
>>     if ( rc1 != 0 )
>>         PERROR("Failed to remove ring page from guest physmap");
>>
>> To leave the frame in the guest physmap.  The issue is fundamentally
>> that after this frame has been taken out, something kicks the VM to
>> realise it has an extra frame of balloonable space, which it clearly
>> compensates for.
>>
>> You can work around the added attack surface by marking it RO in EPT;
>> neither Xen's nor dom0's mappings are translated via EPT, so they can
>> still make updates, but the guest won't be able to write to it.
>>
>> I should say that this is all a gross hack, and is in desperate need of
>> a proper API to make rings entirely outside of the gfn space, but this
>> hack should work for now.
> Thanks! So far, it seems to work like a charm like this:

Great.

>
> diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c
> index 2fef96a..5dd00a6 100644
> --- a/tools/libxc/xc_vm_event.c
> +++ b/tools/libxc/xc_vm_event.c
> @@ -130,9 +130,17 @@ void *xc_vm_event_enable(xc_interface *xch, domid_t
> domain_id, int param,
>      }
>
>      /* Remove the ring_pfn from the guest's physmap */
> +    /*
>      rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0,
> &ring_pfn);
>      if ( rc1 != 0 )
>          PERROR("Failed to remove ring page from guest physmap");
> +    */
> +
> +    if ( xc_set_mem_access(xch, domain_id, XENMEM_access_r, mmap_pfn, 1) )
> +    {
> +        PERROR("Could not set ring page read-only\n");
> +        goto out;
> +    }
>
>   out:
>      saved_errno = errno;
>
> Should I send this as a patch for mainline as well?

Probably a good idea, although I would include a code comment explaining
what is going on, because this is subtle if you don't know the context.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.