[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Domain creation errors



On 28/06/16 18:56, Andrew Cooper wrote:
> <snip>
>
> This is the first in a number of changes trying to clean up error reporting of
> memory conditions.

One area which is constantly causing problems is creation of domains in
low memory conditions.  In the case where the toolstack gets its
calculations wrong, it is very difficult to diagnose what precicely went
wrong, for a number of reasons.

The error handling in xc_domain_populate_physmap_exact() looks like

    if ( err >= 0 )
    {
        DPRINTF("Failed allocation for dom %d: %ld extents of order %d\n",
                domid, nr_extents, extent_order);
        errno = EBUSY;
        err = -1;
    }

which hides any error whatsoever behind an -EBUSY.

The reason for this is that the hypercall interface for
XENMEM_populate_physmap identifies which extent failed, but not why it
failed.  (Actually, there are plenty of errors which are accounted
against "the next extent", rather than being directly caused by that
extent).


The error conditions I want to be able to distinguish are "Xen is
completely out of memory", and "You are trying to add memory to a domain
beyond its allotted limit".  The former indicates that a toolstacks
logic to account overall memory is problematic, while the latter
indicates a mismatch between different toolstack areas.  Sadly, both end
up as -EBUSY.

The first problem to solve is that alloc_domheap_pages() can't
distinguish the errors.  Two returns from it are legitimately an
indication of ENOMEM, but the assign_pages() failure is specifically
not.  Returning a struct page_info pointer is unhelpful at
distinguishing the issues.

I have thought of two approaches, but its hard to decide between them.

First is to change alloc_domheap_pages() prototype to

int alloc_domheap_pages(struct domain *d, unsigned int order, unsigned
int memflags, struct page_info *out);

while the second is to use PTR_ERR().

Using PTR_ERR() is less disruptive to the code, but will cause
collateral damage for anyone with out-of-tree patches, as the code will
compile but the error logic will be wrong.  The use of PTR_ERR() is also
quite dangerous in the context of a PV guest, as the resulting pointer
is under 64bit guest ABI control.

I am leaning towards the first option, which at least has the advantage
that any out-of-tree code will break in an obvious way.

Any opinions or alternate suggestions?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.