
Re: [Xen-devel] PoD code killing domain before it really gets started



On 07/08/12 15:40, Andres Lagar-Cavilla wrote:
On 06.08.12 at 18:03, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
I guess there are two problems with that:
* As you've seen, apparently dom0 may access these pages before any
faults happen.
* If it happens that reclaim_single is below the only zeroed page, the
guest will crash even when there is reclaim-able memory available.

Two ways we could fix this:
1. Remove dom0 accesses (what on earth could be looking at a
not-yet-created VM?)
I'm told it's a monitoring daemon, and yes, they are intending to
adjust it to first query the GFN's type (and don't do the access
when it's not populated, yet). But wait, I didn't check the code
when I recommended this - XEN_DOMCTL_getpageframeinfo{2,3}
also call get_page_from_gfn() with P2M_ALLOC, so would also
trigger the PoD code (in -unstable at least) - Tim, was that really
a correct adjustment in 25355:974ad81bb68b? It looks to be a
1:1 translation, but is that really necessary? If one wanted to
find out whether a page is PoD to avoid getting it populated,
how would that be done from outside the hypervisor? Would
we need XEN_DOMCTL_getpageframeinfo4 for this?

2. Allocate the PoD cache before populating the p2m table
3. Make it so that some accesses fail w/o crashing the guest?  I don't
see how that's really practical.
What's wrong with telling control tools that a certain page is
unpopulated (from which they will be able to infer that it's all
clear from the guest's pov)? Even outside of the current problem,
I would think that's more efficient than allocating the page. Of
course, the control tools need to be able to cope with that. And
it may also be necessary to distinguish between read and
read/write mappings being established (and for r/w ones the
option of populating at access time rather than at creation time
would need to be explored).
I wouldn't be opposed to some form of getpageframeinfo4. It's not just PoD
we are talking about here. Is the page paged out? Is the page shared?

Right now we have global per-domain queries (domaininfo), or individual
gfn debug memctl's. A batched interface with richer information would be a
blessing for debugging or diagnosis purposes.

The first order of business is exposing the type. Do we really want to
expose the whole range of p2m_* types or just "really useful" ones like
is_shared, is_pod, is_paged, is_normal? An argument for the former is that
the mem event interface already pumps the p2m_* type up the stack.

The other useful bit of information I can think of is exposing the shared
ref count.
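
As a rough illustration of the "really useful ones" option, a batched query could return one small record per gfn, carrying a few flag bits plus the shared ref count. Everything below is hypothetical: the PGINFO_* flags, struct gfn_info, and count_plain_ram() are names invented for this sketch and are not part of any existing Xen interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-gfn state flags; invented for illustration only. */
#define PGINFO_NORMAL (1u << 0)  /* ordinary populated RAM */
#define PGINFO_POD    (1u << 1)  /* populate-on-demand, not yet backed */
#define PGINFO_PAGED  (1u << 2)  /* paged out */
#define PGINFO_SHARED (1u << 3)  /* shared with other domains */

/* Hypothetical record a batched query might fill in for each gfn. */
struct gfn_info {
    uint64_t gfn;          /* guest frame number that was queried */
    uint32_t flags;        /* combination of PGINFO_* bits */
    uint32_t shared_refs;  /* shared ref count, 0 if not shared */
};

/* Count the entries in a batch that are plain populated RAM, i.e. the
 * ones a tool could touch without triggering PoD, paging or unsharing. */
static size_t count_plain_ram(const struct gfn_info *batch, size_t n)
{
    size_t plain = 0;

    for (size_t i = 0; i < n; i++)
        if (batch[i].flags == PGINFO_NORMAL)
            plain++;

    return plain;
}
```

A monitoring tool could use such a batch to skip PoD or paged-out entries instead of mapping them. Exposing the full p2m_* type would replace the flags field with the raw type value.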
I think just like the gfn_to_mfn() interface, we need an "I care about the details" interface and an "I don't care about the details" interface. If a page isn't present, or needs to be un-shared, or is PoD and not currently available, then maybe dom0 callers trying to map that page should get something like -EAGAIN? Just something that indicates, "This page isn't here at the moment, but may be here soon." What do you think?
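
To make the -EAGAIN idea concrete, here is a minimal sketch of what an "I don't care about the details" caller might do: retry a bounded number of times, then give up. The map_fn callback type, map_with_retry(), and fake_map() are all hypothetical names made up for this example; fake_map() merely stands in for whatever real mapping call would return -EAGAIN.

```c
#include <errno.h>

/* Hypothetical mapping callback: returns 0 on success, -EAGAIN if the
 * page is not available right now, or another negative errno value. */
typedef int (*map_fn)(unsigned long gfn, void *ctx);

/* Retry while the callback reports -EAGAIN, up to max_attempts times.
 * Returns the last result; -EAGAIN means the page never showed up. */
static int map_with_retry(map_fn try_map, unsigned long gfn, void *ctx,
                          int max_attempts)
{
    int rc = -EAGAIN;

    for (int i = 0; i < max_attempts && rc == -EAGAIN; i++)
        rc = try_map(gfn, ctx);

    return rc;
}

/* Stand-in for a real mapping call: fails with -EAGAIN until the
 * countdown pointed to by ctx reaches zero, then succeeds. */
static int fake_map(unsigned long gfn, void *ctx)
{
    int *remaining = ctx;

    (void)gfn;
    if (*remaining > 0) {
        (*remaining)--;
        return -EAGAIN;
    }
    return 0;
}
```

A real caller would presumably sleep or yield between attempts. A caller that does care about the details would instead query the page's state first and decide for itself.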

 -George
