[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared



On 18/07/16 18:06, Tamas K Lengyel wrote:
>> Incremental improvements are welcome; but they must not cause
>> regressions in existing functionality.
> 
> Existing functionality does not get impaired by this as what happens
> right now is a hypervisor crash. I don't see how things can get any
> worst than that.

Also from another thread:
> If anyone else would have been interested in getting the two systems
> working together othen then me they probably would have complained
> already that hey this crashes the hypervisor. My point being that at
> this point the impact of this patch is likely really low.

From a user perspective, "failing intermittently in some strange and
unpredictable way" is definitely worse than a hypervisor crash. :-)

My concern is that someone will start using guests which use the altp2m
interface internally, and that will all work; and then maybe separately
they will start doing some sort of memory sharing between guests, and
that will all work; and then at some point they'll do memory sharing on
a guest using the altp2m functionality internally, and suddenly they'll
get strange intermittent errors where things don't behave the way they
expect and they don't know why.  A hypervisor crash that tells them
exactly what code has the problem is definitely preferable.

>> The code as it is in the tree right now was intended to allow both
>> sharing and altp2m to be enabled on the same domain, just not over the
>> same gfn range.  And it was intended to be robust -- that is, the
>> sharing code and the altp2m code don't need to be aware of each other
>> and try not to step on each other's toes; each can just do its own thing
>> and Xen will make sure that nothing bad happens (by preventing pages
>> with an altp2m entry from being shared, and unsharing pages for which an
>> altp2m entry is created).
>>
>> It sounds like that's broken right now; it should therefore be fixed.
>> When it is fixed, you'll be able to use both altp2m and sharing on the
>> same domain; Xen will simply prevent sharing from happening on gfn
>> ranges with altp2m entries.
> 
> No, that's incorrect, it's the other way around. If you were to try to
> share pages for which you have altp2m entries it will happily oblige.
> It will just fail to do the altp2m actions for entries of shared
> entries (copying the mapping to the altp2m view, mem_access, etc).

It's quite possible I missed something, but that's not how I read the
code.  Before sharing a page you have to have to call
mem_sharing_nominate_page(), which calls page_make_sharable().
page_make_sharable() will make sure that it has exactly the expected
number of references; which for gfns is 2 and for grant references is 4.

When you map an mfn into an altp2m of a different gfn, you'll increase
the reference count.  So it appears to me that if you attempt to share a
page which is mapped in an altp2m, then the nominate operation will fail
(with -E2BIG, of all things).

Am I mistaken about that?

(BTB this would probably still be the case even after your patch.)

Also, as far as I can tell, "It will just fail to do the altp2m actions
for entries of shared entries" is not true; instead, the page will be
un-shared and the altp2m action will then take place.  Is this not the case?

>> An even bigger improvement would be to allow the same gfns to be subject
>> both to altp2m and sharing at the same time.  But this requires thinking
>> carefully about all the corner cases and making sure that they all work
>> correctly.
> 
> And this is exactly what this patch allows you to do. An entry can now
> be both shared, get properly copied to altp2m views, allow setting
> mem_access in altp2m views, etc. The only situation you have to take
> core of is when the type of the entry changes from shared to unshared
> as that resets the altp2m views.

I described another situation you have to be careful of in an earlier
e-mail:
- host gfn A is marked "shared"
- altp2m gfn O is mapped to gfn A (thus also marked as 'shared')
- Guest writes to gfn O, Xen attempts to unshare the page.

In this circumstance, the fault will ends up in
__mem_sharing_unshare_page(), which will calls rmap_retrieve(d, O, mA).
This returns NULL because gfn O was never put in the reverse map, and
you BUG().

Again, am I misreading what would happen?

I'm pretty sure if I went looking I could find some more situations you
need to avoid to prevent problems.

So the next big missing piece of information in this discussion is
exactly what you do need from this system.  You're using altp2m and
mem_sharing (and mem_access) on the same domain, that's obvious.  Which
features of altp2m are you using -- are you mainly using the mem_access
changes, or are you also using the gfn mapping functionality?

Also, how important is it that pages using altp2m functionality not be
un-shared -- i.e., what proportion of a guest's pages do you expect to
be shared, and what proportion do you need to perform altp2m operations on?

So there are several things we could do here:

1. Allow mem_sharing and altp2m functionality co-existing by avoiding
each other: the same domain can use both, but the same gfn cannot be
both shared and have altp2m entries.  This is essentially fixing the bug
in the current code: Adding an altp2m to a gfn unshares it; and gfns
which have altp2m entries cannot be shared.

This should, I *think*, be fairly straightforward; it might even only
require the lock-reordering part of your patch.

If your use-case only uses altp2m functionality on a small number of
pages, then this might be a not-too-difficult option that would solve
your problem while not introducing any pitfalls for future users to all
into.  If your use case requires large amounts of sharing *and* large
amounts of altp2m entries, this is probably not going to work for you.

2. Allow mem_sharing on altp2m entries with limited functionality.
Specifically: Do not unshare if an altp2m entry only has mem_access
restrictions set; only unshare it if the gfn is remapped.

This would require some very careful thinking about the different corner
cases that might happen, but I don't think it would be quite the morass
that full altp2m support (with gfn remapping) would involve.

If your use-case only uses the mem_access altp2m functionality, this
might be a solution that's a reasonable amount of work.

3. Allow mem_sharing on altp2m entries with full functionality in a
robust manner, making sure all the corner cases are handled correctly.

This would obviously be quite a bit more work.  I don't think it would
actually be wasted, since it should allow you to have full sharing and
full altp2m access without your dom0 code having to worry about all the
corner cases and interactions.  But I can certainly see why you're not
keen on the prospect of doing so.

4. Allow mem_sharing on altp2m entries with full functionality, but with
corner cases the user of those interfaces must avoid or mitigate; and no
restrictions.

This is what your current patch does; and as I've said, I don't think
this is suitable because it lays a trap for future people to fall into
if they end up enabling both.

But after a discussion with IanJ, I've got a couple of variations of
this one.  The variations all involve different ways of making sure that
the "full functionality" is only available when there is a caller that
knows about the risks and will actively try to avoid them.

4a. Add a per-domain flag to decide whether to allow "unsafe" altp2m /
mem_sharing interactions.  If it's clear, then only "safe" operations
are allowed (i.e., it would start with #1, but if someone wanted they
could implement #2 or #3).  If it's set, then things behave as in #4.
(We probably should try to prevent hypervisor crashes due to unsharing
events, however.)

4b. In the altp2m code, as #4; but add code somewhere else to prevent
the entire features from being turned on in a domain unless the "unsafe"
option is requested.  That is, by default a guest with mem_sharing
enabled would not be able to use altp2m at all; and a guest with altp2m
enabled would not be able to use mem_sharing at all.  Only with the
"unsafe" flag set would be allowed to have both altp2m and mem_sharing
enabled at the same time.

Both 4a and 4b seem like they would be relatively straightforward to
implement; like they would fit your use case, but avoid laying a trap
for people in the future.

What do you think about those options?

If #1 is acceptable to you, I continue to think that's probably the
combination of most robust and simplest to achieve.  I would recommend 2
above 4a and 4b if feasible; and if not, I would probably choose 4a over
4b because it allows for incremental improvement.  But any of {1, 2, 4a,
4b} would work for me.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.