
Re: [Xen-devel] [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server





On 4/19/2016 12:37 PM, Tian, Kevin wrote:
From: Yu, Zhang [mailto:yu.c.zhang@xxxxxxxxxxxxxxx]
Sent: Thursday, April 14, 2016 6:45 PM

On 4/11/2016 7:15 PM, Yu, Zhang wrote:


On 4/8/2016 7:01 PM, George Dunlap wrote:
On 08/04/16 11:10, Yu, Zhang wrote:
[snip]
BTW, I noticed your reply has not been CCed to the mailing list, and I
also wonder if we should raise this last question with the community?

Oops -- that was a mistake on my part.  :-)  I appreciate the
discretion; just so you know in the future, if I'm purposely changing
the CC list (removing xen-devel and/or adding extra people), I'll almost
always say so at the top of the mail.

And then of course there's the p2m_ioreq_server -> p2m_ram_logdirty
transition -- I assume that live migration is incompatible with this
functionality?  Is there anything that prevents a live migration from
being started when there are outstanding p2m_ioreq_server entries?


Another good question, and the answer is unfortunately yes. :-)

If live migration happens during the normal emulation process, entries
marked with p2m_ioreq_server will be changed to p2m_ram_logdirty in
resolve_misconfig(), and later write operations will change them to
p2m_ram_rw. After that, accesses to these pages can no longer be
forwarded to the device model. From this point of view, this
functionality is incompatible with live migration.
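
The transition described above can be sketched as a toy state machine. This is only an illustration under stated assumptions, not Xen code: the `toy_` types and helpers are hypothetical, and `toy_recalc_logdirty()` merely stands in for the recalculation that the thread attributes to resolve_misconfig() while log-dirty mode is on.

```c
/* Toy p2m types; names follow the thread, values are arbitrary. */
typedef enum {
    TOY_P2M_RAM_RW,
    TOY_P2M_RAM_LOGDIRTY,
    TOY_P2M_IOREQ_SERVER,
    TOY_P2M_OTHER            /* any type that recalculation leaves alone */
} toy_p2m_type;

/* Hypothetical stand-in for the recalculation during live migration:
 * ordinary RAM and ioreq-server entries both become log-dirty. */
toy_p2m_type toy_recalc_logdirty(toy_p2m_type t)
{
    switch (t) {
    case TOY_P2M_RAM_RW:
    case TOY_P2M_IOREQ_SERVER:
        return TOY_P2M_RAM_LOGDIRTY;
    default:
        return t;
    }
}

/* A guest write to a log-dirty page turns the entry back into plain RAM. */
toy_p2m_type toy_write_fault(toy_p2m_type t)
{
    return t == TOY_P2M_RAM_LOGDIRTY ? TOY_P2M_RAM_RW : t;
}

/* Only entries still typed as ioreq-server reach the device model. */
int toy_forwarded_to_device_model(toy_p2m_type t)
{
    return t == TOY_P2M_IOREQ_SERVER;
}
```

Applying `toy_recalc_logdirty()` and then `toy_write_fault()` to an ioreq-server entry leaves it as plain RAM, so the device model silently stops seeing writes to that page.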

But for XenGT, I think this is acceptable, because, if live migration
is to be supported in the future, intervention from the backend device
model will be necessary. At that time, we can guarantee from the device
model side that there are no outdated p2m_ioreq_server entries, hence no
need to reset the p2m type back to p2m_ram_rw (and we would not include
p2m_ioreq_server in P2M_CHANGEABLE_TYPES). By "outdated", I mean that
when an ioreq server is detached from p2m_ioreq_server, or before an
ioreq server is attached to this type, entries marked with
p2m_ioreq_server should be regarded as outdated.

Is this acceptable to you? Any suggestions?

So the question is, as of this series, what happens if someone tries to
initiate a live migration while there are outstanding p2m_ioreq_server
entries?

If the answer is "the ioreq server suddenly loses all control of the
memory", that's something that needs to be changed.


Sorry, for this patch series, I'm afraid the above description is the
answer.

Besides, I find it's hard to change current code to both support the
deferred resetting of p2m_ioreq_server and the live migration at the
same time. One reason is that a page with p2m_ioreq_server behaves
differently in different situations.

My assumption for XenGT is that, for live migration to work, the device
model should guarantee there are no outstanding p2m_ioreq_server pages
in the hypervisor (no need to use the deferred recalculation), and it is
our device model that should be responsible for copying the
write-protected guest pages later.

And another solution I can think of: when unmapping the ioreq server,
we walk the p2m table and reset entries with p2m_ioreq_server back
directly, instead of deferring the reset. Of course, this has a
performance impact. But since mapping and unmapping an ioreq server
is not a frequent operation, the performance penalty may be acceptable.
What do you think about this approach?
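
A minimal sketch of the eager reset proposed above, assuming a flat array as a stand-in for the p2m table. The real p2m is a multi-level structure with its own locking and accessors; the `toy_` names here are hypothetical and only show the shape of the walk.

```c
#include <stddef.h>

/* Toy p2m types; names follow the thread, values are arbitrary. */
typedef enum {
    TOY_P2M_RAM_RW,
    TOY_P2M_IOREQ_SERVER,
    TOY_P2M_OTHER
} toy_p2m_type;

/* Walk every gfn at unmap time and reset ioreq-server entries to plain
 * RAM immediately, instead of deferring the reset. Returns the number
 * of entries that were changed. The cost is linear in the table size. */
size_t toy_reset_ioreq_server_entries(toy_p2m_type *table, size_t nr_gfns)
{
    size_t reset = 0;

    for (size_t gfn = 0; gfn < nr_gfns; gfn++) {
        if (table[gfn] == TOY_P2M_IOREQ_SERVER) {
            table[gfn] = TOY_P2M_RAM_RW;
            reset++;
        }
    }
    return reset;
}
```

Once the walk returns, no ioreq-server entries remain, so a later log-dirty recalculation has nothing to misclassify.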


George, sorry to bother you. Any comments on the above option? :)

Another choice might be to let live migration fail if there are
outstanding p2m_ioreq_server entries. But I'm not quite inclined to do
so, because:
1> I'd still like to keep the live migration feature for XenGT.
2> It is not easy to know if there are outstanding p2m_ioreq_server
entries. I mean, since a p2m type change is not only triggered by
hypercall, keeping a counter of the remaining p2m_ioreq_server entries
means a lot of code changes.
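
To make the cost in point 2 concrete, here is a rough sketch of what such a counter might look like in a toy model. Everything here is hypothetical (`toy_p2m`, `toy_set_type()`, `toy_can_start_migration()`); it is not how Xen tracks entries. The counter is only correct if every path that changes an entry's type goes through the one helper, which is exactly the code churn described above.

```c
#include <stddef.h>

/* Toy p2m types; names follow the thread, values are arbitrary. */
typedef enum {
    TOY_P2M_RAM_RW,
    TOY_P2M_RAM_LOGDIRTY,
    TOY_P2M_IOREQ_SERVER
} toy_p2m_type;

typedef struct {
    toy_p2m_type *table;        /* one entry per gfn */
    size_t nr_gfns;
    size_t nr_ioreq_server;     /* outstanding ioreq-server entries */
} toy_p2m;

/* The single choke point for type changes: it keeps the counter in sync
 * with the table no matter who triggers the change. */
void toy_set_type(toy_p2m *p2m, size_t gfn, toy_p2m_type nt)
{
    toy_p2m_type ot = p2m->table[gfn];

    if (ot == TOY_P2M_IOREQ_SERVER && nt != TOY_P2M_IOREQ_SERVER)
        p2m->nr_ioreq_server--;
    else if (ot != TOY_P2M_IOREQ_SERVER && nt == TOY_P2M_IOREQ_SERVER)
        p2m->nr_ioreq_server++;

    p2m->table[gfn] = nt;
}

/* Live migration would be refused while any such entry is outstanding. */
int toy_can_start_migration(const toy_p2m *p2m)
{
    return p2m->nr_ioreq_server == 0;
}
```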

Besides, I wonder whether the requirement to reset the p2m_ioreq_server
entries is indispensable. Could we let the device model side be
responsible for this? The worst case I can imagine, if the device model
fails to do so, is that operations on a gfn might be delivered to the
wrong device model. I'm not clear what kind of damage this would cause
to the hypervisor or other VMs.

Do any other maintainers have any suggestions?
Thanks in advance! :)

I'm not sure how the above would work. In the pre-copy phase (where
logdirty is concerned), the device model is still actively serving
requests from the guest, including initiating new write-protection
requests. How can you guarantee draining of outstanding
p2m_ioreq_server entries w/o actually freezing the device model (while
freezing the device model means the guest driver might be blocked with
random errors)?


You are right, and I'm not suggesting that we clear the p2m_ioreq_server
entries when live migration happens. My suggestion is that either we
guarantee there are no outstanding p2m_ioreq_server entries right after
the ioreq server is unbound, or we do not support live migration for
now.  :)

B.R.
Yu
