
Re: [Xen-devel] [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.





On 6/22/2016 7:33 PM, George Dunlap wrote:
On 22/06/16 11:07, Yu Zhang wrote:

On 6/22/2016 5:47 PM, George Dunlap wrote:
On 22/06/16 10:29, Jan Beulich wrote:
On 22.06.16 at 11:16, <george.dunlap@xxxxxxxxxx> wrote:
On 22/06/16 07:39, Jan Beulich wrote:
On 21.06.16 at 16:38, <george.dunlap@xxxxxxxxxx> wrote:
On 21/06/16 10:47, Jan Beulich wrote:
And then - didn't we mean to disable that part of XenGT during
migration, i.e. temporarily accept the higher performance
overhead without the p2m_ioreq_server entries? In which case
flipping everything back to p2m_ram_rw after (completed or
canceled) migration would be exactly what we want. The (new
or previous) ioreq server should attach only afterwards, and
can then freely re-establish any p2m_ioreq_server entries it
deems necessary.

Well, I agree this part of XenGT should be disabled during migration.
But in such a case I think it's the device model's job to trigger the
p2m type flipping (i.e. by calling HVMOP_set_mem_type).
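
As an illustration, the flip from the device model side could look
roughly like the sketch below, using the existing libxc wrapper for
HVMOP_set_mem_type; how the gfns are tracked is up to the device model,
so the gfn list here is made up for the example.

    #include <xenctrl.h>

    /* Sketch only: flip a device model's write-protected pages back to
     * plain RAM, e.g. before migration.  The gfn list is hypothetical. */
    static int relinquish_wp_pages(xc_interface *xch, domid_t domid,
                                   const uint64_t *gfns, unsigned int nr)
    {
        unsigned int i;

        for ( i = 0; i < nr; i++ )
        {
            /* One page at a time; contiguous ranges could be batched. */
            int rc = xc_hvm_set_mem_type(xch, domid, HVMMEM_ram_rw,
                                         gfns[i], 1);

            if ( rc < 0 )
                return rc;
        }

        return 0;
    }
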
I agree - this would seem to be the simpler model here, even though
(as George validly says) the more consistent model would be for the
hypervisor to do the cleanup. Such cleanup would imo be reasonable
only if there was an easy way for the hypervisor to enumerate all
p2m_ioreq_server pages.
Well, for me, the "easy way" means we should avoid traversing
the whole ept
paging structure all at once, right?
Yes.
Does calling p2m_change_entry_type_global() not satisfy this
requirement?
Not really - that addresses the "low overhead" aspect, but not the
"enumerate all such entries" one.
I'm sorry, I think I'm missing something here.  What do we need the
enumeration for?
We'd need that if we were to do the cleanup in the hypervisor (as
we can't rely on all p2m entry re-calculation to have happened by
the time a new ioreq server registers for the type).
So you're afraid of this sequence of events?
1) Server A de-registers, triggering an ioreq_server -> ram_rw type change
2) gfn N is marked as misconfigured
3) Server B registers and marks gfn N as ioreq_server
4) When N is accessed, the misconfiguration is resolved incorrectly to ram_rw

But that can't happen, because misconfigured entries are resolved before
setting a p2m entry; so at step 3, gfn N will first be set to
(non-misconfigured) ram_rw, then changed to (non-misconfigured)
ioreq_server.
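
Paraphrasing that idea in code (a sketch, not the actual ept_set_entry();
write_entry() is just a placeholder):

    static int set_entry_sketch(struct p2m_domain *p2m, unsigned long gfn,
                                mfn_t mfn, p2m_type_t nt)
    {
        /* Carry out any pending earlier type change for this gfn first,
         * so the entry being overwritten already has its recalculated
         * type... */
        int rc = resolve_misconfig(p2m, gfn);

        if ( rc < 0 )
            return rc;

        /* ...and only then write the new entry with the requested type. */
        return write_entry(p2m, gfn, mfn, nt);
    }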

Or is there another sequence of events that I'm missing?
Thanks for your reply, George. :)
If no log dirty is triggered during this process, your sequence is correct.
However, if log dirty is triggered, we'll run into problems. I have
described this in previous mails:

http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
on Jun 20

and

http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
on Jun 21
Right -- sorry, now I see the issue:

1. Server A marks gfn X as ioreq_server
2. Server A deregisters, gfn X misconfigured
3. Server B registers, marks gfn Y as ioreq_server
4. Logdirty mode enabled; gfn Y misconfigured
5. When X or Y is accessed, resolve_misconfig() has no way of telling
whether the entry is from server A (which should be set to logdirty)
or from server B (which should be left as ioreq_server).

Exactly.  :)
Another simpler scenario would be:
1. Server A marks gfn X as p2m_ioreq_server;
2. Logdirty mode is enabled; gfn X is misconfigured;
3. When X is written, it will not cause an EPT violation but an EPT
misconfig, and resolve_misconfig() would set gfn X back to p2m_ram_rw,
after which we can no longer track accesses to X.

Note: not resetting the p2m type for p2m_ioreq_server when
p2m->ioreq.server is not NULL works for this simpler scenario, but is
not correct if we take your scenario into account.

The core reason is that I could not find a simple solution in
resolve_misconfig() that handles both the outdated p2m_ioreq_server
entries and the in-use ones, and supports the logdirty feature at the
same time.
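
To make the ambiguity concrete, here is an illustrative fragment (not
code from the patch) of the choice the recalculation would face for a
misconfigured entry whose recorded type is p2m_ioreq_server while a
server is attached:

    if ( cur_type == p2m_ioreq_server && p2m->ioreq.server != NULL )
    {
        /* Option 1: keep p2m_ioreq_server.  Correct for entries the
         * currently attached server (B) established, wrong for stale
         * entries left behind by a detached server (A), which should
         * have become ram_rw (or logdirty) by now.
         *
         * Option 2: reset to ram_rw (or logdirty).  Correct for the
         * stale entries, wrong for the in-use ones, whose write
         * emulation is then silently lost.
         *
         * Nothing in the entry itself tells us which case we are in. */
    }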

In a sense this is a deficiency in the change_entry_type_global()
interface.  A common OS principle is "make the common case fast, and the
uncommon case correct".  The scenario described above seems to me to be
an uncommon case which is handled quickly but incorrectly; ideally we
should handle it correctly, even if it's not very quick.

Synchronously resolving a previous misconfig is probably the most
straightforward thing to do.  It could be done at point #3, when an M->N
type change is not complete and a new p2m entry of type M is written; it
could be at point #4, when an N->O type change is initiated while an
M->N type change hasn't completed.  Or it could be when an N->O type
change happens while there are unfinished M->N transitions *and*
post-type-change M entries.
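
As a very rough sketch of the middle option (all helper names here are
made up):

    /* Sketch only: synchronously finish a pending global type change
     * before starting a new one. */
    static void change_type_global_sketch(struct p2m_domain *p2m,
                                          p2m_type_t ot, p2m_type_t nt)
    {
        /* Uncommon, slow case: a previous M->N change has not fully
         * propagated.  Walk the p2m and resolve the remaining
         * misconfigured entries now, paying the cost up front. */
        if ( previous_change_pending(p2m) )
            finish_previous_change(p2m);

        /* Common, fast case: start the new ot -> nt change lazily,
         * via misconfigured entries, as today. */
        start_lazy_change(p2m, ot, nt);
    }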

Sorry, I did not quite get it.  Could you please elaborate more? Thanks! :)


But, that's a lot of somewhat complicated work for a scenario that is
unlikely to happen in practice, and I can see why Yu Zhang would feel
reluctant to do it.

Yes, that's part of my reasons. :)


For the time being though, this will fail at #4, right?  That is,
logdirty mode cannot be enabled while server B is registered?

That does mean we'd be forced to sort out the situation before we allow
logdirty and ioreq_server to be used at the same time, but that doesn't
really seem like such a bad idea to me.

One solution I thought of is to just return failure in hap_enable_log_dirty() if p2m->ioreq.server is not NULL (a rough sketch of that check follows the list below). But I did not choose such an approach, because:

1> I still want to keep the logdirty feature so that XenGT can use it to keep track
of dirty RAM pages when we support live migration in the future;

2> I also agree with Paul's argument: it is the device model's duty to do the p2m type
resetting work.
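
For completeness, the rejected check would have been roughly (sketch
only, not proposed code):

    /* In hap_enable_log_dirty(): refuse to enter global logdirty mode
     * while an ioreq server still claims the p2m_ioreq_server type. */
    if ( log_global && p2m_get_hostp2m(d)->ioreq.server != NULL )
        return -EBUSY;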

I'm still open to being convinced, but at the moment it really seems to
me like improving the situation is the better long-term option.


Thanks for all your advice, George. I'm also willing to take other suggestions; if we have a resync approach in the hypervisor that is more acceptable (to you, Jan and the other maintainers), I'd like to add it. If the code is too complicated, I can submit it in a separate new
patchset. :)


B.R.
Yu


