
Re: [Xen-devel] [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.





On 3/24/2017 5:37 PM, Tian, Kevin wrote:
From: Yu Zhang [mailto:yu.c.zhang@xxxxxxxxxxxxxxx]
Sent: Wednesday, March 22, 2017 6:12 PM

On 3/22/2017 4:10 PM, Tian, Kevin wrote:
From: Yu Zhang [mailto:yu.c.zhang@xxxxxxxxxxxxxxx]
Sent: Tuesday, March 21, 2017 10:53 AM

After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously, with the current p2m_change_entry_type_global()
interface.
This patch also disallows live migration when there are still any
outstanding p2m_ioreq_server entries left. The core reason is that our
current implementation of p2m_change_entry_type_global() cannot tell
the state of p2m_ioreq_server entries (it cannot decide whether an
entry is to be emulated or to be resynced).
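
For reference, here is a minimal sketch of what the unmap path boils down to
(condensed from the series; names such as p2m->ioreq.entry_count come from
this patch and the surrounding code is simplified, so treat it as an
approximation rather than the exact code):

    /* In p2m_set_ioreq_server(), after the server has been unmapped
     * (flags == 0): lazily reset the outstanding p2m_ioreq_server
     * entries by marking them misconfigured, so that they are changed
     * back to p2m_ram_rw the next time they are touched. */
    if ( rc == 0 && flags == 0 )
    {
        struct p2m_domain *p2m = p2m_get_hostp2m(d);

        if ( read_atomic(&p2m->ioreq.entry_count) )
            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
    }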
Don't quite get this point. change_global is triggered only upon
unmap. At that point there is no ioreq server to emulate the write
operations on those entries. All that is required is just to change
the type. What's the exact decision required here?
Well, one situation I can recall is when another ioreq server maps to this
type, and live migration happens later. The resolve_misconfig() code cannot
differentiate whether a p2m_ioreq_server page is an obsolete one to be
resynced, or a new one that is only to be emulated.
So if you disallow another mapping before the obsolete pages are resynced,
as you just replied in another mail, would such a limitation be gone?

Well, it may still have problems.

Even if we know that the remaining p2m_ioreq_server entries are definitely
outdated ones, resolve_misconfig() still lacks the information to decide
whether such an entry is supposed to be reset to p2m_ram_rw (during a p2m
sweep), or marked as p2m_ram_logdirty (during a live migration process).

I mean, we surely could reset these pages to p2m_ram_logdirty directly if
global_logdirty is on. But I do not think that is the correct thing to do:
although these pages would be reset back to p2m_ram_rw later in the EPT
violation handling process, it might cause some clean pages (which were
write-protected once, but are no longer) to be tracked and sent to the
target later if live migration is triggered.
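
To make the ambiguity concrete, here is a rough sketch of the recalculation
done for such entries in resolve_misconfig() (heavily simplified pseudo-C,
not the actual EPT code):

    /* e is the ept_entry_t being recalculated for a misconfigured gfn. */
    if ( e.sa_p2mt == p2m_ioreq_server )
    {
        /* The entry is known to be stale (its ioreq server is gone),
         * but this code cannot tell whether it should become
         * p2m_ram_rw (a plain sweep after unmap) or p2m_ram_logdirty
         * (global log-dirty enabled, i.e. live migration running).
         * Always picking p2m_ram_logdirty would mark pages dirty that
         * the guest may never write again, so they would be sent to
         * the target needlessly. */
        e.sa_p2mt = p2m_ram_rw;
    }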

I gave some explanation on this issue in the discussion during Jun 20 - 22
last year.

http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
on Jun 20
and
http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
on Jun 21

btw does it mean that live migration can still be supported as long as the
device model proactively unmaps write-protected pages before starting
live migration?

Yes.
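
Roughly, the device model would have to do something like the following
before migration starts (the wrapper name and signature are from this
series as I recall them, so please treat them as assumptions rather than
the authoritative interface):

    /* Device-model side sketch: give up the p2m_ioreq_server mapping
     * before the toolstack enters logdirty mode.  Passing flags == 0
     * unmaps the type from the server, so the outstanding entries get
     * reset and the device model stops trapping guest writes. */
    static int prepare_for_migration(xc_interface *xch, domid_t domid,
                                     ioservid_t id)
    {
        return xc_hvm_map_mem_type_to_ioreq_server(xch, domid, id,
                                                   HVMMEM_ioreq_server, 0);
    }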
I'm not sure whether there'll be a sequencing issue. I assume the toolstack
will first request entering logdirty mode, do the iterative memory copy,
then stop the VM including its virtual devices, and then build the final
image (including vGPU state). XenGT supports live migration today.
The vGPU device model is notified to do state save/restore only in the
last step of that flow (as part of the Qemu save/restore). If your design
requires the vGPU device model to first unmap write-protected pages
(which means it is incapable of serving more requests from the guest,
which is equivalent to stopping the vGPU) before the toolstack enters
logdirty mode, I'm worried about the required changes to the whole live
migration flow...

Well, previously George had written a draft patch to solve this issue and
make the lazy p2m change code more generic (with more information added to
the p2m structure). We believed it might also remove the live migration
restriction (with no changes to the interface between the hypervisor and
the device model). But there are still some bugs, and neither of us has had
enough time to debug them. So I'd like to submit our current code first;
once that solution matures, we can remove the live migration restriction. :-)

Thanks
Yu

Thanks
Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

