Xen project Mailing List

Re: [Xen-devel] [PATCH] x86/ioreq server: Fix DomU couldn't reboot when using p2m_ioreq_server p2m_type

To: George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

From: "Zhang, Xiong Y" <xiong.y.zhang@xxxxxxxxx>

Date: Mon, 8 May 2017 11:59:52 +0000

Accept-language: en-US

Cc: "george.dunlap@xxxxxxxxxxxxx" <george.dunlap@xxxxxxxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, "paul.durrant@xxxxxxxxxx" <paul.durrant@xxxxxxxxxx>, "yu.c.zhang@xxxxxxxxxxxxxxx" <yu.c.zhang@xxxxxxxxxxxxxxx>, "Lv, Zhiyuan" <zhiyuan.lv@xxxxxxxxx>, "Zhang, Xiong Y" <xiong.y.zhang@xxxxxxxxx>

Delivery-date: Mon, 08 May 2017 12:00:09 +0000

Dlp-product: dlpe-windows

Dlp-reaction: no-action

Dlp-version: 10.0.102.7

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: AQHSx+wX4+RDKstZ+UiXGnjrPzg6faHqUpHw

Thread-topic: [PATCH] x86/ioreq server: Fix DomU couldn't reboot when using p2m_ioreq_server p2m_type

> On 08/05/17 11:52, Zhang, Xiong Y wrote: > >>>>> On 06.05.17 at 03:51, <xiong.y.zhang@xxxxxxxxx> wrote: > >>>>>>> On 05.05.17 at 05:52, <xiong.y.zhang@xxxxxxxxx> wrote: > >>>>> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset > >>>>> outstanding p2m_ioreq_server entries")' will call > >>>>> p2m_change_entry_type_global() which set entry.recalc=1. Then > >>>>> the following get_entry(p2m_ioreq_server) will return > >>>>> p2m_ram_rw type. > >>>>> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset > >>>>> outstanding p2m_ioreq_server entries when an ioreq server unmaps")' > >>>>> assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server > >>>>> type, then reset p2m_ioreq_server entries. The fact is the assumption > >>>>> isn't true, and sysnchronously reset function couldn't work. Then > >>>>> ioreq.entry_count is larger than zero after an ioreq server unmaps, > >>>>> finally this results DomU couldn't reboot. > >>>> > >>>> I've had trouble understanding this part already on v1 (btw, why is > >>>> this one not tagged v2?), and since I still can't figure it I have to > >>>> ask: > >>>> Why is it that guest reboot is being impacted here? From what I recall > >>>> a non-zero count should only prevent migration. > >>> [Zhang, Xiong Y] Sorry, although they solve the same issue, the solution > >>> is > >>> totally different, so I didn't mark this as V2, I will mark the following > >>> as v2 with this solution. > >>> During DomU reboot, it will first unmap ioreq server in shutdown process, > >>> then it call map ioreq server in boot process. The following sentence in > >>> p2m_set_ioreq_server() result mapping ioreq server failure, then DomU > >>> couldn't continue booting. > >>> If ( read_atomic(&p->ioreq.entry_count)) > >>> goto out; > >> > >> It is clear that it would be this statement to be the problem one, > >> but I continue to not see why this would affect reboot: The rebooted > >> guest runs in another VM with, hence, a different p2m. I cannot see > >> why there would be a non-zero ioreq.entry_count the first time an > >> ioreq server claims the p2m_ioreq_server type for this new domain. > >> > > [Zhang, Xiong Y] This is what I see from xl dmesg when a DomU reboot > > 1) unmap io_req_server with old domid > > 2) map io_req_server with old domid > > 3)unmap io_req_server with old domid > > 4) map io_req_server with new domid > > > > The 1) and 2) are triggered by our device reset handler in qemu, it will > > destroy old device handler, then create device handler with the old domid > > again. so we could see ioreq.entry_could > 0 with old domid, then reboot > > process terminated. > > Oh, so it prevents reboot of XenGT, but not of normal guests? [Zhang, Xiong Y] Yes, only XenGT has this issue. > > Why does a reboot cause the device to detach, re-attach, and then > re-detach again? [Zhang, Xiong Y] As our vgt_reset() function will destroy a vgt instance, then create a vgt instance. And this vgt_reset function is also used in normal boot process. > > Also, I'm sorry for missing the bug during review, but it's a bit > annoying to find out that the core functionality of patch -- detaching > and re-attaching -- wasn't tested at all before submission. [Zhang, Xiong Y] Sorry, it is mainly our fault that we miss this. As we are preparing Xengt community release recently and are refactoring our code, then this issue are exposed. > > -George _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.