| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
 Re: [Xen-devel] [PATCH] x86/ioreq server: Fix DomU couldn't reboot when using p2m_ioreq_server p2m_type
 
To: George Dunlap <george.dunlap@xxxxxxxxxx>, "Zhang, Xiong Y" <xiong.y.zhang@xxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>From: Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx>Date: Tue, 9 May 2017 13:21:53 +0800Cc: "george.dunlap@xxxxxxxxxxxxx" <george.dunlap@xxxxxxxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "paul.durrant@xxxxxxxxxx" <paul.durrant@xxxxxxxxxx>, "Lv, Zhiyuan" <zhiyuan.lv@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>Delivery-date: Tue, 09 May 2017 05:32:16 +0000List-id: Xen developer discussion <xen-devel.lists.xen.org> 
 
On 5/8/2017 7:12 PM, George Dunlap wrote:
 
On 08/05/17 11:52, Zhang, Xiong Y wrote:
 
On 06.05.17 at 03:51, <xiong.y.zhang@xxxxxxxxx> wrote:
 
On 05.05.17 at 05:52, <xiong.y.zhang@xxxxxxxxx> wrote:
 
'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
outstanding p2m_ioreq_server entries")' will call
p2m_change_entry_type_global() which set entry.recalc=1. Then
the following get_entry(p2m_ioreq_server) will return
p2m_ram_rw type.
But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
assume get_entry(p2m_ioreq_server) will return p2m_ioreq_server
type, then reset p2m_ioreq_server entries. The fact is the assumption
isn't true, and sysnchronously reset function couldn't work. Then
ioreq.entry_count is larger than zero after an ioreq server unmaps,
finally this results DomU couldn't reboot.
 
I've had trouble understanding this part already on v1 (btw, why is
this one not tagged v2?), and since I still can't figure it I have to ask:
Why is it that guest reboot is being impacted here? From what I recall
a non-zero count should only prevent migration.
 
[Zhang, Xiong Y] Sorry, although they solve the same issue, the solution is
totally different, so I didn't mark this as V2, I will mark the following
as v2 with this solution.
During DomU reboot, it will first unmap ioreq server in shutdown process,
then it call map ioreq server in boot process. The following sentence in
p2m_set_ioreq_server() result mapping ioreq server failure, then DomU
couldn't continue booting.
If ( read_atomic(&p->ioreq.entry_count))
    goto out;
 
It is clear that it would be this statement to be the problem one,
but I continue to not see why this would affect reboot: The rebooted
guest runs in another VM with, hence, a different p2m. I cannot see
why there would be a non-zero ioreq.entry_count the first time an
ioreq server claims the p2m_ioreq_server type for this new domain.
 
[Zhang, Xiong Y] This is what I see from xl dmesg when a DomU reboot
1) unmap io_req_server with old domid
2) map io_req_server with old domid
3)unmap io_req_server with old domid
4) map io_req_server with new domid
The 1) and 2) are triggered by our device reset handler in qemu, it will
destroy old device handler, then create device handler with the old domid
again. so we could see ioreq.entry_could > 0 with old domid, then reboot
process terminated.
 
Oh, so it prevents reboot of XenGT, but not of normal guests?
Why does a reboot cause the device to detach, re-attach, and then
re-detach again?
Also, I'm sorry for missing the bug during review, but it's a bit
annoying to find out that the core functionality of patch -- detaching
and re-attaching -- wasn't tested at all before submission.
 
Thanks for your reply, George.
This error was introduced in the last version of patch "x86/ioreq 
server: Asynchronously reset
outstanding p2m_ioreq_server entries.", which included 2 changes from 
its previous version: 
1> Do not reset p2m_ioreq_server back to p2m_ram_rw when an ioreq server 
is mapped; 
2> Use a helper function to return the p2m type -  this new helper 
routine returns p2m_ram_rw
instead of p2m_ioreq_server if there's no mapping ioreq server. In 
previous versions I just returned
p2m_ioreq_server, and XenGT tests all passed successfully. But I had not 
realized this issue for this
last version and did not have enough time to do the test for this 
version in the last moment before 
code freeze...
Sorry about my negligence, we will perform all necessary tests next 
before we send out our new 
patches. :-)
Yu
 
  -George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
 
 |