
Re: [Xen-devel] [BUG] repeated live migration for VM failed



George, thanks for the fix.
With the patch, the test has completed 90+ live migrations without any error so
far; let's wait for the final result.

Thanks,
-Xudong
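
For anyone who wants to repeat this test, the reproduce steps in the quoted report below (migrate to localhost, sleep 10 seconds, repeat) could be scripted roughly as follows. This is only a sketch, not the harness used for the run above: the function name `migrate_repeatedly` and its parameters are made up for illustration, and it assumes `xl migrate` can reach the target non-interactively (for example with key-based ssh to localhost), since the original log shows a password prompt.

```python
import subprocess
import time


def migrate_repeatedly(domain, iterations, host="localhost", pause=10.0,
                       run=subprocess.run, sleep=time.sleep):
    """Live-migrate `domain` to `host` up to `iterations` times.

    Returns (completed, failure_output). `failure_output` is None when
    every migration succeeded, otherwise the stderr of the failing run.
    `run` and `sleep` are injectable so the loop can be dry-run tested.
    """
    for completed in range(iterations):
        # Same command as in the report: xl migrate <domain> <host>
        result = run(["xl", "migrate", domain, host],
                     capture_output=True, text=True)
        if result.returncode != 0:
            # Stop at the first failure and hand back the tool stack output.
            return completed, result.stderr
        sleep(pause)  # the report sleeps 10 seconds between migrations
    return iterations, None


if __name__ == "__main__":
    # Domain name taken from the log in the report; 300 is an arbitrary cap.
    done, err = migrate_repeatedly("24hrs_lm_guest_2", 300)
    print(f"{done} migrations completed")
    if err is not None:
        print(err)
```

Stopping at the first non-zero exit status keeps the failing `xc: error` output at the end of the log instead of burying it under later attempts.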


> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
> Sent: Monday, May 22, 2017 7:03 PM
> To: Hao, Xudong <xudong.hao@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxx
> Cc: Lars Kurth <lars.kurth@xxxxxxxxxx>; Julien Grall <julien.grall@xxxxxxx>;
> Gao, Chao <chao.gao@xxxxxxxxx>; Paul Durrant <paul.durrant@xxxxxxxxxx>;
> Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Jan Beulich <JBeulich@xxxxxxxx>
> Subject: Re: [Xen-devel] [BUG] repeated live migration for VM failed
> 
> On Mon, May 22, 2017 at 11:18 AM, George Dunlap <george.dunlap@xxxxxxxxxx>
> wrote:
> > On 22/05/17 07:35, Hao, Xudong wrote:
> >> Bug detailed description:
> >>
> >> ----------------
> >>
> >> Create one RHEL7.3 HVM guest and live-migrate it continuously. After 200+
> >> or 300+ live migrations, the tool stack reports an error and the migration
> >> fails.
> >>
> >>
> >>
> >> Environment :
> >>
> >> ----------------
> >>
> >> HW: Skylake server
> >>
> >> Xen: Xen 4.9.0 RC4
> >>
> >> Dom0: Linux 4.11.0
> >>
> >>
> >>
> >> Reproduce steps:
> >>
> >> ----------------
> >>
> >> 1.      Compile Xen 4.9 RC4 and dom0 kernel 4.11.0, then boot into dom0
> >>
> >> 2.      Boot RHEL7.3 HVM guest
> >>
> >> 3.      Migrate guest to localhost, sleep 10 seconds
> >>
> >> 4.      Repeat step 3.
> >>
> >>
> >>
> >> Current result:
> >>
> >> ----------------
> >>
> >> VM migration fails.
> >>
> >>
> >>
> >> Base error log:
> >>
> >> ----------------
> >>
> >> xl migrate 24hrs_lm_guest_2 localhost
> >>
> >> root@localhost's password:
> >>
> >> migration target: Ready to receive domain.
> >>
> >> Saving to migration stream new xl format (info 0x3/0x0/1761)
> >>
> >> Loading new save file <incoming migration stream> (new xl fmt info
> >> 0x3/0x0/1761)
> >>
> >> Savefile contains xl domain config in JSON format
> >>
> >> Parsing config from <saved>
> >>
> >> xc: info: Saving domain 273, type x86 HVM
> >>
> >> xc: info: Found x86 HVM domain from Xen 4.9
> >>
> >> xc: info: Restoring domain
> >>
> >> xc: error: set HVM param 12 = 0x00000000feffe000 (85 = Interrupted
> >> system call should ): Internal error
> >>
> >> xc: error: Restore failed (85 = Interrupted system call should ):
> >> Internal error
> >
> > Interesting -- it appears that setting HVM_PARAM_IDENT_PT (#12) can
> > fail with -ERESTART.  But the comment for ERESTART makes it explicit
> > that it should be internal only -- it should cause a hypercall
> > continuation (so that the hypercall restarts automatically), rather
> > than returning to the guest.
> >
> > But the hypercall continuation code seems to have disappeared from
> > do_hvm_op() at some point?
> >
> > /me digs a bit more...
> 
> The problem turns out to be commit ae20ccf ("dm_op: convert
> HVMOP_set_mem_type"), which says:
> 
>     This patch removes the need for handling HVMOP restarts, so that
>     infrastructure is removed.
> 
> While it's true that there are no more operations which need iteration
> information restored, there are two operations which may still need to be
> restarted to avoid deadlocks with other operations.
> 
> Attached is a patch which restores hypercall continuation checking.
> Xudong, can you give it a test?
> 
> Thanks,
>  -George
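
To illustrate the failure mode George describes, here is a toy model in Python. It is not Xen code: all names (`make_contended_op`, `dispatch_without_continuation`, `dispatch_with_continuation`) are hypothetical, the retry loop only stands in for an automatic hypercall restart, and 85 is simply the error number that appears in the log above. The point it tries to show is the difference between letting an internal "please restart" code escape to the caller and consuming it by restarting the operation.

```python
ERESTART = 85  # the error number shown in the restore log above


def make_contended_op(busy_attempts):
    """Build a hypothetical operation that is contended at first.

    The first `busy_attempts` calls return -ERESTART (standing in for
    "cannot proceed right now without risking a deadlock"); later calls
    succeed and return 0.
    """
    state = {"calls": 0}

    def op():
        state["calls"] += 1
        if state["calls"] <= busy_attempts:
            return -ERESTART
        return 0

    return op


def dispatch_without_continuation(op):
    # Models the regression: the internal-only code leaks to the caller,
    # which then reports it as an error.
    return op()


def dispatch_with_continuation(op, max_restarts=1000):
    # Models the restored check: -ERESTART is never handed back while
    # restarts remain; the operation is simply tried again.
    for _ in range(max_restarts):
        rc = op()
        if rc != -ERESTART:
            return rc
    return -ERESTART  # bounded only so that this toy model always terminates


if __name__ == "__main__":
    print(dispatch_without_continuation(make_contended_op(2)))
    print(dispatch_with_continuation(make_contended_op(2)))
```

In this model the first dispatcher surfaces the restart code whenever the operation happens to be contended, which would fit a failure that only shows up after hundreds of migrations.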
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

