
Re: [Xen-devel] Commit moratorium to staging



On Fri, Nov 03, 2017 at 05:57:52PM +0000, George Dunlap wrote:
> On 11/03/2017 02:52 PM, George Dunlap wrote:
> > On 11/03/2017 02:14 PM, Roger Pau Monné wrote:
> >> On Thu, Nov 02, 2017 at 09:55:11AM +0000, Paul Durrant wrote:
> >>> Hmm. I wonder whether the guest is actually healthy after the migrate. 
> >>> One could imagine a situation where the storage device model (IDE in our 
> >>> case I guess) gets stuck in some way but recovers after a timeout in the 
> >>> guest storage stack. Thus, if you happen to try to shut down while it is
> >>> still stuck, Windows starts trying to shut down but can't. Try after the
> >>> timeout, though, and it can.
> >>> In the past we did make attempts to support Windows without PV drivers in 
> >>> XenServer but xenrt would never reliably pass VM lifecycle tests using 
> >>> emulated devices. That was with qemu trad, but I wonder whether upstream 
> >>> qemu is actually any better, particularly if using older device models
> >>> such as IDE and RTL8139 (which are probably largely unmodified from trad).
> >>
> >> Since I've been looking into this for a couple of days and have found no
> >> solution, I'm going to write down what I've found so far:
> >>
> >>  - The issue only affects Windows guests.
> >>  - It only manifests itself when doing live migration; non-live
> >>    migration and save/resume work fine.
> >>  - It affects all x86 hardware. The number of migrations needed to
> >>    trigger it seems to depend on the hardware, but doing 20 migrations
> >>    reliably triggers it on all the hardware I've tested.
> > 
> > Not good.
> > 
> > You said that Windows reported that the login process failed somehow?
> > 
> > Is it possible something bad is happening, like sending spurious page
> > faults to the guest in logdirty mode?
> > 
> > I wonder if we could reproduce something like it on Linux -- set a build
> > going and start localhost migrating; a spurious page fault is likely to
> > cause the build to fail.
> 
> Well, with a looping xen-build going on in the guest, I've done 40 local
> migrates with no problems yet.
> 
> But Roger -- is this on emulated devices only, no PV drivers?
> 
> That might be something worth looking at.

Yes, Windows doesn't have PV devices. But save/restore and non-live
migration seem fine, so it doesn't look related to devices, but
rather to log-dirty or some other aspect of live migration.
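To make the live/non-live distinction concrete, here is a toy Python model of iterative pre-copy with a dirty-page log. It is a simplified illustration, not the Xen toolstack's actual migration code: the page count, the random guest write pattern, `max_rounds` and `stop_threshold` are all hypothetical. Save/restore and non-live migration pause the guest and copy memory once, so no dirty tracking is involved; live migration keeps the guest running while copying and resends whatever it dirtied, which is why log-dirty is a natural suspect.

```python
import random


def precopy_migrate(num_pages, guest_writes_per_round, max_rounds=5,
                    stop_threshold=8, seed=0):
    """Toy model of live migration: iterative pre-copy plus stop-and-copy.

    Returns the number of pages sent in each round; the last entry is the
    final stop-and-copy round performed with the guest paused.
    All parameters are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    to_send = set(range(num_pages))  # round 0: send all of guest memory
    sent_per_round = []
    for _ in range(max_rounds):
        # Log-dirty on: record pages the guest writes while we copy.
        dirty_log = set()
        sent_per_round.append(len(to_send))
        # The guest keeps running during the copy and dirties some pages.
        for _ in range(guest_writes_per_round):
            dirty_log.add(rng.randrange(num_pages))
        # Next round only resends pages dirtied during this one.
        to_send = dirty_log
        if len(to_send) <= stop_threshold:
            break
    # Final stop-and-copy: guest paused, remaining dirty pages sent.
    sent_per_round.append(len(to_send))
    return sent_per_round
```

With an idle guest (`guest_writes_per_round=0`) the model degenerates to a single full copy followed by an empty final round; with a busy guest every round resends pages, so any bug in tracking which pages were dirtied would only show up on the live path.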

Or maybe it is indeed something related to emulated devices that's
more easily triggered by live migration.

I'm also thinking it would be helpful to run a sequence of 20
save/restores, a shutdown, a create, 20 live migrations and a final
shutdown. That would help us tell save/restore problems apart from
live-migration problems more easily.
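The test sequence proposed above could be scripted roughly as follows. This Python sketch builds the `xl` command list and, by default, only prints it (pass `dry_run=False` to execute). It assumes the standard `xl save`, `xl restore`, `xl create`, `xl migrate` and `xl shutdown -w` subcommands, a migration target of `localhost` (which requires ssh access to the local host), and a hypothetical domain name, config path and checkpoint path supplied by the caller.

```python
import subprocess


def build_plan(domain, config, checkpoint, cycles=20, target="localhost"):
    """Build the xl command sequence: N save/restores, shutdown, create,
    N live migrations, shutdown. Arguments are caller-supplied (hypothetical)."""
    plan = []
    # Phase 1: repeated save/restore (no log-dirty involved).
    for _ in range(cycles):
        plan.append(["xl", "save", domain, checkpoint])
        plan.append(["xl", "restore", checkpoint])
    # Clean shutdown (-w waits for it to complete), then a fresh boot.
    plan.append(["xl", "shutdown", "-w", domain])
    plan.append(["xl", "create", config])
    # Phase 2: repeated live migrations to the target host.
    for _ in range(cycles):
        plan.append(["xl", "migrate", domain, target])
    plan.append(["xl", "shutdown", "-w", domain])
    return plan


def run_plan(plan, dry_run=True):
    """Print each step; run it only when dry_run is False."""
    for step, argv in enumerate(plan, 1):
        print(f"[{step}/{len(plan)}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the run at the first failing step.
            subprocess.run(argv, check=True)
```

For example, `run_plan(build_plan("win10", "/etc/xen/win10.cfg", "/var/tmp/win10.chk"))` prints the 63 steps; those names are hypothetical. A real harness would also need to wait for the guest to finish booting after `xl create` before starting the migrations.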

Roger.
