
Re: [Xen-devel] HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI


There's no such thing as a "migration" on physical hardware and a
save/restore etc is under kernel control so it knows not to cache timer
values etc.

Indeed, so it's the live migrate which is causing it!

If that's correct, and I've understood what George said, then
I /think/ the only quirky fix that needs doing is to change
the API between the kernel driver and Xen so that 'don't give me a time
in the past' means 'don't give me a time in the past unless you've
just done a live migrate'.

What does "just" mean here? How do you determine it?

I'd suggest whatever time interval is required to resync. If you said
1 second, for instance, that would be a bodge, but would presumably
work unless the clocks were out by more than a second.
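
To make the proposed semantics concrete, here is a minimal sketch of the
"reject times in the past, unless we have just migrated" check. It is not
Xen code: struct timer_policy, set_timer_future_only() and the 1 second
MIGRATE_GRACE_NS window are all hypothetical names and values chosen for
illustration, and returning -ETIME for a rejected deadline is an assumption
about how the failure would be reported.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grace window after a live migrate: 1 second, in ns. */
#define MIGRATE_GRACE_NS UINT64_C(1000000000)

/* Hypothetical per-domain state; not a real Xen structure. */
struct timer_policy {
    uint64_t last_migrate_ns; /* host time when the last migrate completed */
    bool     migrated;        /* false if the domain was never migrated */
};

/* True if now_ns falls within the grace window after the last migrate. */
static bool in_migrate_grace(const struct timer_policy *p, uint64_t now_ns)
{
    return p->migrated &&
           now_ns >= p->last_migrate_ns &&
           now_ns - p->last_migrate_ns <= MIGRATE_GRACE_NS;
}

/*
 * Returns 0 and stores the firing time in *fire_at_ns if the request is
 * accepted, or -ETIME if the deadline is in the past and no migrate has
 * happened recently enough to excuse it.
 */
static int set_timer_future_only(const struct timer_policy *p,
                                 uint64_t now_ns, uint64_t deadline_ns,
                                 uint64_t *fire_at_ns)
{
    if (deadline_ns > now_ns) {
        *fire_at_ns = deadline_ns;   /* normal case: deadline in the future */
        return 0;
    }
    if (in_migrate_grace(p, now_ns)) {
        *fire_at_ns = now_ns;        /* stale deadline tolerated: fire now */
        return 0;
    }
    return -ETIME;                   /* existing behaviour: reject */
}
```

Outside the grace window the behaviour is unchanged; inside it, a stale
deadline fires immediately instead of failing. As noted, this only helps
if the clocks are out by less than the window.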

I said "filling the hypervisor with lots of quirky exceptions". This is
just one, and in isolation maybe it isn't too bad. Now imagine we'd
accumulated a dozen over the last 10 years: the semantics of our timer
operation would be impossible to understand (do this unless A, otherwise
if not B do something else, etc. etc.).

If you really want giving a time in the
past to error under some circumstances, you can signal that another
way ('really don't give me a time in the past').

That would be changing the behaviour of an existing ABI AFAICT, which is
right out -- what if some other guest is relying on the current

Well Linux is sort of relying on it - so we might fix those guests too :-)

I suppose the result would be that if anyone relied on the failure of
the timer event in the one second following migration, then sometimes
that failure would not happen.

But in any case until George (or someone else) has actually diagnosed
what is going on this entire discussion is premature.

 Yes, it would be lovely if everyone always applied the latest
patches to their kernel and rebooted, but they don't.

Otherwise the net result will be that Xen 4.3 does not reliably live migrate
a pile of Linux OSes unless they are running a patched kernel. That is not
a great conclusion.

Are you saying this didn't happen with Xen 4.2 and earlier? That would
tend to lean towards this being a Xen bug.

It happens in 4.2.

We did not discover it in 4.1, but have not retested so comprehensively.
And in 4.1 we were using a different device model (if that's relevant).

Alex Bligh

Xen-devel mailing list


