
Re: [Xen-devel] Live migrate with Linux >= 4.13 domU causes kernel time jumps and TCP connection stalls.


  • To: Juergen Gross <jgross@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Hans van Kranenburg <Hans.van.Kranenburg@xxxxxxxxxx>
  • Date: Mon, 7 Jan 2019 12:56:00 +0000
  • Accept-language: en-US
  • Cc: Igor Yurchenko <Igor.Yurchenko@xxxxxxxxxx>
  • Delivery-date: Mon, 07 Jan 2019 12:56:16 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [Xen-devel] Live migrate with Linux >= 4.13 domU causes kernel time jumps and TCP connection stalls.

On 1/7/19 1:04 PM, Juergen Gross wrote:
> On 28/12/2018 15:41, Hans van Kranenburg wrote:
>> On 12/28/18 11:15 AM, Juergen Gross wrote:
>>
>> [...]
>> So that explains the 18446742891.874140 number, which corresponds
>> to roughly 'minus 23 minutes'.
> 
> I have a local reproducer for the issue now: instead of using live
> migration, I'm just doing an "xl save" after the guest has been running
> for some minutes. Then I reboot the host and do an "xl restore" as soon
> as possible.
> 
> Another note: HVM domains (and probably PVH, too) show the huge
> timestamp ("[18446742937.583537] ..."), while PV domains seem to show
> just a small jump backwards in time:
> 
> [  224.719316] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [  224.720443] OOM killer disabled.
> [  224.720448] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [  224.721678] PM: freeze of devices complete after 0.107 msecs
> [  224.721687] suspending xenstore...
> [  224.721726] PM: late freeze of devices complete after 0.037 msecs
> [  224.736062] PM: noirq freeze of devices complete after 14.325 msecs
> [  224.736155] xen:grant_table: Grant tables using version 1 layout
> [    4.404026] Suspended for 187.219 seconds

And what if you cause a time difference large enough to push it below zero?
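One way to see how a backward jump could turn into the huge HVM timestamps is to model the printk timestamp as an unsigned 64-bit nanosecond count printed as seconds with six decimals. That model is an assumption of this sketch; the Xen suspend/resume path itself is not modelled. The helper names (`parse_printk`, `fmt_printk`, `as_signed_ns`) are hypothetical, and the input figures are taken from the logs quoted above.

```python
# Sketch under an assumption: a printk timestamp is an unsigned 64-bit
# nanosecond count, printed as seconds with six decimal places.
U64 = 1 << 64  # 2**64 ns, roughly 18446744073.709552 s


def parse_printk(ts: str) -> int:
    """Parse a printk stamp such as '224.736155' into integer nanoseconds."""
    sec, _, frac = ts.strip().partition(".")
    return int(sec) * 1_000_000_000 + int(frac.ljust(9, "0")[:9])


def fmt_printk(ns: int) -> str:
    """Format a nanosecond count the way dmesg does, after u64 wraparound."""
    u = ns % U64  # Python ints are unbounded, so emulate the u64 wrap
    return f"[{u // 1_000_000_000:5d}.{u % 1_000_000_000 // 1000:06d}]"


def as_signed_ns(u: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return u - U64 if u >= U64 // 2 else u


# PV log quoted above: last stamp before the suspend, first one after it.
jump = parse_printk("4.404026") - parse_printk("224.736155")
print(f"PV jump: {jump / 1e9:.6f} s")  # PV jump: -220.332129 s

# The same jump applied to a clock reading of only 100 s goes below zero,
# so the unsigned value wraps to a stamp just under 2**64 ns.
print(fmt_printk(parse_printk("100.000000") + jump))  # [18446743953.377422]

# Read back the HVM stamp quoted above as a (negative) signed offset.
print(f"{as_signed_ns(parse_printk('18446742937.583537')) / 1e9:.3f} s")  # -1136.126 s
```

Reinterpreting the wrapped value as signed is what turns a stamp just below 2**64 ns back into a negative offset of some minutes.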

I can just as easily reproduce with PV, and don't see much difference in
behavior with PVH. Actually, all the bisect steps to find it were done
using PV.
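To spot these jumps in a guest's dmesg output (for example after each bisect step), a small scanner can compare consecutive timestamps. This is a hypothetical helper, not part of any Xen tooling. It flags PV-style backward jumps and, using an arbitrarily chosen one-hour threshold (an assumption), HVM-style wrapped stamps that show up as an enormous forward leap.

```python
import re

# Matches the "[  224.736155]" prefix dmesg prints in front of each line.
STAMP = re.compile(r"^\[\s*(\d+)\.(\d{6})\]")

# Arbitrary assumption: a forward leap of more than one hour between two
# consecutive messages is treated as suspicious (e.g. a wrapped u64 stamp).
MAX_FORWARD_US = 3600 * 1_000_000


def find_jumps(lines, max_forward_us=MAX_FORWARD_US):
    """Yield (prev_us, cur_us, line) where the timestamp goes backwards or
    leaps forward by more than max_forward_us. Lines without a stamp
    (wrapped continuation lines) are skipped."""
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue
        cur = int(m.group(1)) * 1_000_000 + int(m.group(2))  # microseconds
        if prev is not None and (cur < prev or cur - prev > max_forward_us):
            yield prev, cur, line.rstrip("\n")
        prev = cur


log = """\
[  224.736062] PM: noirq freeze of devices complete after 14.325 msecs
[  224.736155] xen:grant_table: Grant tables using version 1 layout
[    4.404026] Suspended for 187.219 seconds
"""
for prev, cur, line in find_jumps(log.splitlines()):
    # Prints: jump of -220.332129 s at: [    4.404026] Suspended for 187.219 seconds
    print(f"jump of {(cur - prev) / 1e6:+.6f} s at: {line}")
```

A genuinely long quiet period in the log would also trip the forward-leap check, so the threshold would need tuning for real use.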

I haven't tried HVM, since I'm not using that at all.

Hans
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

