[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 02/12] libxenguest: deal with log-dirty op stats overflow

  • To: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Fri, 25 Jun 2021 17:36:40 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=V9aEG+Zut38SWBHt4g05EgKetcQtmvpylvG0L86KFPo=; b=jpSVf6u4jxQBCzfCyoFHAbFB+lhdx8RsK6nDqvr8Be79TkQn5h7GffJKlmrROneUffcM4o8PXjJvzquqrWPzcr/HQLVkZ/wXXRJQJWE3HMX21foLpyjXqW5sqvzxhW+8+OH06NPL+/lobNPeL18u0VJJ9JJRhMgZrLnpPQkAQUuE/TFy1B0lCSPSwxnnR+5+X7SjXQ9qVIT1X7Xl8FVLmiiSflzU/zTuXviKiuLDCHkQbyvgft/bnExpOTkAuDs7KHKX2Okl9j4S6m96jyQHeeiSVJ8Bi+TIb6V5/7Dept7zVpRApq2kZ411QlmqTFIb25XodxPcGW+rTT+YfEtbLQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=oQJmf+sY56j+fw3N/vRQnNwjzP4MKFC7oDasDGth7aKkbPJnEnjCk2nG62qsDTwzAMcfzKAo/t9PD4EGUpjwxRbEm3AlC5IQJ0j5a0OdbJWY6mDbNLk58ssKEODnj2K/s2ZVU/Ogn8LeKoBze0JSs74LbZoG5KeiIP9DRDKT3ipksfNYrJz0SrmQJRlRPWGlF8AgpXE/QEFK0zJODKC2WuO2hq8kBnPDeDkYlXmw5Kv3JFcHEI0QSWITGIGBtJG0BS2MqjsX9fFN3XbvpRZClUVY0eryGImeDx+QdwZW8EtV9tfMNVgZDf/W1atnFQQg6wZGkmIvMm+dnoGy+PQKSg==
  • Authentication-results: esa4.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>
  • Delivery-date: Fri, 25 Jun 2021 16:37:05 +0000
  • Ironport-hdrordr: A9a23:A7AE/qG2PMIJB6fEpLqEEseALOsnbusQ8zAXPiBKJCC9vPb5qy nOpoV+6faQslwssR4b9uxoVJPvfZq+z+8R3WByB8bAYOCOggLBQL2KhbGI/9SKIVydygcy78 Zdm6gVMqyMMbB55/yKnDVRxbwbsaa6GKPDv5ah8590JzsaDJ2Jd21Ce32m+ksdfnghObMJUK Cyy+BgvDSadXEefq2AdwM4t7iqnayzqHr+CyR2fyIa1A==
  • Ironport-sdr: +9aCOyYyNyO7pM3XfxZ9oQz1Cv9iTzym7IIX44Y6ncgqxRYl2esvk15Kul3FDP2A3ePzrBf79n 1him43Iaxqq612FyO8RL+gH3BYTSqIQjA7idv1OGmeCdxQ3VTYzxjLsWicnInQysGFjYrLJ8Dn nz9xISUEZMFKp7pf6WhAuFOxR1unrtw2PD8lPJdFPCm+imSIBlolom9sXeOMw3XJYDAFWokanr qt+Img8KmOAFbfulcmfzwPTSuSEZd3h/1g09eYYE0j36Uk+I9d+Mdv6L9PADOytd6nNTrc008I kR4=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25/06/2021 14:18, Jan Beulich wrote:
> In send_memory_live() the precise value the dirty_count struct field
> gets initialized to doesn't matter much (apart from the triggering of
> the log message in send_dirty_pages(), see below), but it is important
> that it not be zero on the first iteration (or else send_dirty_pages()
> won't get called at all). Saturate the initializer value at the maximum
> value the field can hold.

I don't follow.  Migration would be extremely broken if the first
iteration didn't work correctly, so something else is going on here.

> While there also initialize struct precopy_stats' respective field to a
> more sane value: We don't really know how many dirty pages there are at
> that point.
> In suspend_and_send_dirty() and verify_frames() the local variables
> don't need initializing at all, as they're only an output from the
> hypercall which gets invoked first thing.
> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
> altogether: It's optional to xc_logdirty_control() and not used anywhere
> else.
> Note that in case the clipping actually takes effect, the "Bitmap
> contained more entries than expected..." log message will trigger. This
> being just an informational message, I don't think this is overly
> concerning.

That message is currently a error, confirming that the VM will crash on
the resuming side.

This is a consequence of it attempting to balloon during the live phase
of migration, and discussed in docs/features/migration.pandoc (well - at
least mentioned on the "noone has fixed this yet" list).

> --- a/tools/libs/guest/xg_sr_save.c
> +++ b/tools/libs/guest/xg_sr_save.c
> @@ -500,7 +500,9 @@ static int simple_precopy_policy(struct
>  static int send_memory_live(struct xc_sr_context *ctx)
>  {
>      xc_interface *xch = ctx->xch;
> -    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
> +    xc_shadow_op_stats_t stats = {
> +        .dirty_count = MIN(ctx->save.p2m_size, (typeof(stats.dirty_count))~0)
> +    };
>      char *progress_str = NULL;
>      unsigned int x = 0;
>      int rc;
> @@ -519,7 +521,7 @@ static int send_memory_live(struct xc_sr
>          goto out;
>      ctx->save.stats = (struct precopy_stats){
> -        .dirty_count = ctx->save.p2m_size,
> +        .dirty_count = -1,

This is an external interface, and I'm not sure it will tolerate finding
more than p2m_size allegedly dirty.




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.