[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [Xen-devel] save/restore race



> > We should make xc_linux_save cope more gracefully with
> > arch.pfn_to_mfn_frame_list_list being NULL, have xc_linux_save zero
the
> > field in shared_info before writing it out (its an MFN and not valid
> > anyhow), and then move the field's re-initialisation a little later
in
> > the post_suspend function to close the race.
> 
> I've done some testing with the below, and it seems to do the trick.

30 seconds is quite a time to wait... Wouldn't a second be more
appropriate?

While you're at it, I'd be grateful if you could move the shared_info
assignment in linux's post_suspend() and setup_arch() [i386 and x86_64]
below the list building code.

Thanks,
Ian
 
> regards
> john
> 
> 
> # HG changeset patch
> # User john.levon@xxxxxxx
> # Date 1169600022 28800
> # Node ID 3121e0cb7be68fa46c59f61422f05adc03ec5a5e
> # Parent  a75413c0072fa5b892bd8b6a05c1f1d3435bb093
> Close save-after-restore race. Make xc_linux_save() wait for the
> frame_list_list MFN to be updated by the domain before trying to use
it.
> 
> Signed-off-by: John Levon <john.levon@xxxxxxx>
> 
> diff --git a/tools/libxc/xc_linux_save.c b/tools/libxc/xc_linux_save.c
> --- a/tools/libxc/xc_linux_save.c
> +++ b/tools/libxc/xc_linux_save.c
> @@ -403,6 +403,33 @@ static int suspend_and_state(int (*suspe
>      return -1;
>  }
> 
> +/*
> +** Map the top-level page of MFNs from the guest. The guest might not
have
> +** finished resuming from a previous restore operation, so we wait a
while
> for
> +** it to update the MFN to a reasonable value.
> +*/
> +static void *map_frame_list_list(int xc_handle, uint32_t dom,
> +                                 shared_info_t *shinfo)
> +{
> +    int count = 3000;
> +    void *p;
> +
> +    while (count-- && shinfo->arch.pfn_to_mfn_frame_list_list == 0)
> +        usleep(10000);
> +
> +    if (shinfo->arch.pfn_to_mfn_frame_list_list == 0) {
> +        ERROR("Timed out waiting for frame list updated.");
> +        return NULL;
> +    }
> +
> +    p = xc_map_foreign_range(xc_handle, dom, PAGE_SIZE, PROT_READ,
> +
shinfo->arch.pfn_to_mfn_frame_list_list);
> +
> +    if (p == NULL)
> +        ERROR("Couldn't map p2m_frame_list_list (errno %d)", errno);
> +
> +    return p;
> +}
> 
>  /*
>  ** During transfer (or in the state file), all page-table pages must
be
> @@ -663,14 +690,11 @@ int xc_linux_save(int xc_handle, int io_
> 
>      max_pfn = live_shinfo->arch.max_pfn;
> 
> -    live_p2m_frame_list_list =
> -        xc_map_foreign_range(xc_handle, dom, PAGE_SIZE, PROT_READ,
> -                             live_shinfo-
> >arch.pfn_to_mfn_frame_list_list);
> -
> -    if (!live_p2m_frame_list_list) {
> -        ERROR("Couldn't map p2m_frame_list_list (errno %d)", errno);
> -        goto out;
> -    }
> +    live_p2m_frame_list_list = map_frame_list_list(xc_handle, dom,
> +                                                   live_shinfo);
> +
> +    if (!live_p2m_frame_list_list)
> +        goto out;
> 
>      live_p2m_frame_list =
>          xc_map_foreign_batch(xc_handle, dom, PROT_READ,
> @@ -1169,8 +1193,14 @@ int xc_linux_save(int xc_handle, int io_
>      ctxt.ctrlreg[3] =
>          xen_pfn_to_cr3(mfn_to_pfn(xen_cr3_to_pfn(ctxt.ctrlreg[3])));
> 
> +    /*
> +     * Reset the MFN to be a known-invalid value. See
> map_frame_list_list().
> +     */
> +    memcpy(page, live_shinfo, PAGE_SIZE);
> +    ((shared_info_t *)page)->arch.pfn_to_mfn_frame_list_list = 0;
> +
>      if (!write_exact(io_fd, &ctxt, sizeof(ctxt)) ||
> -        !write_exact(io_fd, live_shinfo, PAGE_SIZE)) {
> +        !write_exact(io_fd, page, PAGE_SIZE)) {
>          ERROR("Error when writing to state file (1) (errno %d)",
errno);
>          goto out;
>      }

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.