[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 1/2] kexec: use hypercall_create_continuation to protect KEXEC ops



On Wed, Apr 19, 2017 at 10:47:15AM -0500, Eric DeVolder wrote:
> When we concurrently try to unload and load crash
> images we eventually get:
>
>  Xen call trace:
>     [<ffff82d08018b04f>] machine_kexec_add_page+0x3a0/0x3fa
>     [<ffff82d08018b184>] machine_kexec_load+0xdb/0x107
>     [<ffff82d080116e8d>] kexec.c#kexec_load_slot+0x11/0x42
>     [<ffff82d08011724f>] kexec.c#kexec_load+0x119/0x150
>     [<ffff82d080117c1e>] kexec.c#do_kexec_op_internal+0xab/0xcf
>     [<ffff82d080117c60>] do_kexec_op+0xe/0x1e
>     [<ffff82d08025c620>] pv_hypercall+0x20a/0x44a
>     [<ffff82d080260116>] cpufreq.c#test_all_events+0/0x30
>
>  Pagetable walk from ffff820040088320:
>   L4[0x104] = 00000002979d1063 ffffffffffffffff
>   L3[0x001] = 00000002979d0063 ffffffffffffffff
>   L2[0x000] = 00000002979c7063 ffffffffffffffff
>   L1[0x088] = 80037a91ede97063 ffffffffffffffff
>
> The interesting thing is that the page bits (063) look legit.
>
> The operation on which we blow up is us trying to write
> in the L1 and finding that the L2 entry points to some
> bizzare MFN. It stinks of a race, and it looks like
> the issue is due to no concurrency locks when dealing
> with the crash kernel space.
>
> Specifically we concurrently call kimage_alloc_crash_control_page
> which iterates over the kexec_crash_area.start -> kexec_crash_area.size
> and once found:
>
>   if ( page )
>   {
>       image->next_crash_page = hole_end;
>       clear_domain_page(_mfn(page_to_mfn(page)));
>   }
>
> clears. Since the parameters of what MFN to use are provided
> by the callers (and the area to search is bounded) the the 'page'
> is probably the same. So #1 we concurrently clear the
> 'control_code_page'.
>
> The next step is us passing this 'control_code_page' to
> machine_kexec_add_page. This function requires the MFNs:
> page_to_maddr(image->control_code_page).
>
> And this would always return the same virtual address, as
> the MFN of the control_code_page is inside of the
> kexec_crash_area.start -> kexec_crash_area.size area.
>
> Then machine_kexec_add_page updates the L1 .. which can be done
> concurrently and on subsequent calls we mangle it up.
>
> This is all a theory at this time, but testing reveals
> that adding the hypercall_create_continuation() at the
> kexec hypercall fixes the crash.
>
> NOTE: This patch follows 5c5216 (kexec: clear kexec_image slot
> when unloading kexec image) to prevent crashes during
> simultaneous load/unloads.
>
> NOTE: Consideration was given to using the existing flag
> KEXEC_FLAG_IN_PROGRESS to denote a kexec hypercall in
> progress. This, however, overloads the original intent of
> the flag which is to denote that we are about-to/have made
> the jump to the crash path. The overloading would lead to
> failures in existing checks on this flag as the flag would
> always be set at the top level in do_kexec_op_internal().
> For this reason, the new flag KEXEC_FLAG_HC_IN_PROGRESS
> was introduced.
>
> While at it, fixed the #define mismatched spacing
>
> Signed-off-by: Eric DeVolder <eric.devolder@xxxxxxxxxx>
> Reviewed-by: Bhavesh Davda <bhavesh.davda@xxxxxxxxxx>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
> Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Reviewed-by: Daniel Kiper <daniel.kiper@xxxxxxxxxx>
> ---
> v3:
>  - Incorporated feedback from Jan Beulich to change the
>    name of the new flag to KEXEC_FLAG_IN_HYPERCALL
>
> v2: 04/17/2017
>  - Patch titled 'kexec: use hypercall_create_continuation to protect KEXEC 
> ops'
>  - Jan Beulich directed me to use a continuation method instead
>    of spinlock.
>  - Incorporated feedback from Daniel Kiper <daniel.kiper@xxxxxxxxxx>
>  - Incorporated feedback from Konrad Wilk <konrad.wilk@xxxxxxxxxx>
>
> v1: 04/10/2017
>  - Patch titled 'kexec: Add spinlock for the whole hypercall'
>  - Used spinlock in do_kexec_op_internal()
> ---
>  xen/common/kexec.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/xen/common/kexec.c b/xen/common/kexec.c
> index 072cc8e..253c204 100644
> --- a/xen/common/kexec.c
> +++ b/xen/common/kexec.c
> @@ -50,9 +50,10 @@ static cpumask_t crash_saved_cpus;
>
>  static struct kexec_image *kexec_image[KEXEC_IMAGE_NR];
>
> -#define KEXEC_FLAG_DEFAULT_POS   (KEXEC_IMAGE_NR + 0)
> -#define KEXEC_FLAG_CRASH_POS     (KEXEC_IMAGE_NR + 1)
> -#define KEXEC_FLAG_IN_PROGRESS   (KEXEC_IMAGE_NR + 2)
> +#define KEXEC_FLAG_DEFAULT_POS  (KEXEC_IMAGE_NR + 0)
> +#define KEXEC_FLAG_CRASH_POS    (KEXEC_IMAGE_NR + 1)
> +#define KEXEC_FLAG_IN_PROGRESS  (KEXEC_IMAGE_NR + 2)
> +#define KEXEC_FLAG_IN_HYPERCALL (KEXEC_IMAGE_NR + 3)

I do not think that you need change all of this right now.
Just add one space after KEXEC_FLAG_IN_HYPERCALL and you
are set.

>  static unsigned long kexec_flags = 0; /* the lowest bits are for 
> KEXEC_IMAGE... */
>
> @@ -1193,6 +1194,9 @@ static int do_kexec_op_internal(unsigned long op,
>      if ( ret )
>          return ret;
>
> +    if ( test_and_set_bit(KEXEC_FLAG_IN_HYPERCALL, &kexec_flags) )
> +        return hypercall_create_continuation(__HYPERVISOR_kexec_op, "lh", 
> op, uarg);
> +

I would suggest here:
       ASSERT(test_bit(KEXEC_FLAG_IN_HYPERCALL, &kexec_flags));

...

Daniel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.