|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3 1/2] kexec: use hypercall_create_continuation to protect KEXEC ops
On Wed, Apr 19, 2017 at 10:47:15AM -0500, Eric DeVolder wrote:
> When we concurrently try to unload and load crash
> images we eventually get:
>
> Xen call trace:
> [<ffff82d08018b04f>] machine_kexec_add_page+0x3a0/0x3fa
> [<ffff82d08018b184>] machine_kexec_load+0xdb/0x107
> [<ffff82d080116e8d>] kexec.c#kexec_load_slot+0x11/0x42
> [<ffff82d08011724f>] kexec.c#kexec_load+0x119/0x150
> [<ffff82d080117c1e>] kexec.c#do_kexec_op_internal+0xab/0xcf
> [<ffff82d080117c60>] do_kexec_op+0xe/0x1e
> [<ffff82d08025c620>] pv_hypercall+0x20a/0x44a
> [<ffff82d080260116>] cpufreq.c#test_all_events+0/0x30
>
> Pagetable walk from ffff820040088320:
> L4[0x104] = 00000002979d1063 ffffffffffffffff
> L3[0x001] = 00000002979d0063 ffffffffffffffff
> L2[0x000] = 00000002979c7063 ffffffffffffffff
> L1[0x088] = 80037a91ede97063 ffffffffffffffff
>
> The interesting thing is that the page bits (063) look legit.
>
> The operation on which we blow up is us trying to write
> in the L1 and finding that the L2 entry points to some
> bizzare MFN. It stinks of a race, and it looks like
> the issue is due to no concurrency locks when dealing
> with the crash kernel space.
>
> Specifically we concurrently call kimage_alloc_crash_control_page
> which iterates over the kexec_crash_area.start -> kexec_crash_area.size
> and once found:
>
> if ( page )
> {
> image->next_crash_page = hole_end;
> clear_domain_page(_mfn(page_to_mfn(page)));
> }
>
> clears. Since the parameters of what MFN to use are provided
> by the callers (and the area to search is bounded) the the 'page'
> is probably the same. So #1 we concurrently clear the
> 'control_code_page'.
>
> The next step is us passing this 'control_code_page' to
> machine_kexec_add_page. This function requires the MFNs:
> page_to_maddr(image->control_code_page).
>
> And this would always return the same virtual address, as
> the MFN of the control_code_page is inside of the
> kexec_crash_area.start -> kexec_crash_area.size area.
>
> Then machine_kexec_add_page updates the L1 .. which can be done
> concurrently and on subsequent calls we mangle it up.
>
> This is all a theory at this time, but testing reveals
> that adding the hypercall_create_continuation() at the
> kexec hypercall fixes the crash.
>
> NOTE: This patch follows 5c5216 (kexec: clear kexec_image slot
> when unloading kexec image) to prevent crashes during
> simultaneous load/unloads.
>
> NOTE: Consideration was given to using the existing flag
> KEXEC_FLAG_IN_PROGRESS to denote a kexec hypercall in
> progress. This, however, overloads the original intent of
> the flag which is to denote that we are about-to/have made
> the jump to the crash path. The overloading would lead to
> failures in existing checks on this flag as the flag would
> always be set at the top level in do_kexec_op_internal().
> For this reason, the new flag KEXEC_FLAG_HC_IN_PROGRESS
> was introduced.
>
> While at it, fixed the #define mismatched spacing
>
> Signed-off-by: Eric DeVolder <eric.devolder@xxxxxxxxxx>
> Reviewed-by: Bhavesh Davda <bhavesh.davda@xxxxxxxxxx>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
> Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Reviewed-by: Daniel Kiper <daniel.kiper@xxxxxxxxxx>
> ---
> v3:
> - Incorporated feedback from Jan Beulich to change the
> name of the new flag to KEXEC_FLAG_IN_HYPERCALL
>
> v2: 04/17/2017
> - Patch titled 'kexec: use hypercall_create_continuation to protect KEXEC
> ops'
> - Jan Beulich directed me to use a continuation method instead
> of spinlock.
> - Incorporated feedback from Daniel Kiper <daniel.kiper@xxxxxxxxxx>
> - Incorporated feedback from Konrad Wilk <konrad.wilk@xxxxxxxxxx>
>
> v1: 04/10/2017
> - Patch titled 'kexec: Add spinlock for the whole hypercall'
> - Used spinlock in do_kexec_op_internal()
> ---
> xen/common/kexec.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/xen/common/kexec.c b/xen/common/kexec.c
> index 072cc8e..253c204 100644
> --- a/xen/common/kexec.c
> +++ b/xen/common/kexec.c
> @@ -50,9 +50,10 @@ static cpumask_t crash_saved_cpus;
>
> static struct kexec_image *kexec_image[KEXEC_IMAGE_NR];
>
> -#define KEXEC_FLAG_DEFAULT_POS (KEXEC_IMAGE_NR + 0)
> -#define KEXEC_FLAG_CRASH_POS (KEXEC_IMAGE_NR + 1)
> -#define KEXEC_FLAG_IN_PROGRESS (KEXEC_IMAGE_NR + 2)
> +#define KEXEC_FLAG_DEFAULT_POS (KEXEC_IMAGE_NR + 0)
> +#define KEXEC_FLAG_CRASH_POS (KEXEC_IMAGE_NR + 1)
> +#define KEXEC_FLAG_IN_PROGRESS (KEXEC_IMAGE_NR + 2)
> +#define KEXEC_FLAG_IN_HYPERCALL (KEXEC_IMAGE_NR + 3)
I do not think that you need change all of this right now.
Just add one space after KEXEC_FLAG_IN_HYPERCALL and you
are set.
> static unsigned long kexec_flags = 0; /* the lowest bits are for
> KEXEC_IMAGE... */
>
> @@ -1193,6 +1194,9 @@ static int do_kexec_op_internal(unsigned long op,
> if ( ret )
> return ret;
>
> + if ( test_and_set_bit(KEXEC_FLAG_IN_HYPERCALL, &kexec_flags) )
> + return hypercall_create_continuation(__HYPERVISOR_kexec_op, "lh",
> op, uarg);
> +
I would suggest here:
ASSERT(test_bit(KEXEC_FLAG_IN_HYPERCALL, &kexec_flags));
...
Daniel
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |