|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2 02/18] x86/domain: limit window where curr_vcpu != current on context switch
On 08.01.2025 15:26, Roger Pau Monne wrote:
> On x86 Xen will perform lazy context switches to the idle vCPU, where the
> previously running vCPU context is not overwritten, and only current is
> updated
> to point to the idle vCPU. The state is then disjunct between current and
> curr_vcpu: current points to the idle vCPU, while curr_vcpu points to the vCPU
> whose context is loaded on the pCPU.
>
> While on that lazy context switched state, certain calls (like
> map_domain_page()) will trigger a full synchronization of the pCPU state by
> forcing a context switch. Note however how calling any of such functions
> inside the context switch code itself is very likely to trigger an infinite
> recursion loop.
>
> Attempt to limit the window where curr_vcpu != current in the context switch
> code, as to prevent and infinite recursion loop around sync_local_execstate().
>
> This is required for using map_domain_page() in the vCPU context switch code,
> otherwise using map_domain_page() in that context ends up in a recursive
> sync_local_execstate() loop:
Question is whether it's a good idea in the first place to start using
map_domain_page() from the context switch path. Surely there are possible
alternatives.
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1982,16 +1982,16 @@ static void load_default_gdt(unsigned int cpu)
> per_cpu(full_gdt_loaded, cpu) = false;
> }
>
> -static void __context_switch(void)
> +static void __context_switch(struct vcpu *n)
> {
> struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
> unsigned int cpu = smp_processor_id();
> struct vcpu *p = per_cpu(curr_vcpu, cpu);
> - struct vcpu *n = current;
> struct domain *pd = p->domain, *nd = n->domain;
>
> ASSERT(p != n);
> ASSERT(!vcpu_cpu_dirty(n));
> + ASSERT(p == current);
>
> if ( !is_idle_domain(pd) )
> {
> @@ -2036,6 +2036,18 @@ static void __context_switch(void)
>
> write_ptbase(n);
>
> + /*
> + * It's relevant to set both current and curr_vcpu back-to-back, to
> avoid a
> + * window where calls to mapcache_current_vcpu() during the context
> switch
> + * could trigger a recursive loop.
> + *
> + * Do the current switch immediately after switching to the new guest
> + * page-tables, so that current is (almost) always in sync with the
> + * currently loaded page-tables.
> + */
> + set_current(n);
> + per_cpu(curr_vcpu, cpu) = n;
The latter paragraph of the comment states something that so far wasn't
intended,
and imo also shouldn't be going forward. It's curr_vcpu which wants to be in
sync
with the loaded page tables. (Whether pulling ahead its updating is okay is a
separate question. All of these actions used to be be very carefully placed they
way they are. Which isn't to say that I can exclude things having gone stale
...)
And yes, that has always meant that mapcache_current_vcpu()'s condition for
calling sync_local_execstate() was building upon the fact that it won't be
called
from context switching contexts.
Did you consider updating that condition (evaluating curr_cpu) instead?
> @@ -2048,8 +2060,6 @@ static void __context_switch(void)
> if ( pd != nd )
> cpumask_clear_cpu(cpu, pd->dirty_cpumask);
> write_atomic(&p->dirty_cpu, VCPU_CPU_CLEAN);
> -
> - per_cpu(curr_vcpu, cpu) = n;
> }
>
> void context_switch(struct vcpu *prev, struct vcpu *next)
> @@ -2081,16 +2091,36 @@ void context_switch(struct vcpu *prev, struct vcpu
> *next)
>
> local_irq_disable();
>
> - set_current(next);
> -
> if ( (per_cpu(curr_vcpu, cpu) == next) ||
> (is_idle_domain(nextd) && cpu_online(cpu)) )
> {
> + /*
> + * Lazy context switch to the idle vCPU, set current == idle. Full
> + * context switch happens if/when sync_local_execstate() is called.
> + */
> + set_current(next);
> local_irq_enable();
The comment is misleading as far as the first half of the if() condition goes:
No further switching is going to happen in that case, aiui.
> }
> else
> {
> - __context_switch();
> + /*
> + * curr_vcpu will always point to the currently loaded vCPU context,
> as
> + * it's not updated when doing a lazy switch to the idle vCPU.
> + */
> + struct vcpu *prev_ctx = per_cpu(curr_vcpu, cpu);
> +
> + if ( prev_ctx != current )
> + {
> + /*
> + * Doing a full context switch to a non-idle vCPU from a lazy
> + * context switched state. Adjust current to point to the
> + * currently loaded vCPU context.
> + */
> + ASSERT(current == idle_vcpu[cpu]);
> + ASSERT(!is_idle_vcpu(next));
> + set_current(prev_ctx);
This feels wrong, as in "current" then not representing what it should
represent,
for a certain time window. I may be dense, but neither comment not description
clarify to me why this might be needed. I can see that it's needed to please the
ASSERT() you add to __context_switch(), yet then I might ask why that assertion
is put there.
> + }
> + __context_switch(next);
>
> /* Re-enable interrupts before restoring state which may fault. */
> local_irq_enable();
> @@ -2156,15 +2186,23 @@ int __sync_local_execstate(void)
> {
> unsigned long flags;
> int switch_required;
> + unsigned int cpu = smp_processor_id();
> + struct vcpu *p;
>
> local_irq_save(flags);
>
> - switch_required = (this_cpu(curr_vcpu) != current);
> + p = per_cpu(curr_vcpu, cpu);
> + switch_required = (p != current);
>
> if ( switch_required )
> {
> - ASSERT(current == idle_vcpu[smp_processor_id()]);
> - __context_switch();
> + ASSERT(current == idle_vcpu[cpu]);
> + /*
> + * Restore current to the previously running vCPU, __context_switch()
> + * will update current together with curr_vcpu.
> + */
> + set_current(p);
Similarly here.
> + __context_switch(idle_vcpu[cpu]);
> }
>
> local_irq_restore(flags);
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -2232,8 +2232,6 @@ void __init trap_init(void)
>
> void activate_debugregs(const struct vcpu *curr)
> {
> - ASSERT(curr == current);
> -
> write_debugreg(0, curr->arch.dr[0]);
> write_debugreg(1, curr->arch.dr[1]);
> write_debugreg(2, curr->arch.dr[2]);
Why would this assertion go away? If it suddenly triggers, the parameter name
would now end up being wrong.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |