
Re: [Xen-devel] [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)



.. snip..
> > + * Note that because of this NOP code the do_nmi is not safely patchable.
> > + * Also if we do receive 'real' NMIs we have lost them.
> 
> The MCE path needs consideration as well.  Unlike the NMI path however,
> that one cannot be ignored.
> 
> In both cases, it might be best to see about raising a tasklet or
> softirq to pick up some deferred work.

I will put that in a separate patch, as this patch is big enough already.

> 
> > + */
> > +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
> > +{
> > +    return 1;
> > +}
> > +
> > +static void reschedule_fn(void *unused)
> > +{
> > +    smp_mb(); /* Synchronize with setting do_work */
> > +    raise_softirq(SCHEDULE_SOFTIRQ);
> 
> As you have to IPI each processor to raise a schedule softirq, you can
> set a per-cpu "xsplice enter rendezvous" variable.  This prevents the
> need for the return-to-guest path to poll one single byte.

.. Not sure I follow. The IPI we send to the other CPUs is vector 0xfb, which
makes smp_call_function_interrupt run, which in turn calls this function:
reschedule_fn(). Then raise_softirq sets the bit in softirq_pending.

Great. Since we sent an IPI, that means we caused a VMEXIT, which
eventually ends up calling process_pending_softirqs(), which calls schedule().
And after that it calls check_for_xsplice_work().

Are you suggesting adding a new softirq that would call
check_for_xsplice_work()?

Or are you suggesting to skip the softirq_pending check and all the
code around that and instead have each VMEXIT code path check this
per-cpu "xsplice enter" variable? If so, why not use the existing
softirq infrastructure? 
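
To make the flow I described above concrete, here is a rough user-space
model of it. It is only a sketch: every model_* name is made up for
illustration, there is a single CPU's pending word rather than per-CPU
state, and none of this is the actual Xen code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not Xen code: one CPU's softirq state. */
#define MODEL_NR_SOFTIRQS      4u
#define MODEL_SCHEDULE_SOFTIRQ 1u   /* index chosen arbitrarily for the model */

typedef void (*model_softirq_fn)(void);

static uint32_t model_pending;                       /* pending bitmask */
static model_softirq_fn model_handlers[MODEL_NR_SOFTIRQS];
static int model_schedule_calls;
static int model_xsplice_checks;

static void model_schedule(void) { model_schedule_calls++; }
static void model_check_for_xsplice_work(void) { model_xsplice_checks++; }

static void model_init(void)
{
    model_pending = 0;
    model_schedule_calls = 0;
    model_xsplice_checks = 0;
    model_handlers[MODEL_SCHEDULE_SOFTIRQ] = model_schedule;
}

/* Setting the bit is all that "raising" a softirq does in this model. */
static void model_raise_softirq(unsigned int nr)
{
    assert(nr < MODEL_NR_SOFTIRQS);
    model_pending |= UINT32_C(1) << nr;
}

/* What the IPI handler ends up running on the remote CPU. */
static void model_reschedule_fn(void)
{
    model_raise_softirq(MODEL_SCHEDULE_SOFTIRQ);
}

/* Clear each pending bit and run its handler, lowest index first. */
static void model_process_pending_softirqs(void)
{
    for (unsigned int nr = 0; nr < MODEL_NR_SOFTIRQS; nr++) {
        uint32_t bit = UINT32_C(1) << nr;

        if (!(model_pending & bit))
            continue;
        model_pending &= ~bit;
        if (model_handlers[nr])
            model_handlers[nr]();
    }
}

/* Exit path: softirqs first, then the xsplice check. */
static void model_return_to_guest(void)
{
    model_process_pending_softirqs();
    model_check_for_xsplice_work();
}
```

Calling model_init(), then model_reschedule_fn(), then
model_return_to_guest() runs the scheduler handler once and then the
xsplice check once, which is the ordering I am relying on.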

.. snip..
> 
> > +}
> > +
> > +void do_xsplice(void)
> > +{
> > +    struct payload *p = xsplice_work.data;
> > +    unsigned int cpu = smp_processor_id();
> > +
> > +    /* Fast path: no work to do. */
> > +    if ( likely(!xsplice_work.do_work) )
> > +        return;
> > +    ASSERT(local_irq_is_enabled());
> > +
> > +    /* Set at -1, so will go up to num_online_cpus - 1 */
> > +    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
> > +    {
> > +        unsigned int total_cpus;
> > +
> > +        if ( !get_cpu_maps() )
> > +        {
> > +            printk(XENLOG_DEBUG "%s: CPU%u - unable to get cpu_maps lock.\n",
> > +                   p->name, cpu);
> > +            xsplice_work.data->rc = -EBUSY;
> > +            xsplice_work.do_work = 0;
> > +            return;
> 
> This error path leaves a ref in the semaphore.

It does. And it also does so in xsplice_do_single() - if the xsplice_do_wait()
fails, 
> 
> > +        }
> > +
> > +        barrier(); /* MUST do it after get_cpu_maps. */
> > +        total_cpus = num_online_cpus() - 1;
> > +
> > +        if ( total_cpus )
> > +        {
> > +            printk(XENLOG_DEBUG "%s: CPU%u - IPIing the %u CPUs.\n", p->name,
> > +                   cpu, total_cpus);
> > +            smp_call_function(reschedule_fn, NULL, 0);
> > +        }
> > +        (void)xsplice_do_single(total_cpus);

.. here, we never decrement the semaphore.

Which is a safeguard (I will document that).

The issue here is that say we have two CPUs:

CPU0                            CPU1

semaphore=0                     semaphore=1
 !get_cpu_maps()
  do_work = 0;                  .. now goes in the 'slave' part below and
                                exits out as do_work=0

Now if we decremented the semaphore back on the error path:

CPU0                            CPU1

semaphore=0                     
 !get_cpu_maps()
                                .. do_work is still set.
  do_work = 0;                  
                   
  semaphore=-1
                                atomic_inc_and_test(semaphore) -> semaphore == 0
                                .. now it assumes the role of a master.

                                .. it will fail as the other CPU will never
                                rendezvous (the do_work is set to zero).
                                But we waste another 30ms spinning.


The end result is that after patching the semaphore should equal
num_online_cpus-1.
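
The second interleaving can be replayed in a small single-threaded model.
This is a sketch using C11 atomics in user space; struct model_work and
all the model_* helpers are hypothetical stand-ins for xsplice_work and
atomic_inc_and_test(), not the code in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for xsplice_work; not the Xen structure. */
struct model_work {
    atomic_int semaphore;   /* starts at -1, so the first increment yields 0 */
    atomic_bool do_work;
};

enum model_role { MODEL_NONE, MODEL_MASTER, MODEL_SLAVE };

/* Like atomic_inc_and_test(): true iff the incremented value is zero. */
static bool model_inc_and_test(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1 == 0;
}

/* The fast-path check at the top of do_xsplice(). */
static bool model_sees_work(struct model_work *w)
{
    return atomic_load(&w->do_work);
}

/* Whoever brings the semaphore to zero becomes the master. */
static enum model_role model_claim(struct model_work *w)
{
    return model_inc_and_test(&w->semaphore) ? MODEL_MASTER : MODEL_SLAVE;
}

/* Master error path; undo_ref selects whether the ref is dropped again. */
static void model_master_fail(struct model_work *w, bool undo_ref)
{
    atomic_store(&w->do_work, false);
    if (undo_ref)
        atomic_fetch_sub(&w->semaphore, 1);
}

/* Replays the two-CPU race and returns the role CPU1 ends up with. */
static enum model_role model_replay(bool undo_ref)
{
    struct model_work w;
    enum model_role cpu0;
    bool cpu1_saw_work;

    atomic_init(&w.semaphore, -1);
    atomic_init(&w.do_work, true);

    cpu0 = model_claim(&w);               /* CPU0: -1 -> 0, becomes master */
    assert(cpu0 == MODEL_MASTER);
    cpu1_saw_work = model_sees_work(&w);  /* CPU1: do_work is still set */
    model_master_fail(&w, undo_ref);      /* CPU0: get_cpu_maps() failed */

    return cpu1_saw_work ? model_claim(&w) : MODEL_NONE;
}
```

model_replay(false) leaves the ref in place, so CPU1 becomes a slave and
exits once it notices do_work is zero. model_replay(true) drops the ref,
and CPU1 wrongly becomes a second master.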


> > +
> > +        ASSERT(local_irq_is_enabled());
> > +
> > +        put_cpu_maps();
> > +
> > +        printk(XENLOG_DEBUG "%s finished with rc=%d\n", p->name, p->rc);
> > +    }
> > +    else
> > +    {
> > +        /* Wait for all CPUs to rendezvous. */
> > +        while ( xsplice_work.do_work && !xsplice_work.ready )
> > +        {
> > +            cpu_relax();
> > +            smp_rmb();
> > +        }
> > +
> 
> What happens here if the rendezvous initiator times out?  Looks like we
> will spin forever waiting for do_work which will never drop back to 0.

Ross answered that, but the other code (master) will set do_work to zero so
we will exit this.
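
Roughly, the exit condition works like the model below. This is a sketch
only: the model_* names are invented, the timeout is a poll count rather
than the real time-based wait, and using -EBUSY for the timeout case is
an assumption on my part.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical rendezvous state; not the Xen xsplice_work structure. */
struct model_rdv {
    atomic_int arrived;     /* CPUs that have checked in so far */
    atomic_bool do_work;
    atomic_bool ready;
    int rc;
};

static void model_rdv_init(struct model_rdv *r)
{
    atomic_init(&r->arrived, 0);
    atomic_init(&r->do_work, true);
    atomic_init(&r->ready, false);
    r->rc = 0;
}

/*
 * Master side: poll up to max_polls times for `cpus` arrivals.
 * On timeout, clear do_work so that no slave keeps spinning.
 */
static int model_master_wait(struct model_rdv *r, int cpus,
                             unsigned int max_polls)
{
    assert(cpus > 0);

    for (unsigned int i = 0; i < max_polls; i++) {
        if (atomic_load(&r->arrived) == cpus) {
            atomic_store(&r->ready, true);
            return 0;
        }
    }

    r->rc = -EBUSY;                     /* assumed error code */
    atomic_store(&r->do_work, false);   /* releases the slaves */
    return r->rc;
}

/* Slave side: the condition of the first spin loop in do_xsplice(). */
static bool model_slave_must_spin(struct model_rdv *r)
{
    return atomic_load(&r->do_work) && !atomic_load(&r->ready);
}
```

If no CPU ever checks in, model_master_wait() gives up, and the slave
condition turns false because do_work is now zero, so the loop ends.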

> 
> > +        /* Disable IRQs and signal. */
> > +        local_irq_disable();
> > +        atomic_inc(&xsplice_work.irq_semaphore);
> > +
> > +        /* Wait for patching to complete. */
> > +        while ( xsplice_work.do_work )

Ditto for this.
> > +        {
> > +            cpu_relax();
> > +            smp_rmb();
> > +        }
> > +        local_irq_enable();
> 
> Splitting the modification of do_work and ready across multiple
> functions makes it particularly hard to reason about the correctness of
> the rendezvous.  It would be better to have a xsplice_rendezvous()
> function whose purpose was to negotiate the rendezvous only, using local
> static state.  The action can then be just the switch() from
> xsplice_do_single().

The earlier code was like that, but it ended up being quite
big. Let me make it happen and leave the actions in xsplice_do_single()
(and rename it to xsplice_do_action()).


> 
> > +    }
> > +}
> > +
> > diff --git a/xen/include/asm-arm/nmi.h b/xen/include/asm-arm/nmi.h
> > index a60587e..82aff35 100644
> > --- a/xen/include/asm-arm/nmi.h
> > +++ b/xen/include/asm-arm/nmi.h
> > @@ -4,6 +4,19 @@
> >  #define register_guest_nmi_callback(a)  (-ENOSYS)
> >  #define unregister_guest_nmi_callback() (-ENOSYS)
> >  
> > +typedef int (*nmi_callback_t)(const struct cpu_user_regs *regs, int cpu);
> > +
> > +/**
> > + * set_nmi_callback
> > + *
> > + * Set a handler for an NMI. Only one handler may be
> > + * set. Return the old nmi callback handler.
> > + */
> > +static inline nmi_callback_t set_nmi_callback(nmi_callback_t callback)
> > +{
> > +    return NULL;
> > +}
> > +
> 
> This addition suggests that there should probably be an
> arch_xsplice_prepair_rendezvous() and arch_xsplice_finish_rendezvous().

Yes indeed.
> 
> ~Andrew
