[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen: add hypercall option to temporarily pin a vcpu



>>> On 25.02.16 at 17:50, <JGross@xxxxxxxx> wrote:
> @@ -670,7 +676,13 @@ int cpu_disable_scheduler(unsigned int cpu)
>              if ( cpumask_empty(&online_affinity) &&
>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>              {
> -                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
> +                if ( v->affinity_broken )
> +                {
> +                    /* The vcpu is temporarily pinned, can't move it. */
> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
> +                    ret = -EBUSY;
> +                    continue;
> +                }

So far the function can only return 0 or -EAGAIN. By using "continue"
here you will make it impossible for the caller to reliably determine
whether possibly both things failed. Despite -EBUSY being a logical
choice here, I think you'd better use -EAGAIN here too. And it needs
to be determined whether continuing the loop in this as well as the
pre-existing cases is actually the right thing to do.

> @@ -679,6 +691,8 @@ int cpu_disable_scheduler(unsigned int cpu)
>                      v->affinity_broken = 1;
>                  }
>  
> +                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);

Wouldn't it be even better to make this the "else" to the
preceding if(), since in the suspend case this is otherwise going
to be printed for every vCPU not currently running on pCPU0?

> @@ -753,14 +767,22 @@ static int vcpu_set_affinity(
>      struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
>  {
>      spinlock_t *lock;
> +    int ret = 0;
>  
>      lock = vcpu_schedule_lock_irq(v);
>  
> -    cpumask_copy(which, affinity);
> +    if ( v->affinity_broken )
> +    {
> +        ret = -EBUSY;
> +    }

Unnecessary braces.

> @@ -979,6 +1001,53 @@ void watchdog_domain_destroy(struct domain *d)
>          kill_timer(&d->watchdog_timer[i]);
>  }
>  
> +static long do_pin_temp(int cpu)
> +{
> +    struct vcpu *v = current;
> +    spinlock_t *lock;
> +    long ret = -EINVAL;
> +
> +    lock = vcpu_schedule_lock_irq(v);
> +
> +    if ( cpu == -1 )
> +    {
> +        if ( v->affinity_broken )
> +        {
> +            cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
> +            v->affinity_broken = 0;
> +            set_bit(_VPF_migrating, &v->pause_flags);
> +            ret = 0;
> +        }
> +    }
> +    else if ( cpu < nr_cpu_ids && cpu >= 0 )

Perhaps easier to simply use "cpu < 0" in the first if()?

> +    {
> +        if ( v->affinity_broken )
> +        {
> +            ret = -EBUSY;
> +        }
> +        else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
> +        {

This is a rather ugly restriction: How would a caller fulfill its job
when this is not the case?

> @@ -1088,6 +1157,23 @@ ret_t do_sched_op(int cmd, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case SCHEDOP_pin_temp:
> +    {
> +        struct sched_pin_temp sched_pin_temp;
> +
> +        ret = -EFAULT;
> +        if ( copy_from_guest(&sched_pin_temp, arg, 1) )
> +            break;
> +
> +        ret = xsm_schedop_pin_temp(XSM_PRIV);
> +        if ( ret )
> +            break;
> +
> +        ret = do_pin_temp(sched_pin_temp.pcpu);
> +
> +        break;
> +    }

So having come here I still don't see why this is called "temp":
Nothing enforces this to be a temporary state, and hence the
sub-op name currently is actively misleading.

> --- a/xen/include/public/sched.h
> +++ b/xen/include/public/sched.h
> @@ -118,6 +118,15 @@
>   * With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
>   */
>  #define SCHEDOP_watchdog    6
> +
> +/*
> + * Temporarily pin the current vcpu to one physical cpu or undo that pinning.
> + * @arg == pointer to sched_pin_temp_t structure.
> + *
> + * Setting pcpu to -1 will undo a previous temporary pinning.
> + * This call is allowed for domains with domain control privilege only.
> + */

Why domain control privilege? I'd actually suggest limiting the
ability to the hardware domain, at once eliminating the need
for the XSM check.

> +struct sched_pin_temp {
> +    int pcpu;

Fixed width types only please in the public interface. Also this needs
an entry in xen/include/xlat.lst, and a consumer of the resulting
check macro.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.