
Re: [Xen-devel] [PATCH v10 11/20] x86/VPMU: Interface for setting PMU mode and flags



>>> On 11.09.14 at 18:10, <boris.ostrovsky@xxxxxxxxxx> wrote:
> On 09/11/2014 10:59 AM, Jan Beulich wrote:
>>>>> On 11.09.14 at 16:12, <boris.ostrovsky@xxxxxxxxxx> wrote:
>>> On 09/11/2014 02:44 AM, Jan Beulich wrote:
>>>>>>> On 10.09.14 at 19:37, <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>> On 09/10/2014 11:05 AM, Jan Beulich wrote:
>>>>>>>>> On 04.09.14 at 05:41, <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>>>> + cont_wait:
>>>>>>> +    /*
>>>>>>> +     * Note that we may fail here if a CPU is hot-(un)plugged while
>>>>>>> +     * we are waiting. We will then time out.
>>>>>>> +     */
>>>>>>> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
>>>>>>> +    {
>>>>>>> +        /* Give up after 5 seconds */
>>>>>>> +        if ( NOW() > start + SECONDS(5) )
>>>>>>> +        {
>>>>>>> +            printk(XENLOG_WARNING
>>>>>>> +                   "vpmu_force_context_switch: failed to sync\n");
>>>>>>> +            ret = -EBUSY;
>>>>>>> +            break;
>>>>>>> +        }
>>>>>>> +        cpu_relax();
>>>>>>> +        if ( hypercall_preempt_check() )
>>>>>>> +            return hypercall_create_continuation(
>>>>>>> +                __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
>>>>>>> +    }
>>>>>> I wouldn't complain about this not being synchronized with CPU
>>>>>> hotplug if there wasn't this hypercall continuation and relatively
>>>>>> long timeout. Much of the state you latch in static variables will
>>>>>> cause this operation to time out if in between a CPU got brought
>>>>>> down.
>>>>> It seemed to me that correctly dealing with CPU hotplug would add a
>>>>> bit too much complexity to the code. So I felt that letting the
>>>>> operation time out would be a better way out.
>>>> Then please at least add a code comment making this explicit to
>>>> future readers.
>>> Is the comment above the 'while' keyword not sufficient?
>> Oh, it is of course. Must have not scrolled back enough...
>>
>>>>>> And as already alluded to, all this looks rather fragile anyway,
>>>>>> even if I can't immediately spot any problems with it anymore.
>>>>> The continuation is really a carry-over from an earlier patch version,
>>>>> when I had double loops over domains and VCPUs to explicitly unload
>>>>> VPMUs. At that time Andrew pointed out that these loops may take a
>>>>> really long time, and so I added continuations.
>>>>>
>>>>> Now that I have changed that, after realizing that having each PCPU go
>>>>> through a context switch is sufficient, perhaps I don't need it any
>>>>> longer. Is the worst-case scenario of being stuck here for 5 seconds
>>>>> (chosen somewhat arbitrarily) acceptable without a continuation?
>>>> 5 seconds is _way_ too long for doing this without continuation.
>>> Then I am also adding back your other comment from this thread
>>>
>>>   > > +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>>   > > +{
>>>   > > +    int ret = -EINVAL;
>>>   > > +    xen_pmu_params_t pmu_params;
>>>   > > +
>>>   > > +    switch ( op )
>>>   > > +    {
>>>   > > +    case XENPMU_mode_set:
>>>   > > +    {
>>>   > > +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>>>   > > +        uint32_t current_mode;
>>>   > > +
>>>   > > +        if ( !is_control_domain(current->domain) )
>>>   > > +            return -EPERM;
>>>   > > +
>>>   > > +        if ( copy_from_guest(&pmu_params, arg, 1) )
>>>   > > +            return -EFAULT;
>>>   > > +
>>>   > > +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>>>   > > +            return -EINVAL;
>>>   > > +
>>>   > > +        /*
>>>   > > +         * Return error if someone else is in the middle of changing
>>>   > > +         * mode --- this is most likely an indication of two system
>>>   > > +         * administrators working against each other.
>>>   > > +         */
>>>   > > +        if ( !spin_trylock(&xenpmu_mode_lock) )
>>>   > > +            return -EAGAIN;
>>>   >
>>>   > So what happens if you can't take the lock in a continuation? If
>>>   > returning -EAGAIN in that case is not a problem, what do you
>>>   > need the continuation for in the first place?
>>>
>>> -EAGAIN in this case means that the caller was not able to initiate the
>>> operation. A continuation will allow the caller to finish an operation
>>> already in progress.
>> But that's only what you want, not what the code does. Also, now
>> that I look again, I don't think the comment really applies to this if().
> 
> Oh, I see. Then both the first and the second caller will fail.
> 
> I can make the second caller reset everything so that when the
> continuation gets to run it will start anew. And if it (i.e. the first
> caller) did get -EAGAIN while trying to take the lock then it's just as
> well --- the state will be clean when the user tries this again.
> 
> As for the question why the continuation is needed in the first place ---
> it's to make sure this hypercall doesn't prevent other unrelated
> operations from executing, not to manage simultaneous execution of this
> hypercall from multiple VCPUs (if this is what you were asking).

No, that's not what I was asking. The point I'm trying to make is: if
the caller needs to deal with -EAGAIN anyway (i.e. you require it to
retry), why can't you simply return -EAGAIN also for the case where you
currently use a continuation?
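
Concretely, a minimal sketch of that alternative, reusing the wait loop
quoted above (vpmu_sched_counter, xenpmu_mode_lock, and the 5 second
timeout come from the patch; how exactly the partially initiated state
gets undone is only an assumption here):

    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
    {
        /* Give up after 5 seconds */
        if ( NOW() > start + SECONDS(5) )
        {
            printk(XENLOG_WARNING
                   "vpmu_force_context_switch: failed to sync\n");
            ret = -EBUSY;
            break;
        }
        cpu_relax();
        if ( hypercall_preempt_check() )
        {
            /*
             * Rather than creating a continuation, undo whatever was
             * set up so far and make the caller retry from scratch ---
             * the same path it already has to handle when
             * spin_trylock() fails. (The reset is assumed; in the
             * patch the lock is actually taken and released in
             * do_xenpmu_op(), so the unlock would live there.)
             */
            atomic_set(&vpmu_sched_counter, 0);
            spin_unlock(&xenpmu_mode_lock);
            return -EAGAIN;
        }
    }

That way there would be a single retry path for both the lock-busy case
and the preemption case.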

Jan
