[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/vpmu: add cpu hot unplug notifier for vpmu



>>> On 17.05.17 at 16:32, <boris.ostrovsky@xxxxxxxxxx> wrote:
> On 05/17/2017 10:09 AM, Jan Beulich wrote:
>>>>> On 17.05.17 at 15:58, <boris.ostrovsky@xxxxxxxxxx> wrote:
>>> On 05/17/2017 08:54 AM, Jan Beulich wrote:
>>>>>>> On 17.05.17 at 14:40, <luwei.kang@xxxxxxxxx> wrote:
>>>>>>>>> On 16.05.17 at 19:29, <luwei.kang@xxxxxxxxx> wrote:
>>>>>>> Currently, hot unplug a cpu with vpmu enabled may cause system hang
>>>>>>> due to send IPI to a die physical cpu. This patch add a cpu hot unplug
>>>>>>> notifer to save vpmu context before cpu offline.
>>>>>>>
>>>>>>> Consider one scene, hotplug physical cpu N with vpmu is enabled.
>>>>>> I think you mean "scenario" and "hot unplug".
>>>>>>
>>>>>>> The vcpu which running on this physical cpu before will be switch to
>>>>>>> other online cpu. Before load the vpmu context to new physical cpu, a
>>>>>>> IPI will be send to cpu N to save the vpmu context.
>>>>>>> System will hang in function on_select_cpus because of that physical
>>>>>>> cpu is offline and can not do any response.
>>>>>> Doesn't this make clear that you would better also make sure
>>>>>> ->last_pcpu doesn't hold to the then stale CPU anymore? For
>>>>>> example, vpmu_load() compares it with smp_processor_id() (the subsequent 
>>>>>> use 
> 
>>>>> is guarded by a VPMU_CONTEXT_LOADED flag
>>>>>> check), allowing badness if the same or another CPU with the same number 
>>>>> comes up again quickly enough. Similarly
>>>>>> vpmu_arch_destroy() uses it without checking VPMU_CONTEXT_LOADED.
>>>>>     I think it may can't make sure  "->last_pcpu" doesn't hold to the 
>>>>> then 
>>>>> stale CPU. The purpose of this notifier is to save the vpmu context 
>>>>> before 
>>>>> cpu offline. Avoid save vpmu context by send IPI to that offline cpu. 
>>>>> There 
>>>>> is no reason to change the value except it saving (vpmu_save()) in 
>>>>> another 
>>>>> physical cpu.
>>>> I'm afraid I don't understand most of your reply.
>>>>
>>>>>     Regarding vpmu_arch_destroy(), it indeed will cause same issue. What 
>>>>> about add " this_cpu(cpu) = NULL" in cpu_callback() to clean the 
>>>>> last_vcpu 
>>>>> pointer of this physical cpu. 
>>>> That's being done by vpmu_save_force() already afaict (assuming
>>>> you mean this_cpu(last_vcpu)), albeit for whatever reason open
>>>> coding this_cpu().
>>>>
>>>>>     In addition, add VPMU_CONTEXT_LOADED check before execute 
>>>>> on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), vpmu_save_force, v, 
>>>>> 1) 
> in 
>>>>> vpmu_arch_destroy(). Because of force save operation has been finished in 
>>>>> notifier function.
>>>> I'm not sure whether that would be correct. Boris?
>>>
>>> I believe we still have a race with vpmu_load(): it can be past
>>> VPMU_CONTEXT_LOADED test and committed to the remote call when the
>>> remote VCPU becomes offlined.
>> The offlined entity is a pCPU, and such offlining happens in stop-
>> machine context iirc.
> 
> Oh, then I think this should work --- remote calls are predicated on
> VPMU_CONTEXT_LOADED being set and the callback will clear it.
> 
> Nevertheless, I'd still make sure that last_pcpu doesn't point to an
> offlined processor.

Well, we can't reasonably use some random online one; my
intention rather was to write some visibly invalid value there,
so that comparing with any possibly online CPU would always
produce false.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.