
Re: [Xen-devel] Xen on ARM IRQ latency and scheduler overhead



On 18/02/17 00:41, Stefano Stabellini wrote:
> On Fri, 17 Feb 2017, Dario Faggioli wrote:
>> On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
>>> These are the results, in nanosec:
>>>
>>>                         AVG     MIN     MAX     WARM MAX
>>>
>>> NODEBUG no WFI          1890    1800    3170    2070
>>> NODEBUG WFI             4850    4810    7030    4980
>>> NODEBUG no WFI credit2  2217    2090    3420    2650
>>> NODEBUG WFI credit2     8080    7890    10320   8300
>>>
>>> DEBUG no WFI            2252    2080    3320    2650
>>> DEBUG WFI               6500    6140    8520    8130
>>> DEBUG WFI, credit2      8050    7870    10680   8450
>>>
>>> As you can see, depending on whether the guest issues a WFI or not
>>> while waiting for interrupts, the results change significantly.
>>> Interestingly, credit2 does worse than credit1 in this area.
>>>
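For context, the two waiting styles differ only in how the guest idles
between arming the timer and receiving the interrupt. A minimal sketch of
such a measurement loop (ARM64 guest code; program_vtimer() is a
hypothetical helper, and this is not Stefano's actual test app):

    #include <stdint.h>

    void program_vtimer(uint64_t deadline);   /* hypothetical: arm the vtimer */

    static volatile int irq_fired;            /* set by the guest's IRQ handler */

    static inline uint64_t read_cntvct(void)
    {
        uint64_t val;
        asm volatile("mrs %0, cntvct_el0" : "=r" (val));  /* virtual counter */
        return val;
    }

    static uint64_t measure_once(uint64_t deadline, int use_wfi)
    {
        irq_fired = 0;
        program_vtimer(deadline);             /* fire at 'deadline' (counter ticks) */

        if ( use_wfi )
            while ( !irq_fired )
                asm volatile("wfi");          /* vCPU blocks; wakeup goes through the scheduler */
        else
            while ( !irq_fired )
                ;                             /* busy-poll; vCPU never blocks in Xen */

        return read_cntvct() - deadline;      /* observed latency, in counter ticks */
    }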
>> I did some measuring myself, on x86, with different tools. So,
>> cyclictest is basically something very similar to Stefano's app.
>>
>> I've run it both within Dom0 and inside a guest. I also ran a Xen
>> build (in this case, only inside the guest).
>>
>>> We are down to 2000-3000ns. Then, I started investigating the
>>> scheduler. I measured how long it takes to run "vcpu_unblock":
>>> 1050ns, which is significant. I don't know what is causing the
>>> remaining 1000-2000ns, but I bet on another scheduler function.
>>> Do you have any suggestions on which one?
>>>
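For illustration, one way such an in-hypervisor measurement could be taken,
assuming Xen's NOW() (nanosecond-resolution s_time_t) is read around the
call; this is only a sketch, not necessarily how the 1050ns figure above was
obtained:

    /* Sketch: wrap the call being measured with NOW() timestamps.  In
     * practice the instrumentation would likely sit inside vcpu_unblock()
     * itself, and printk() in a hot path adds latency of its own. */
    s_time_t t0 = NOW();

    vcpu_unblock(v);

    printk("vcpu_unblock: %lu ns\n", (unsigned long)(NOW() - t0));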
>> So, vcpu_unblock() calls vcpu_wake(), which then invokes the
>> scheduler's wakeup-related functions.
>>
>> If you time vcpu_unblock(), from beginning to end of the function, you
>> actually capture quite a few things. E.g., the scheduler lock is taken
>> inside vcpu_wake(), so you're basically including time spent waiting
>> on the lock in the estimation.
>>
>> That is probably ok (as in, lock contention definitely is something
>> relevant to latency), but it is expected for things to be rather
>> different between Credit1 and Credit2.
>>
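To make that point concrete, the shape of the path is roughly the
following (paraphrased, not verbatim xen/common/schedule.c):

    void vcpu_unblock(struct vcpu *v)
    {
        if ( !test_and_clear_bit(_VPF_blocked, &v->pause_flags) )
            return;
        /* ... poll/event checks elided ... */
        vcpu_wake(v);
    }

    void vcpu_wake(struct vcpu *v)
    {
        unsigned long flags;
        /* time spent contending on this lock ends up in the measurement */
        spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);

        if ( likely(vcpu_runnable(v)) )
        {
            /* runstate bookkeeping elided ... */
            SCHED_OP(vcpu_scheduler(v), wake, v);   /* per-scheduler wakeup hook */
        }

        vcpu_schedule_unlock_irqrestore(lock, flags, v);
    }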
>> I've, OTOH, tried to time SCHED_OP(wake) and SCHED_OP(do_schedule),
>> and here's the result. Numbers are in cycles (I've used RDTSC) and,
>> to make sure I obtain consistent and comparable numbers, I've set
>> the frequency scaling governor to performance.
>>
>> Dom0, [performance]
>>
>>              cyclictest 1us    cyclictest 1ms    cyclictest 100ms
>> (cycles)     Credit1  Credit2  Credit1  Credit2  Credit1  Credit2
>> wakeup-avg   2429     2035     1980     1633     2535     1979
>> wakeup-max   14577    113682   15153    203136   12285    115164
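For reference, a sketch of the kind of timing harness behind numbers like
these, assuming the cycle counter is read with rdtsc() around the hook;
update_stats() and wakeup_stats are hypothetical helpers for tracking
avg/max:

    uint64_t t0, t1;

    t0 = rdtsc();                               /* cycles before the hook */
    SCHED_OP(vcpu_scheduler(v), wake, v);       /* the call being measured */
    t1 = rdtsc();

    update_stats(&wakeup_stats, t1 - t0);       /* hypothetical: accumulate avg/max in cycles */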
> 
> I am not that familiar with the x86 side of things, but the 113682 and
> 203136 look worrisome, especially considering that credit1 doesn't have
> them.

Dario,

Do you reckon those 'MAX' values could be the load balancer running
(both for credit1 and credit2)?

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

