
Re: [Xen-devel] [PATCH] remove blocked time accounting from xen "clockchip"



>>> On 21.12.11 at 14:53, Laszlo Ersek <lersek@xxxxxxxxxx> wrote:
> On 12/21/11 09:32, Jan Beulich wrote:
> 
>> Specifically, without further adjustments, on a 4:3 over-committed
>> system doing a kernel build, I'm seeing an over-accounting of
>> approximately 4%. I was able to reduce this to slightly above 1%
>> with the (experimental and not 32-bit compatible) change below:
>>
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -3864,6 +3864,29 @@
>>      return ns;
>>   }
>>
>> +#ifndef CONFIG_XEN
>> +#define _cputime_to_cputime64 cputime_to_cputime64
>> +#else
>> +#define NS_PER_TICK (1000000000 / HZ)
>> +static DEFINE_PER_CPU(u64, stolen_snapshot);
>> +static DEFINE_PER_CPU(unsigned int, stolen_residual);
>> +
>> +static cputime64_t _cputime_to_cputime64(cputime_t t)
>> +{
>> +    //todo not entirely suitable for 32-bit
>> +    u64 s = this_cpu_read(runstate.time[RUNSTATE_runnable]);
>> +    unsigned long adj = div_u64_rem(s - this_cpu_read(stolen_snapshot)
>> +                                      + this_cpu_read(stolen_residual),
>> +                                    NS_PER_TICK,
>> +                                    &__get_cpu_var(stolen_residual));
>> +
>> +    this_cpu_write(stolen_snapshot, s);
>> +    t = cputime_sub(t, jiffies_to_cputime(adj));
>> +
>> +    return cputime_le(t, cputime_max) ? cputime_to_cputime64(t) : 0;
>> +}
>> +#endif
>> +
> 
> Why is this not entirely suitable for 32-bit? div_u64_rem() indeed
> returns a u64, but for the truncation to do actual damage, the return
> value (which is the number of ticks stolen from the VCPU) would have
> to exceed about 4*10^9, which seems improbable with 1 msec ticks
> (4*10^9 such ticks amount to roughly 46 days of stolen time). Also,
> for cputime_sub() to leave any underflow detectable, all that is
> required is adj <= cputime_max (ca. 2*10^9). I'm likely looking in
> the wrong place though.

No, this is not about overflows. The very first this_cpu_read() is what
is problematic on 32-bit: it reads a 64-bit value from space updated by
the hypervisor, and hence reading it in two halves may get interrupted
by an update. The solution is to use cmpxchg8b here, but I only just
now put that part together.
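
For illustration, the read would take roughly this shape (a sketch
only, with a made-up helper name; this is not the final patch):

    /*
     * On 32-bit x86 a plain u64 load is done as two 32-bit loads,
     * which a concurrent hypervisor update can tear. cmpxchg64()
     * (cmpxchg8b underneath) accesses all 64 bits atomically and
     * returns the current contents of *p whether or not the
     * comparison succeeds, so exchanging a guess with itself yields
     * an atomic snapshot of the full value.
     */
    static inline u64 read_u64_atomic(u64 *p)
    {
        u64 old = *p;    /* possibly torn; only used as the guess */

        return cmpxchg64(p, old, old);
    }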

>>   /*
>>    * Account user cpu time to a process.
>>    * @p: the process that the cpu time gets accounted to
>> @@ -3882,7 +3905,7 @@
>>      account_group_user_time(p, cputime);
>>
>>      /* Add user time to cpustat. */
>> -    tmp = cputime_to_cputime64(cputime);
>> +    tmp = _cputime_to_cputime64(cputime);
>>      if (TASK_NICE(p) > 0)
>>              cpustat->nice = cputime64_add(cpustat->nice, tmp);
>>      else
>> @@ -3934,7 +3957,7 @@
>>   void __account_system_time(struct task_struct *p, cputime_t cputime,
>>                      cputime_t cputime_scaled, cputime64_t *target_cputime64)
>>   {
>> -    cputime64_t tmp = cputime_to_cputime64(cputime);
>> +    cputime64_t tmp = _cputime_to_cputime64(cputime);
>>
>>      /* Add system time to process. */
>>      p->stime = cputime_add(p->stime, cputime);
>> @@ -3996,7 +4019,7 @@
>>   void account_idle_time(cputime_t cputime)
>>   {
>>      struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
>> -    cputime64_t cputime64 = cputime_to_cputime64(cputime);
>> +    cputime64_t cputime64 = _cputime_to_cputime64(cputime);
>>      struct rq *rq = this_rq();
>>
>>      if (atomic_read(&rq->nr_iowait) > 0)
>> @@ -4033,7 +4056,7 @@
>>                                              struct rq *rq)
>>   {
>>      cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
>> -    cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
>> +    cputime64_t tmp = _cputime_to_cputime64(cputime_one_jiffy);
>>      struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
>>
>>      if (irqtime_account_hi_update()) {
>>
>> I currently have no idea what the remaining 1% could be attributed to,
>> so thoughts from others would certainly be welcome.
> 
> What about irqtime_account_process_tick() calling account_idle_time()
> with "cputime_one_jiffy"? It could trigger the underflow in
> _cputime_to_cputime64() more frequently. In that case we might want to
> decrease idle time (i.e. account "negative" cputime against idle), but
> can't.

I don't think the underflow can actually happen; I just wanted to be
safe by checking for it. If the calculation underflowed, it would mean
that more time was stolen than was spent in the current state (e.g.
idle) prior to the adjustment, which ought to be impossible.
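
(To illustrate why the check is cheap insurance anyway: cputime_t is
an unsigned type, so an "impossible" underflow would not go negative
but wrap around to a huge value, which the cputime_max comparison then
catches. A standalone toy version of the guard, not kernel code:)

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned long t = 3, adj = 5;  /* stolen > elapsed: "impossible" */
        unsigned long diff = t - adj;  /* wraps to ULONG_MAX - 1 */
        unsigned long max = ULONG_MAX / 2;  /* stand-in for cputime_max */

        /* clamp a wrapped (underflowed) result to 0 rather than
           accounting an absurdly large time value */
        printf("%lu\n", diff <= max ? diff : 0UL);
        return 0;
    }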

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

