
Re: [Xen-devel] cpuidle causing Dom0 soft lockups



>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 11.02.10 18:01 >>>
>On 11/02/2010 14:44, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>
>> Other than with the global processed_system_time,
>> the per-CPU one may not get increased even if delta_cpu was close
>> to 3*NS_PER_TICK, due to the stolen/blocked accounting. Probably
>> this was not a problem so far because no code outside of
>> timer_interrupt() read this value - Keir, since I think you wrote that
>> logic originally, any word on this?
>
>What you say is true, as clearly it is currently implemented that way almost
>by design. I'm lost in the intricacies of your current discussion though, so
>not sure exactly why it's a problem, and how we should fix it?

First of all I don't think anything necessarily needs to be fixed in the
2.6.18 tree, as that one will never support >32 vCPU-s, and I don't
think the scalability issue we're talking about here is of concern there.

The problem we're trying to address is the contention on xtime_lock.
It is clear that there generally is no need for all CPUs in the system
to try to update the global time variables, so some filtering on the
number of CPUs concurrently trying to acquire xtime_lock is
reasonable. With any filtering done, however, a CPU can see its
local processed time ahead of the global one. Setting a single shot
timer in the past (or very near future) would guarantee that the CPU
executes timer_interrupt() (almost) right away, but it would not
guarantee that the CPU is now among those that try to acquire
xtime_lock (i.e. the situation wouldn't necessarily have improved
after the interrupt was handled, and hence an interrupt storm is
possible).

Consequently, along with capping the timeout to be set in
stop_hz_timer() to jiffies+1, the timeout would also reasonably be
capped to per_cpu(processed_system_time, cpu) + NS_PER_TICK.
This in turn only makes sense if the per-CPU processed time is
accurate (i.e. within NS_PER_TICK from when the last timer
interrupt occurred). That however doesn't hold: Due to the stolen/
blocked calculations subtracting exact nanosecond values from
delta_cpu, but only adding tick granular values into per-CPU
processed_system_time, the error can accumulate up to a little
less than 3*NS_PER_TICK.
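
To illustrate the accumulation, here is a minimal stand-alone sketch
of the split accounting. It is a simplified model, not the actual
timer_interrupt() code: the function names advance_split() and
lag_split() are made up for illustration, and NS_PER_TICK assumes
HZ=1000.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick length: HZ=1000, i.e. 1 ms per tick (illustrative only). */
#define NS_PER_TICK INT64_C(1000000)

/*
 * Simplified model of the split accounting: each component is subtracted
 * from delta_cpu with nanosecond precision, but the local processed time
 * only advances by the whole ticks contained in that component.
 * Returns the total advance of the local processed time.
 */
int64_t advance_split(int64_t delta_cpu, int64_t stolen, int64_t blocked)
{
    int64_t advance = 0;

    assert(delta_cpu >= 0 && stolen >= 0 && blocked >= 0);

    if (stolen > 0 && delta_cpu > 0) {
        delta_cpu -= stolen;
        if (delta_cpu < 0)
            stolen += delta_cpu;    /* clamp local-time progress */
        advance += (stolen / NS_PER_TICK) * NS_PER_TICK;
    }
    if (blocked > 0 && delta_cpu > 0) {
        delta_cpu -= blocked;
        if (delta_cpu < 0)
            blocked += delta_cpu;   /* clamp local-time progress */
        advance += (blocked / NS_PER_TICK) * NS_PER_TICK;
    }
    if (delta_cpu > 0)              /* remainder: user/system time */
        advance += (delta_cpu / NS_PER_TICK) * NS_PER_TICK;

    return advance;
}

/* Elapsed time that was not added to the local processed time. */
int64_t lag_split(int64_t delta_cpu, int64_t stolen, int64_t blocked)
{
    return delta_cpu - advance_split(delta_cpu, stolen, blocked);
}
```

With delta_cpu = 3*NS_PER_TICK - 3 and stolen = blocked =
NS_PER_TICK - 1, each of the three components is just short of a
full tick, so in this model the local processed time does not
advance at all and the lag equals the whole delta_cpu.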

The proposed change would be to do only a single adjustment to
per-CPU processed_system_time (using the originally calculated
delta_cpu value). What I couldn't convince myself of so far was
that this wouldn't influence the stolen/blocked accounting (since
the delta_cpu calculated on the next timer interrupt would now
necessarily be different from the one calculated with the
current logic) - in particular, the adjustments commented with
"clamp local-time progress" would appear to get used more
frequently with the change under consideration.
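
One way to experiment with this is a small user-space model that runs
both variants over the same sequence of interrupts and counts how
often the clamp is hit. The sketch below is only such a model, under
assumptions: struct cpu_time, take_component() and model_interrupt()
are hypothetical names, NS_PER_TICK assumes HZ=1000, and the
user/system accounting is reduced to advancing the processed time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_TICK INT64_C(1000000)    /* assumed: HZ=1000 */

/* Hypothetical per-CPU bookkeeping, all values in nanoseconds. */
struct cpu_time {
    int64_t processed_system;
    int64_t processed_stolen;
    int64_t processed_blocked;
    unsigned int clamps;                /* times the clamp path was taken */
};

/* Subtract one component from *delta_cpu; return the whole ticks (in ns)
 * of that component that get credited. */
int64_t take_component(struct cpu_time *ct, int64_t *delta_cpu,
                       int64_t amount)
{
    if (amount <= 0 || *delta_cpu <= 0)
        return 0;
    *delta_cpu -= amount;
    if (*delta_cpu < 0) {
        amount += *delta_cpu;           /* clamp local-time progress */
        ct->clamps++;
    }
    return (amount / NS_PER_TICK) * NS_PER_TICK;
}

/* One modelled timer interrupt. stolen_total and blocked_total are the
 * cumulative runstate values seen at time 'now'. */
void model_interrupt(struct cpu_time *ct, int64_t now,
                     int64_t stolen_total, int64_t blocked_total,
                     bool single_adjust)
{
    int64_t delta_cpu = now - ct->processed_system;
    int64_t orig_delta = delta_cpu;
    int64_t advance = 0, credited;

    assert(now >= ct->processed_system);

    credited = take_component(ct, &delta_cpu,
                              stolen_total - ct->processed_stolen);
    ct->processed_stolen += credited;
    advance += credited;

    credited = take_component(ct, &delta_cpu,
                              blocked_total - ct->processed_blocked);
    ct->processed_blocked += credited;
    advance += credited;

    if (delta_cpu > 0)                  /* remainder: user/system time */
        advance += (delta_cpu / NS_PER_TICK) * NS_PER_TICK;

    if (single_adjust)                  /* variant: one adjustment only */
        advance = (orig_delta / NS_PER_TICK) * NS_PER_TICK;

    ct->processed_system += advance;
}
```

Feeding both variants an interrupt at 3*NS_PER_TICK - 3 followed by
one at 3*NS_PER_TICK, with stolen and blocked totals of
NS_PER_TICK - 1 each, the split variant never clamps in this model,
while the single-adjustment variant clamps once on the second
interrupt. That is consistent with the concern above, but a model
this small proves nothing about the real accounting.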

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

