
Re: [Xen-devel] Xen HPET improvement proposal

On 25/10/13 13:47, Jan Beulich wrote:
>> Independently of the HPET issues themselves, I have identified a race
>> condition in the mwait-idle routines where a cpu which is preparing to
>> sleep can arrange for another cpu to wake it up, and have that other cpu
>> wake it up before it has enabled its mwait trigger, meaning that it will
>> idle for an arbitrary length of time in mwait.  Realistically, the cpu
>> will be woken up by the time calibration rendezvous once a second, and
>> possibly by the watchdog NMI every half second.
> Which is an awfully long period of time... Looking forward to seeing
> further details on this.

The fix is fairly simple.  The mwait code must set up the trigger on its
mwait region before arranging to be woken up.  That way, if the other
cpu does wake us up (early perhaps), its write will hit the armed
trigger, and we will bounce straight back out of mwait rather than
sleeping indefinitely.

Currently, there is a window between arranging to be woken up and
activating the mwait trigger where the other cpu might have already
written to the mwait region.
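
To illustrate the ordering problem, here is a toy single-threaded model
in C.  It is only a sketch: the struct and all function names are
hypothetical and are not taken from the Xen mwait-idle code, and the
"early wake" is modelled as a plain function call rather than a real
second cpu.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one cpu's mwait region (hypothetical, not Xen's). */
struct mwait_region {
    bool armed;      /* the trigger has been set up on the region */
    bool triggered;  /* a write hit the region while it was armed */
};

/* Models setting up the trigger (MONITOR) on the region. */
static void arm_trigger(struct mwait_region *r)
{
    r->armed = true;
    r->triggered = false;
}

/* Models the other cpu writing to the region to wake us. */
static void remote_wake(struct mwait_region *r)
{
    if (r->armed)
        r->triggered = true;
    /* Otherwise the write lands before the trigger exists and is lost. */
}

/* Models MWAIT: returns true if the cpu would really go to sleep. */
static bool mwait_sleeps(const struct mwait_region *r)
{
    assert(r->armed);
    return !r->triggered;
}

/* Current order: arrange the wakeup first, arm the trigger second. */
static bool racy_order(bool early_wake)
{
    struct mwait_region r = { false, false };

    if (early_wake)          /* other cpu fires inside the window */
        remote_wake(&r);
    arm_trigger(&r);
    return mwait_sleeps(&r);
}

/* Proposed order: arm the trigger first, then arrange the wakeup. */
static bool fixed_order(bool early_wake)
{
    struct mwait_region r = { false, false };

    arm_trigger(&r);
    if (early_wake)
        remote_wake(&r);
    return mwait_sleeps(&r);
}
```

With an early wake, racy_order(true) reports that the cpu goes to sleep
having missed its wakeup, while fixed_order(true) reports that it falls
straight back out of mwait.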

>> If there is not a free HPET, a cpu will need to share with another cpu. 
>> If this cpu can find another HPET which will fire at an appropriate
>> time, the cpu can merely ask for it to be woken up by the HPET owner
>> when the owner wakes up.  If all the HPETs are programmed to fire a
>> sufficient time into the future, one needs to be shortened.  The cpu
>> should choose the soonest HPET, add itself to the owner's list of other
>> pcpus to wake, and reprogram the HPET to fire sooner.  It should not
>> reprogram the HPET to point to itself.
> I think blindly looking for the one with the closest wakeup is not ideal:
> For one, on huge systems this requires you to scan through too many
> other CPUs. And taking NUMA aspects into consideration here would
> seem at the very least desirable too (i.e. prefer sharing with a CPU
> close to the one looking for a "partner").

I was actually thinking of just searching through the HPETs.  There are
typically far fewer HPET channels than cpus (the most HPET channels I
have encountered in our test lab is 8).  There is also the possibility
of maintaining some form of priority structure, so the next-to-fire
HPET is trivial to identify.  (My concern here is the overhead of
maintaining the priority structure.)
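
As a rough sketch of the linear search over channels, something along
these lines could work.  Everything here is an assumption for
illustration: struct hpet_channel, its fields, share_channel() and the
64-cpu wake mask are made-up names, and "an appropriate time" is
interpreted as "the latest deadline that is not after the one wanted".
Writing to the deadline field stands in for reprogramming the real
comparator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel state; not the layout used by Xen. */
struct hpet_channel {
    uint64_t deadline;   /* absolute time the channel will fire */
    unsigned int owner;  /* cpu which receives the interrupt */
    uint64_t wake_mask;  /* bit n set: owner must also wake cpu n */
};

/*
 * Attach 'cpu', which wants waking at 'wanted', to one of the 'nr'
 * channels in 'ch'.  Returns the channel that was chosen.
 */
static struct hpet_channel *share_channel(struct hpet_channel *ch,
                                          size_t nr, unsigned int cpu,
                                          uint64_t wanted)
{
    struct hpet_channel *fit = NULL, *soonest = NULL;
    size_t i;

    assert(nr > 0 && cpu < 64);

    for (i = 0; i < nr; i++) {
        /* Track the channel which fires first overall. */
        if (!soonest || ch[i].deadline < soonest->deadline)
            soonest = &ch[i];

        /* Track the latest channel which still fires in time. */
        if (ch[i].deadline <= wanted &&
            (!fit || ch[i].deadline > fit->deadline))
            fit = &ch[i];
    }

    if (!fit) {
        /* Every channel fires too late: shorten the soonest one. */
        fit = soonest;
        fit->deadline = wanted;
    }

    /* The owner stays the same; we only join its list of cpus to wake. */
    fit->wake_mask |= UINT64_C(1) << cpu;
    return fit;
}
```

With only a handful of channels the scan is cheap, which is the
argument for trying this before any priority structure.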

I see your point about NUMA, and shall consider it as I am developing
the code (although I might end up with v1 doing the dumb thing first,
before turning towards NUMA optimisation).  The NUMA aspect plays the
other way round as well, with the (usually single) HPET being on the
southbridge/PCH, and therefore likely hanging off NUMA node 0.

