
Re: xen 4.14.3 incorrect (~3x) cpu frequency reported


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 10 Jan 2022 16:04:07 +0100
  • Cc: James Dingwall <james-xen@xxxxxxxxxxxxxx>, alexander.rossa@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Mon, 10 Jan 2022 15:04:26 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 10.01.2022 15:49, Roger Pau Monné wrote:
> On Mon, Jan 10, 2022 at 08:52:55AM +0100, Jan Beulich wrote:
>> On 07.01.2022 12:39, Jan Beulich wrote:
>>> --- a/xen/arch/x86/time.c
>>> +++ b/xen/arch/x86/time.c
>>> @@ -378,8 +378,9 @@ static u64 read_hpet_count(void)
>>>  
>>>  static int64_t __init init_hpet(struct platform_timesource *pts)
>>>  {
>>> -    uint64_t hpet_rate, start;
>>> +    uint64_t hpet_rate, start, expired;
>>>      uint32_t count, target;
>>> +unsigned int i;//temp
>>>  
>>>      if ( hpet_address && strcmp(opt_clocksource, pts->id) &&
>>>           cpuidle_using_deep_cstate() )
>>> @@ -415,16 +416,35 @@ static int64_t __init init_hpet(struct p
>>>  
>>>      pts->frequency = hpet_rate;
>>>  
>>> +for(i = 0; i < 16; ++i) {//temp
>>>      count = hpet_read32(HPET_COUNTER);
>>>      start = rdtsc_ordered();
>>>      target = count + CALIBRATE_VALUE(hpet_rate);
>>>      if ( target < count )
>>>          while ( hpet_read32(HPET_COUNTER) >= count )
>>>              continue;
>>> -    while ( hpet_read32(HPET_COUNTER) < target )
>>> +    while ( (count = hpet_read32(HPET_COUNTER)) < target )
>>>          continue;
>>
>> Contrary to what I first thought, but matching my earlier reply, this
>> only reduces the likelihood of encountering an issue. In particular, a
>> long-duration event ahead of the final HPET read above would be
>> covered, but ...
>>
>>> -    return (rdtsc_ordered() - start) * CALIBRATE_FRAC;
>>> +    expired = rdtsc_ordered() - start;
>>
>> ... such an event occurring between the final HPET read and the TSC
>> read would still be an issue. So far I've only been able to think of
>> an ugly way to further reduce the likelihood for this window. But
>> besides that approach being neither neat nor a complete exclusion of
>> the possibility, I have to point out that we have the same issue in a
>> number of other places: back-to-back reads of the platform timer and
>> the TSC are assumed to happen close together elsewhere as well.
> 
> Right, sorry, I replied to the patch first without reading this.

No problem at all.

>> Cc-ing other x86 maintainers to see whether they have any helpful
>> thoughts ...
> 
> I'm not sure there's much we can do. We could maybe count NMIs and
> retry if we detect that an NMI happened during calibration, but we
> can't do the same for SMIs, as I don't think there's a way to get
> that information on all hardware we support: MSR_SMI_COUNT (0x34) is
> Intel-only and requires Nehalem or later.
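
A minimal sketch of the counting Roger describes (illustration only:
rdmsrl() and MSR_SMI_COUNT are assumed to be the usual Xen helper and
constant, while calibrate_against_hpet() is a made-up placeholder for
the calibration loop in the patch above):

    uint64_t smi_before, smi_after, expired;

    /*
     * Re-run the calibration whenever an SMI interrupted it. This
     * only works on Intel hardware from Nehalem onwards, where
     * MSR_SMI_COUNT (0x34) exists.
     */
    do {
        rdmsrl(MSR_SMI_COUNT, smi_before);
        expired = calibrate_against_hpet(); /* hypothetical helper */
        rdmsrl(MSR_SMI_COUNT, smi_after);
    } while ( smi_after != smi_before );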

Yeah, no, I wouldn't want to make ourselves depend on such counting
anyway. There can always be yet another reason for a long enough
delay. The rough plan I have for further reducing the likelihood
builds on the assumption that there hopefully wouldn't be many such
events in close succession: I would read both counters perhaps 3
times, calculating (from the TSC alone) and recording the shortest of
the sequences. Then I'd continue reading both counters for as long as
the duration further shrinks (which is necessarily a finite process).
For the calculation I'd then use the tuple from the fastest of the
(4 or more) read sequences.
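
For illustration, a sketch of that plan (not the eventual patch;
hpet_read32() and rdtsc_ordered() as used in the patch above):

    /*
     * Take a (TSC, HPET, TSC) sample several times, remember the
     * sample with the smallest TSC-measured duration, and continue
     * for as long as that minimum still shrinks. A long SMI/NMI
     * between the two TSC reads inflates the duration, so the
     * fastest sample is the one least likely to have been
     * interrupted between its reads.
     */
    static void __init sample_hpet_tsc(uint32_t *hpet, uint64_t *tsc)
    {
        uint64_t best = ~0ULL;
        unsigned int i;

        for ( i = 0; ; ++i )
        {
            uint64_t t1, t2;
            uint32_t h;

            t1 = rdtsc_ordered();
            h = hpet_read32(HPET_COUNTER);
            t2 = rdtsc_ordered();

            if ( t2 - t1 < best )
            {
                /* New fastest sample - record it and keep going. */
                best = t2 - t1;
                *hpet = h;
                *tsc = t2;
            }
            else if ( i >= 3 )
                /* No further shrinking after 4 or more samples. */
                break;
        }
    }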

Thinking about it, maybe I should make this a separate patch rather
than folding that extra complexity in here (the patch intended for
staging now looks quite different anyway, partly thanks to the fix
for the issue you pointed out).

Jan
