[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: xen 4.14.3 incorrect (~3x) cpu frequency reported


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 10 Jan 2022 14:11:33 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=7PMjNK2nW0KVCPWSS3zDDezwRSJ8z00zQCp6CWEwZuc=; b=BihCpKTNAk5UyjebVCT+5eUDtzWxSHzkxYKup5Q4tDkKbVbb5+hBJcrjjejI7bM93DKFIa+uviqilqirzBJZ8TG3aPRPTmY1WmAaN+QX8jFBun2T9D82CJU8IIxRnmMU3FWDNDUe2UEBEUTyrAIKIXA2KdlY04PhUnjz0W/ifnr/yfbrj/Nqgc8ctmDtvuvRY0unyJGRiiiaUMSFTdhMegpTYIynNfV3HeYprI5pP4jnrynJxul4i8AA4bXzVznpsGdIE9Bzn4fngVrLWJ4AUR+28iBLcewPlHWfZWAuc+yh7LP5hLGbPmI4Pxw2il1QCXc11HFg/1zt4WaEYR6ZyQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=K8lmXhjzShA9hoy7QpV7RGrT++07IO/F2iIZuir66N0qSDfbpsm5/yRpy2p1cpxBAPcB6VFg6o5oiXvTZEq/suwcrTPSAn22dIYq7cw1ufSsDIvrDzCXHfty1uG5xTkxDRshAczO++BpflkeWqDUrG9n/PIy283Tq2dA0bqfCPd3B6SSolG71cL2xe36s0qF1fIgsTnYr8SScMVWeilGzICdjsE0qKsmGoNDZ/oaJJuUphbmA2FuGmRUiQJ/iL/mEpXBDeOwVtFpSCpldkn1Zwh+ld3WdwFLSHtfkO2NVG8CyFG+QoC9sY4FX3G+YWd1VySQMmfmgQH3AuXw2u3T0Q==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: James Dingwall <james-xen@xxxxxxxxxxxxxx>, alexander.rossa@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 10 Jan 2022 13:11:51 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 10.01.2022 13:37, Roger Pau Monné wrote:
> On Fri, Jan 07, 2022 at 12:39:04PM +0100, Jan Beulich wrote:
>> x86: improve TSC / CPU freq calibration accuracy
>>
>> While the problem report was for extreme errors, even smaller ones would
>> better be avoided: The calculated period to run calibration loops over
>> can (and usually will) be shorter than the actual time elapsed between
>> first and last platform timer and TSC reads. Adjust values returned from
>> the init functions accordingly.
>>
>> On a Skylake system I've tested this on accuracy (using HPET) went from
>> detecting in some cases more than 220kHz too high a value to about
>> ±1kHz. On other systems the original error range was much smaller, with
>> less (in some cases only very little) improvement.
>>
>> Reported-by: James Dingwall <james-xen@xxxxxxxxxxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> TBD: Do we think we need to guard against the bizarre case of
>>      "target + count" overflowing (i.e. wrapping)?
> 
> I also wonder whether a value of target close enough to the wrapping
> point could cause the loop to stuck indefinitely, as count would
> overflow and thus the condition won't be meet.

Oh, good point. I guess I'll make another patch to deal with that;
you mentioning leaves me surprised we so far had no reports of such.

>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -378,8 +378,9 @@ static u64 read_hpet_count(void)
>>  
>>  static int64_t __init init_hpet(struct platform_timesource *pts)
>>  {
>> -    uint64_t hpet_rate, start;
>> +    uint64_t hpet_rate, start, expired;
> 
> Might be better named elapsed rather than expired?
> 
> It doesn't store the end of loop TSC value, but the delta between
> start and end.

I don't mind; I've renamed it, as the difference between the two in
this context isn't very clear to me anyway.

>> @@ -415,16 +416,35 @@ static int64_t __init init_hpet(struct p
>>  
>>      pts->frequency = hpet_rate;
>>  
>> +for(i = 0; i < 16; ++i) {//temp
>>      count = hpet_read32(HPET_COUNTER);
>>      start = rdtsc_ordered();
>>      target = count + CALIBRATE_VALUE(hpet_rate);
>>      if ( target < count )
>>          while ( hpet_read32(HPET_COUNTER) >= count )
>>              continue;
>> -    while ( hpet_read32(HPET_COUNTER) < target )
>> +    while ( (count = hpet_read32(HPET_COUNTER)) < target )
>>          continue;
>>  
>> -    return (rdtsc_ordered() - start) * CALIBRATE_FRAC;
>> +    expired = rdtsc_ordered() - start;
> 
> There's also a window between the HPET read and the TSC read where an
> SMI/NMI could cause a wrong frequency detection.

Right, as said in an earlier reply I did notice this myself (actually
on the way home on Friday). As also said there, for now I can't see
any real (i.e. complete) solution to this and the similar instances
of the issue elsewhere.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.