[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: xen 4.14.3 incorrect (~3x) cpu frequency reported


  • To: Juergen Gross <jgross@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 11 Jan 2022 08:09:53 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=+Rg5aORLBy2VyxCSBoNlGxFi+WpuKQMtr44s6uJCrDc=; b=ZmYf5NiRz6Z8PAbQMHMMoapk8fg3aMuTaAS2rHPgqe9EAmkFuns14bXrhMtO6Cw8VxQnGPnO7wknmEFb7euiTSIV1+ToU01aoOeq7osDcdQmxZ+Zy0tHTaXxvk921QtBxgBF9JpZ39BzrIS311jDHeibb6MRNfG3t9FJV/wffxb9tsnAB9D4507B2mfYDjQi9OpIfiPXH3vh0nyCZkue6pylLB0syD/RWdYVo0lor2Ghmm1C5sZH90hHu840U1fI3pFRXw0LGrU6WXGfS5weOh3RvYyA0wMhL7W+pIv17whmGZY5AykV8kdzsr6pTfTkaYPy4znFwB/CFWaqMW6v0g==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=hYS8RRQvJHXzVPr6N55WjxIGO/QSfVg/qmX26Ml4zUsN2ksaKNIARl/9h7SluGlJgJw0nWCIqNZPlefBLhN0K8l5IzSh5M2SB+7sV7zv5FWOcPshee86O59yO74C42Bf2c/Cnliv2a++ce678yyqlKelUArJiIRpjFFUEVrGGZxawjI8ZdGoIFrve/WarRy0qKzUGLsvNigo/oCMu22VTgTSeTDTjgVIm6S3c44q+ZNyBRsndlFxoxRGxyuLawV9dt6TU5eOiZqYL4sUSTh1AegkBZfYp1RbFIpvAwJ/ZGB7lnH6+i4oEGObUGV4oTs5rNFDipLoJSd1IlifOu3uHw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: James Dingwall <james-xen@xxxxxxxxxxxxxx>, alexander.rossa@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Andrew Cooper <amc96@xxxxxxxx>
  • Delivery-date: Tue, 11 Jan 2022 07:10:20 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 11.01.2022 06:32, Juergen Gross wrote:
> On 10.01.22 18:04, Jan Beulich wrote:
>> On 10.01.2022 16:43, Andrew Cooper wrote:
>>> On 10/01/2022 13:11, Jan Beulich wrote:
>>>> On 10.01.2022 13:37, Roger Pau Monné wrote:
>>>>> On Fri, Jan 07, 2022 at 12:39:04PM +0100, Jan Beulich wrote:
>>>>>> @@ -415,16 +416,35 @@ static int64_t __init init_hpet(struct p
>>>>>>   
>>>>>>       pts->frequency = hpet_rate;
>>>>>>   
>>>>>> +for(i = 0; i < 16; ++i) {//temp
>>>>>>       count = hpet_read32(HPET_COUNTER);
>>>>>>       start = rdtsc_ordered();
>>>>>>       target = count + CALIBRATE_VALUE(hpet_rate);
>>>>>>       if ( target < count )
>>>>>>           while ( hpet_read32(HPET_COUNTER) >= count )
>>>>>>               continue;
>>>>>> -    while ( hpet_read32(HPET_COUNTER) < target )
>>>>>> +    while ( (count = hpet_read32(HPET_COUNTER)) < target )
>>>>>>           continue;
>>>>>>   
>>>>>> -    return (rdtsc_ordered() - start) * CALIBRATE_FRAC;
>>>>>> +    expired = rdtsc_ordered() - start;
>>>>> There's also a window between the HPET read and the TSC read where an
>>>>> SMI/NMI could cause a wrong frequency detection.
>>>> Right, as said in an earlier reply I did notice this myself (actually
>>>> on the way home on Friday). As also said there, for now I can't see
>>>> any real (i.e. complete) solution to this and the similar instances
>>>> of the issue elsewhere.
>>>
>>> Calibration loops like this cannot be made robust.  This is specifically
>>> why frequency information is being made available via architectural
>>> means, and is available via non-architectural means in other cases.
>>>
>>> There's a whole bunch of situations (#STOPCLK, TERM and PSMI off the top
>>> of my head, but I bet HDC will screw with things too) which will mess
>>> with any calibration loop, even if you could disable SMIs.  While,
>>> mechanically, we can disable SMIs on AMD with CLGI, we absolutely
>>> shouldn't run a calibration loop like this with SMIs disabled.
>>>
>>>
>>> Much as I hate to suggest it, we should parse the frequency out of the
>>> long CPUID string, and compare it to the calibration time.  (This
>>> technique is mandated in the Intel BWG during very early setup.)
>>
>> This, according to some initial checking, might work for Intel CPUs,
>> but apparently won't work for AMD ones (and I don't dare to think of
>> what might happen if we run under, say, qemu). Imo we'd need to deal
>> gracefully with the case that we can't parse the frequency out of
>> that string, with "gracefully" including that our calibration still
>> won't be too far off.
>>
>> Also I wonder if we wouldn't better prefer CPUID leaf 0x15 / 0x16
>> data over parsing extended leaf.
>>
>>> If it is different by a large margin, rerun the calibration, and if it
>>> is still different, complain loudly into the logs.  This will fix a
>>> one-off-spurious event, whereas if e.g. the system is thermally
>>> throttling due to a bad heat sync, there is nothing software can do to
>>> recover the system.
>>
>> I was already considering to use reference data like that from CPUID
>> leaves 0x15 / 0x16, but I couldn't really settle on what "large
>> margin" would really want to be. Imo we should try to correct not-
>> just-as-large errors as well, if we can.
>>
>> For the moment I continue to consider my plan (outlined in another
>> reply on this thread) superior to what you suggest, but I'll be
>> looking forward to further replies from you or others.
> 
> Didn't Andrew mention that SMIs can be detected to have happened via
> an SMI counter?

Yes, but that's again an Intel-only thing. Plus, as said elsewhere,
I don't think it can be considered sufficient to account for SMI and
NMI alone - there can be any number of other things causing delays.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.