Xen project Mailing List

Re: xen 4.14.3 incorrect (~3x) cpu frequency reported

Date: Mon, 10 Jan 2022 18:04:13 +0100

Cc: James Dingwall <james-xen@xxxxxxxxxxxxxx>, alexander.rossa@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, Roger Pau Monné <roger.pau@xxxxxxxxxx>

Delivery-date: Mon, 10 Jan 2022 17:04:42 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 10.01.2022 16:43, Andrew Cooper wrote:
> On 10/01/2022 13:11, Jan Beulich wrote:
>> On 10.01.2022 13:37, Roger Pau Monné wrote:
>>> On Fri, Jan 07, 2022 at 12:39:04PM +0100, Jan Beulich wrote:
>>>> @@ -415,16 +416,35 @@ static int64_t __init init_hpet(struct p
>>>>
>>>>      pts->frequency = hpet_rate;
>>>>
>>>> +for(i = 0; i < 16; ++i) {//temp
>>>>      count = hpet_read32(HPET_COUNTER);
>>>>      start = rdtsc_ordered();
>>>>      target = count + CALIBRATE_VALUE(hpet_rate);
>>>>      if ( target < count )
>>>>          while ( hpet_read32(HPET_COUNTER) >= count )
>>>>              continue;
>>>> -    while ( hpet_read32(HPET_COUNTER) < target )
>>>> +    while ( (count = hpet_read32(HPET_COUNTER)) < target )
>>>>          continue;
>>>>
>>>> -    return (rdtsc_ordered() - start) * CALIBRATE_FRAC;
>>>> +    expired = rdtsc_ordered() - start;
>>> There's also a window between the HPET read and the TSC read where an
>>> SMI/NMI could cause a wrong frequency detection.
>> Right, as said in an earlier reply I did notice this myself (actually
>> on the way home on Friday). As also said there, for now I can't see
>> any real (i.e. complete) solution to this and the similar instances
>> of the issue elsewhere.
>
> Calibration loops like this cannot be made robust. This is specifically
> why frequency information is being made available via architectural
> means, and is available via non-architectural means in other cases.
>
> There's a whole bunch of situations (#STOPCLK, TERM and PSMI off the top
> of my head, but I bet HDC will screw with things too) which will mess
> with any calibration loop, even if you could disable SMIs. While,
> mechanically, we can disable SMIs on AMD with CLGI, we absolutely
> shouldn't run a calibration loop like this with SMIs disabled.
>
>
> Much as I hate to suggest it, we should parse the frequency out of the
> long CPUID string, and compare it to the calibration time. (This
> technique is mandated in the Intel BWG during very early setup.)

This, according to some initial checking, might work for Intel CPUs, but
apparently won't work for AMD ones (and I don't dare to think of what
might happen if we run under, say, qemu). Imo we'd need to deal
gracefully with the case that we can't parse the frequency out of that
string, with "gracefully" including that our calibration still won't be
too far off.

Also, I wonder whether we shouldn't prefer CPUID leaf 0x15 / 0x16 data
over parsing the extended leaf.

> If it is different by a large margin, rerun the calibration, and if it
> is still different, complain loudly into the logs. This will fix a
> one-off spurious event, whereas if e.g. the system is thermally
> throttling due to a bad heat sink, there is nothing software can do to
> recover the system.

I was already considering using reference data like that from CPUID
leaves 0x15 / 0x16, but I couldn't really settle on what "large margin"
would really want to be. Imo we should try to correct
not-just-as-large errors as well, if we can. For the moment I continue
to consider my plan (outlined in another reply on this thread) superior
to what you suggest, but I'll be looking forward to further replies from
you or others.

Jan
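
The cross-check discussed in this message can be sketched roughly as below. This is only an illustration, not code from the patch or from Xen: every function name is hypothetical, the CPUID register values are passed in as plain parameters rather than read from hardware, and the tolerance is a placeholder, since the thread itself leaves open what a "large margin" should be. The sketch derives a reference frequency either from leaf 0x15 data (crystal frequency times numerator over denominator) or from a "@ 3.20GHz" style suffix in the brand string, returns 0 when neither is usable, and reruns a calibration callback when the measured value is too far from the reference.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: TSC frequency from CPUID leaf 0x15 register values
 * (EAX = denominator, EBX = numerator, ECX = crystal clock in Hz).
 * Returns 0 when the leaf does not enumerate all three values. */
uint64_t leaf15_tsc_hz(uint32_t eax_denominator, uint32_t ebx_numerator,
                       uint32_t ecx_crystal_hz)
{
    if (eax_denominator == 0 || ebx_numerator == 0 || ecx_crystal_hz == 0)
        return 0;
    return (uint64_t)ecx_crystal_hz * ebx_numerator / eax_denominator;
}

/* Hypothetical: frequency in Hz from a brand string ending in something
 * like "@ 3.20GHz". Returns 0 when no "<number>MHz" or "<number>GHz" is
 * found, which is the case the caller has to handle gracefully. */
uint64_t brand_string_hz(const char *brand)
{
    assert(brand != NULL);

    const char *hz = strstr(brand, "Hz");
    if (hz == NULL || hz == brand)
        return 0;

    uint64_t scale;
    switch (hz[-1]) {
    case 'M': scale = 1000000ULL; break;
    case 'G': scale = 1000000000ULL; break;
    default: return 0;
    }

    /* Walk back from the unit prefix over digits and a decimal point. */
    const char *end = hz - 1;
    const char *p = end;
    while (p > brand && (isdigit((unsigned char)p[-1]) || p[-1] == '.'))
        --p;
    if (p == end)
        return 0;

    /* Fixed-point parse, so no floating point is required. */
    uint64_t value = 0, divisor = 1;
    bool seen_point = false;
    for (; p < end; ++p) {
        if (*p == '.') {
            if (seen_point)
                return 0;
            seen_point = true;
            continue;
        }
        value = value * 10 + (uint64_t)(*p - '0');
        if (seen_point)
            divisor *= 10;
    }
    return value * scale / divisor;
}

/* True when measured_hz is within permille/1000 of reference_hz. */
bool within_margin(uint64_t measured_hz, uint64_t reference_hz,
                   unsigned int permille)
{
    uint64_t diff = measured_hz > reference_hz ? measured_hz - reference_hz
                                               : reference_hz - measured_hz;
    return diff * 1000 <= reference_hz * permille;
}

typedef uint64_t (*calibrate_fn)(void);

/* Run the calibration up to max_tries times until it agrees with the
 * reference. Without a reference (reference_hz == 0) the first result is
 * used unchanged. *suspect tells the caller to complain in the logs. */
uint64_t checked_calibration(calibrate_fn calibrate, uint64_t reference_hz,
                             unsigned int permille, unsigned int max_tries,
                             bool *suspect)
{
    uint64_t measured = calibrate();

    *suspect = false;
    if (reference_hz == 0)
        return measured;

    for (unsigned int i = 1;
         i < max_tries && !within_margin(measured, reference_hz, permille);
         ++i)
        measured = calibrate();

    *suspect = !within_margin(measured, reference_hz, permille);
    return measured;
}
```

A caller would presumably try leaf15_tsc_hz() first and fall back to brand_string_hz() only when it returns 0. With a 50 permille tolerance, a 3x misdetection such as the one reported in this thread would fail the check and trigger a rerun, while smaller errors would pass unnoticed, which is the gap raised above about not-just-as-large errors.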
