Xen project Mailing List

Re: xen 4.14.3 incorrect (~3x) cpu frequency reported

To: James Dingwall <james-xen@xxxxxxxxxxxxxx>

Date: Fri, 7 Jan 2022 12:39:04 +0100

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=xH358gyXzX8lozl4zMBj0IKdgwO2qbrmtnYPx4puQX0=; b=ggZsdx9XH3lOUNYHnm8VIYG8NfVhZc14ohbKX6fPQ9IuyqdVsMNRuQ+Oq6uqdQmCaBnIaFsJM2hWBcnWwsZrURXbeHHUNNyHcfNiLc10YrpVvqZcGGAOqYNgms4B7HruxKIt4qBobUtAdN/ozEFSABcuWbLjEPqduowC6KNOC0+zvGnVCd1oGbI5S6nlikYuuvBOgQ0XWPfg4/iYitOmx7/1Dx/aTnu7Yay/UhtQmqqiT5dY8u4Nrd3L5Vyl7VwebmrQXACHmdzsVgfppqlEte4Eq5ZKZ6cQ43Mr6LPSBjTcJzR4CbtKgTrqFuxfgyvPRXghpexhkAe73SgOu/hP4A==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=EF0lzXNugcM4ZRROG8bToeEfSS7TlQ7CToCxK6rfOhXJQe1Cep+I7bJmYFl6ojS1466utE+GzS64PsmJJ3NWIkIxX+Ftax+C6Z+2zd16b06IRsJ9XLjac+S1XPV4feZFdoUCk+gY3oY8sDwxGcj1gF8hONcPg7dRboC4+wIEVJPZ19ydic8WfoPIXCqB/JmbZHmwptiCPokXLnNRuNaN+KIEoP6z6rAzJhg62L1MdPJFV3uXXGR2hHfMfGPRxekPPmm7T+gXidCbuTo/mqriCs5JAchWCSMpqUoU3wWYTuUD6DBTRV4IlPaN54hkTQsHLcgoPbUydR47QMSl6njM4w==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;

Cc: alexander.rossa@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Fri, 07 Jan 2022 11:39:32 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06.01.2022 16:08, James Dingwall wrote: >>> On Wed, Jul 21, 2021 at 12:59:11PM +0200, Jan Beulich wrote: >>> >>>> On 21.07.2021 11:29, James Dingwall wrote: >>>> >>>>> We have a system which intermittently starts up and reports an incorrect >>>>> cpu frequency: > ... >>> I'm sorry to ask, but have you got around to actually doing that? Or >>> else is resolving this no longer of interest? > > We have experienced an occurence of this issue on 4.14.3 with 'loglvl=all' > present on the xen command line. I have attached the 'xl dmesg' output for > the fast MHz boot, the diff from the normal case is small so I've not added > that log separately: > > --- normal-mhz/xl-dmesg.txt 2022-01-06 14:13:47.231465234 +0000 > +++ funny-mhz/xl-dmesg.txt 2022-01-06 13:45:43.825148510 +0000 > @@ -211,7 +211,7 @@ > (XEN) cap enforcement granularity: 10ms > (XEN) load tracking window length 1073741824 ns > (XEN) Platform timer is 24.000MHz HPET > -(XEN) Detected 2294.639 MHz processor. > +(XEN) Detected 7623.412 MHz processor. > (XEN) EFI memory map: > (XEN) 0000000000000-0000000007fff type=3 attr=000000000000000f > (XEN) 0000000008000-000000003cfff type=7 attr=000000000000000f Below is a patch (suitably adjusted for 4.14.3) which I would hope can take care of the issue (assuming my vague guess on the reasons wasn't entirely off). It has some debugging code intentionally left in, and it's also not complete yet (other timer code needing similar adjustment). Given the improvements I've observed independent of your issue, I may not wait with submission until getting feedback from you, since - aiui - it may take some time for you to actually run into a case where the change would actually make an observable difference. Jan x86: improve TSC / CPU freq calibration accuracy While the problem report was for extreme errors, even smaller ones would better be avoided: The calculated period to run calibration loops over can (and usually will) be shorter than the actual time elapsed between first and last platform timer and TSC reads. Adjust values returned from the init functions accordingly. On a Skylake system I've tested this on accuracy (using HPET) went from detecting in some cases more than 220kHz too high a value to about ±1kHz. On other systems the original error range was much smaller, with less (in some cases only very little) improvement. Reported-by: James Dingwall <james-xen@xxxxxxxxxxxxxx> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> --- TBD: Do we think we need to guard against the bizarre case of "target + count" overflowing (i.e. wrapping)? TBD: Accuracy could be slightly further improved by using a (to be introduced) rounding variant of muldiv64(). TBD: I'm not entirely sure how useful the conditionals are - there shouldn't be any inaccuracies from the division when count equals target (upon entry to the conditionals), as then the divisor is what the original value was just multiplied by. --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -378,8 +378,9 @@ static u64 read_hpet_count(void) static int64_t __init init_hpet(struct platform_timesource *pts) { - uint64_t hpet_rate, start; + uint64_t hpet_rate, start, expired; uint32_t count, target; +unsigned int i;//temp if ( hpet_address && strcmp(opt_clocksource, pts->id) && cpuidle_using_deep_cstate() ) @@ -415,16 +416,35 @@ static int64_t __init init_hpet(struct p pts->frequency = hpet_rate; +for(i = 0; i < 16; ++i) {//temp count = hpet_read32(HPET_COUNTER); start = rdtsc_ordered(); target = count + CALIBRATE_VALUE(hpet_rate); if ( target < count ) while ( hpet_read32(HPET_COUNTER) >= count ) continue; - while ( hpet_read32(HPET_COUNTER) < target ) + while ( (count = hpet_read32(HPET_COUNTER)) < target ) continue; - return (rdtsc_ordered() - start) * CALIBRATE_FRAC; + expired = rdtsc_ordered() - start; + + if ( likely(count > target) ) + { + /* + * A (perhaps significant) delay before the last HPET read above (e.g. + * due to a SMI or NMI) can lead to (perhaps severe) inaccuracy if not + * accounting for the time expired past the originally calculated end + * of the calibration period. + */ +printk("%lu -> ", expired * CALIBRATE_FRAC);//temp + count -= target; + target = CALIBRATE_VALUE(hpet_rate); + expired = muldiv64(expired, target, target + count); +printk("%lu (%3u,%u)\n", expired * CALIBRATE_FRAC, count, target);//temp + } +} + + return expired * CALIBRATE_FRAC; } static void resume_hpet(struct platform_timesource *pts)

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.