Xen project Mailing List

Re: [PATCH 5/5] x86/cpuid: Fix handling of xsave dynamic leaves

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Thu, 6 May 2021 08:17:21 +0200

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=pniEuLjovCExY280Ez4B6hLD5bMfEtcXyVRIojq+2UU=; b=ThFLws4F+oikzlgGrzg9lj776TOmwLoh62c1tVF/RsqAs+of1QtrXhpoH4kGVbg/f3eaY3iCClLv2MnnGftS4PvM2nIisH3spUlE3S9OotfbbibTCsj7++SU7263C+4GaXJ7sgEL13qPFmYZxh75/x02ET7SjQCTBVZ0pxzngkPoM3/wMKFdGOjN2wxzoza9GET9tSLQU0STbBVRBpPlQ+bSzkN0lzBFylnLYdjsBlb6dsejzqaQaOHj2blwctMVlQv7pVebCOOyflby6+ca3H+gtRbAQfmHGVDlOFMSNG8V7yvSRBZSHSOdoBY5N/sJOf2xs5r0faM64O2EEUM3Mg==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=PE9g0dZr2Xie90mSXZGORy8yhbmdRHBv70p3uIxR7ODH79LzdeO6fd2HtwnSyAK/BP6kH7Vr4b+U6trgBx1rE0n5sW02J/bDnv4gENTOQR4iVG7IOCCJHzOZuGLxmtkJ07BsCHi5ElaUO0gXjqRe2dmR8AXjJQHbHqXhS/nz6Uz3qGUj1NRODQ9J142uKB14djyI7bnT5lD+PNwd/PRk1MRGwbR2w2yTfhzU3z4ETqyHMYuT0z65veKCXft50nWYkfHhn9t8ICAph1z3gc5qUsxwM2uxhUyeb13bw2ewf+YM0ulIzRYzU6b8gYLp2D5ee6wcUMU1zgCTyKnVxbhDQA==

Authentication-results: lists.xenproject.org; dkim=none (message not signed) header.d=none;lists.xenproject.org; dmarc=none action=none header.from=suse.com;

Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 06 May 2021 06:17:40 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 05.05.2021 18:59, Andrew Cooper wrote: > On 05/05/2021 09:33, Jan Beulich wrote: >> On 04.05.2021 16:17, Andrew Cooper wrote: >>> On 04/05/2021 13:56, Jan Beulich wrote: >>>> On 03.05.2021 17:39, Andrew Cooper wrote: >>>>> +unsigned int xstate_compressed_size(uint64_t xstates) >>>>> +{ >>>>> + unsigned int i, size = XSTATE_AREA_MIN_SIZE; >>>>> + >>>>> + xstates &= ~XSTATE_FP_SSE; >>>>> + for_each_set_bit ( i, &xstates, 63 ) >>>>> + { >>>>> + if ( test_bit(i, &xstate_align) ) >>>>> + size = ROUNDUP(size, 64); >>>>> + >>>>> + size += xstate_sizes[i]; >>>>> + } >>>>> + >>>>> + /* In debug builds, cross-check our calculation with hardware. */ >>>>> + if ( IS_ENABLED(CONFIG_DEBUG) ) >>>>> + { >>>>> + unsigned int hwsize; >>>>> + >>>>> + xstates |= XSTATE_FP_SSE; >>>>> + hwsize = hw_compressed_size(xstates); >>>>> + >>>>> + if ( size != hwsize ) >>>>> + printk_once(XENLOG_ERR "%s(%#"PRIx64") size %#x != hwsize >>>>> %#x\n", >>>>> + __func__, xstates, size, hwsize); >>>>> + size = hwsize; >>>> To be honest, already on the earlier patch I was wondering whether >>>> it does any good to override size here: That'll lead to different >>>> behavior on debug vs release builds. If the log message is not >>>> paid attention to, we'd then end up with longer term breakage. >>> Well - our options are pass hardware size, or BUG(), because getting >>> this wrong will cause memory corruption. >> I'm afraid I'm lost: Neither passing hardware size nor BUG() would >> happen in a release build, so getting this wrong does mean memory >> corruption there. And I'm of the clear opinion that debug builds >> shouldn't differ in behavior in such regards. > > The point of not cross-checking with hardware in release builds is to > remove a bunch of very expensive operations from path which is hit > several times on every fork(), so isn't exactly rare. > > But yes - the consequence of being wrong, for whatever reason, is memory > corruption (especially as the obvious way it goes wrong is having an > xsave_size[] of 0, so we end up under-reporting). > > So one way or another, we need to ensure that every newly exposed option > is tested in a debug build first. The integration is either complete, > or it isn't, so I don't think this terribly onerous to do. > > > As for actually having a behaviour difference between debug and release, > I'm not concerned. > > Hitting this failure means "there is definitely a serious error in Xen, > needing fixing", but it will only be encountered the during development > of a new feature, so is for all practical concerns, limited to > development of the new feature in question. > > BUG() gets the point across very obviously, but is unhelpful when in > practice the test system wont explode because the dom0 kernel won't be > using this new state yet. > >> If there wasn't an increasing number of possible combinations I >> would be inclined to suggest that in all builds we check during >> boot that calculation and hardware provided values match for all >> possible (valid) combinations. > > I was actually considering an XTF test on the matter, which would be a > cleaning up of the one generating the example in the cover letter. > > As above, logic is only liable to be wrong during development of support > for a new state component, which is why it is reasonable to take away > the overhead in release builds. Well, okay then - let's hope all bugs here are obviously exposed during initial development, and no corner cases get missed. >>> The BUG() option is a total pain when developing new support - the first >>> version of this patch did use BUG(), but after hitting it 4 times in a >>> row (caused by issues with constants elsewhere), I decided against it. >> I can fully understand this aspect. I'm not opposed to printk_once() >> at all. My comment was purely towards the override. >> >>> If we had something which was a mix between WARN_ONCE() and a proper >>> printk() explaining what was going on, then I would have used that. >>> Maybe it's time to introduce one... >> I don't think there's a need here - what you have in terms of info >> put into the log is imo sufficient. > > Well - it needs to be sufficiently obvious to people who aren't you and > me. There are also other areas in Xen which would benefit from changing > their diagnostics to be as described. I generally disagree: A log message of this kind needs to be detailed enough to easily find where it originates in source. Then the source there should have enough information. Things are different when a log message implies an admin may be able to take some corrective action without actually changing source code in any way. That's not the case here afaict. Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.