[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 5/5] x86/cpuid: Fix handling of xsave dynamic leaves


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 6 May 2021 08:17:21 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=pniEuLjovCExY280Ez4B6hLD5bMfEtcXyVRIojq+2UU=; b=ThFLws4F+oikzlgGrzg9lj776TOmwLoh62c1tVF/RsqAs+of1QtrXhpoH4kGVbg/f3eaY3iCClLv2MnnGftS4PvM2nIisH3spUlE3S9OotfbbibTCsj7++SU7263C+4GaXJ7sgEL13qPFmYZxh75/x02ET7SjQCTBVZ0pxzngkPoM3/wMKFdGOjN2wxzoza9GET9tSLQU0STbBVRBpPlQ+bSzkN0lzBFylnLYdjsBlb6dsejzqaQaOHj2blwctMVlQv7pVebCOOyflby6+ca3H+gtRbAQfmHGVDlOFMSNG8V7yvSRBZSHSOdoBY5N/sJOf2xs5r0faM64O2EEUM3Mg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=PE9g0dZr2Xie90mSXZGORy8yhbmdRHBv70p3uIxR7ODH79LzdeO6fd2HtwnSyAK/BP6kH7Vr4b+U6trgBx1rE0n5sW02J/bDnv4gENTOQR4iVG7IOCCJHzOZuGLxmtkJ07BsCHi5ElaUO0gXjqRe2dmR8AXjJQHbHqXhS/nz6Uz3qGUj1NRODQ9J142uKB14djyI7bnT5lD+PNwd/PRk1MRGwbR2w2yTfhzU3z4ETqyHMYuT0z65veKCXft50nWYkfHhn9t8ICAph1z3gc5qUsxwM2uxhUyeb13bw2ewf+YM0ulIzRYzU6b8gYLp2D5ee6wcUMU1zgCTyKnVxbhDQA==
  • Authentication-results: lists.xenproject.org; dkim=none (message not signed) header.d=none;lists.xenproject.org; dmarc=none action=none header.from=suse.com;
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 06 May 2021 06:17:40 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 05.05.2021 18:59, Andrew Cooper wrote:
> On 05/05/2021 09:33, Jan Beulich wrote:
>> On 04.05.2021 16:17, Andrew Cooper wrote:
>>> On 04/05/2021 13:56, Jan Beulich wrote:
>>>> On 03.05.2021 17:39, Andrew Cooper wrote:
>>>>> +unsigned int xstate_compressed_size(uint64_t xstates)
>>>>> +{
>>>>> +    unsigned int i, size = XSTATE_AREA_MIN_SIZE;
>>>>> +
>>>>> +    xstates &= ~XSTATE_FP_SSE;
>>>>> +    for_each_set_bit ( i, &xstates, 63 )
>>>>> +    {
>>>>> +        if ( test_bit(i, &xstate_align) )
>>>>> +            size = ROUNDUP(size, 64);
>>>>> +
>>>>> +        size += xstate_sizes[i];
>>>>> +    }
>>>>> +
>>>>> +    /* In debug builds, cross-check our calculation with hardware. */
>>>>> +    if ( IS_ENABLED(CONFIG_DEBUG) )
>>>>> +    {
>>>>> +        unsigned int hwsize;
>>>>> +
>>>>> +        xstates |= XSTATE_FP_SSE;
>>>>> +        hwsize = hw_compressed_size(xstates);
>>>>> +
>>>>> +        if ( size != hwsize )
>>>>> +            printk_once(XENLOG_ERR "%s(%#"PRIx64") size %#x != hwsize 
>>>>> %#x\n",
>>>>> +                        __func__, xstates, size, hwsize);
>>>>> +        size = hwsize;
>>>> To be honest, already on the earlier patch I was wondering whether
>>>> it does any good to override size here: That'll lead to different
>>>> behavior on debug vs release builds. If the log message is not
>>>> paid attention to, we'd then end up with longer term breakage.
>>> Well - our options are pass hardware size, or BUG(), because getting
>>> this wrong will cause memory corruption.
>> I'm afraid I'm lost: Neither passing hardware size nor BUG() would
>> happen in a release build, so getting this wrong does mean memory
>> corruption there. And I'm of the clear opinion that debug builds
>> shouldn't differ in behavior in such regards.
> 
> The point of not cross-checking with hardware in release builds is to
> remove a bunch of very expensive operations from path which is hit
> several times on every fork(), so isn't exactly rare.
> 
> But yes - the consequence of being wrong, for whatever reason, is memory
> corruption (especially as the obvious way it goes wrong is having an
> xsave_size[] of 0, so we end up under-reporting).
> 
> So one way or another, we need to ensure that every newly exposed option
> is tested in a debug build first.  The integration is either complete,
> or it isn't, so I don't think this terribly onerous to do.
> 
> 
> As for actually having a behaviour difference between debug and release,
> I'm not concerned.
> 
> Hitting this failure means "there is definitely a serious error in Xen,
> needing fixing", but it will only be encountered the during development
> of a new feature, so is for all practical concerns, limited to
> development of the new feature in question.
> 
> BUG() gets the point across very obviously, but is unhelpful when in
> practice the test system wont explode because the dom0 kernel won't be
> using this new state yet.
> 
>> If there wasn't an increasing number of possible combinations I
>> would be inclined to suggest that in all builds we check during
>> boot that calculation and hardware provided values match for all
>> possible (valid) combinations.
> 
> I was actually considering an XTF test on the matter, which would be a
> cleaning up of the one generating the example in the cover letter.
> 
> As above, logic is only liable to be wrong during development of support
> for a new state component, which is why it is reasonable to take away
> the overhead in release builds.

Well, okay then - let's hope all bugs here are obviously exposed
during initial development, and no corner cases get missed.

>>> The BUG() option is a total pain when developing new support - the first
>>> version of this patch did use BUG(), but after hitting it 4 times in a
>>> row (caused by issues with constants elsewhere), I decided against it.
>> I can fully understand this aspect. I'm not opposed to printk_once()
>> at all. My comment was purely towards the override.
>>
>>> If we had something which was a mix between WARN_ONCE() and a proper
>>> printk() explaining what was going on, then I would have used that. 
>>> Maybe it's time to introduce one...
>> I don't think there's a need here - what you have in terms of info
>> put into the log is imo sufficient.
> 
> Well - it needs to be sufficiently obvious to people who aren't you and
> me.  There are also other areas in Xen which would benefit from changing
> their diagnostics to be as described.

I generally disagree: A log message of this kind needs to be detailed
enough to easily find where it originates in source. Then the source
there should have enough information. Things are different when a log
message implies an admin may be able to take some corrective action
without actually changing source code in any way. That's not the case
here afaict.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.