|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 5/5] x86/cpuid: Fix handling of xsave dynamic leaves
On 05.05.2021 18:59, Andrew Cooper wrote:
> On 05/05/2021 09:33, Jan Beulich wrote:
>> On 04.05.2021 16:17, Andrew Cooper wrote:
>>> On 04/05/2021 13:56, Jan Beulich wrote:
>>>> On 03.05.2021 17:39, Andrew Cooper wrote:
>>>>> +unsigned int xstate_compressed_size(uint64_t xstates)
>>>>> +{
>>>>> + unsigned int i, size = XSTATE_AREA_MIN_SIZE;
>>>>> +
>>>>> + xstates &= ~XSTATE_FP_SSE;
>>>>> + for_each_set_bit ( i, &xstates, 63 )
>>>>> + {
>>>>> + if ( test_bit(i, &xstate_align) )
>>>>> + size = ROUNDUP(size, 64);
>>>>> +
>>>>> + size += xstate_sizes[i];
>>>>> + }
>>>>> +
>>>>> + /* In debug builds, cross-check our calculation with hardware. */
>>>>> + if ( IS_ENABLED(CONFIG_DEBUG) )
>>>>> + {
>>>>> + unsigned int hwsize;
>>>>> +
>>>>> + xstates |= XSTATE_FP_SSE;
>>>>> + hwsize = hw_compressed_size(xstates);
>>>>> +
>>>>> + if ( size != hwsize )
>>>>> + printk_once(XENLOG_ERR "%s(%#"PRIx64") size %#x != hwsize
>>>>> %#x\n",
>>>>> + __func__, xstates, size, hwsize);
>>>>> + size = hwsize;
>>>> To be honest, already on the earlier patch I was wondering whether
>>>> it does any good to override size here: That'll lead to different
>>>> behavior on debug vs release builds. If the log message is not
>>>> paid attention to, we'd then end up with longer term breakage.
>>> Well - our options are pass hardware size, or BUG(), because getting
>>> this wrong will cause memory corruption.
>> I'm afraid I'm lost: Neither passing hardware size nor BUG() would
>> happen in a release build, so getting this wrong does mean memory
>> corruption there. And I'm of the clear opinion that debug builds
>> shouldn't differ in behavior in such regards.
>
> The point of not cross-checking with hardware in release builds is to
> remove a bunch of very expensive operations from path which is hit
> several times on every fork(), so isn't exactly rare.
>
> But yes - the consequence of being wrong, for whatever reason, is memory
> corruption (especially as the obvious way it goes wrong is having an
> xsave_size[] of 0, so we end up under-reporting).
>
> So one way or another, we need to ensure that every newly exposed option
> is tested in a debug build first. The integration is either complete,
> or it isn't, so I don't think this terribly onerous to do.
>
>
> As for actually having a behaviour difference between debug and release,
> I'm not concerned.
>
> Hitting this failure means "there is definitely a serious error in Xen,
> needing fixing", but it will only be encountered the during development
> of a new feature, so is for all practical concerns, limited to
> development of the new feature in question.
>
> BUG() gets the point across very obviously, but is unhelpful when in
> practice the test system wont explode because the dom0 kernel won't be
> using this new state yet.
>
>> If there wasn't an increasing number of possible combinations I
>> would be inclined to suggest that in all builds we check during
>> boot that calculation and hardware provided values match for all
>> possible (valid) combinations.
>
> I was actually considering an XTF test on the matter, which would be a
> cleaning up of the one generating the example in the cover letter.
>
> As above, logic is only liable to be wrong during development of support
> for a new state component, which is why it is reasonable to take away
> the overhead in release builds.
Well, okay then - let's hope all bugs here are obviously exposed
during initial development, and no corner cases get missed.
>>> The BUG() option is a total pain when developing new support - the first
>>> version of this patch did use BUG(), but after hitting it 4 times in a
>>> row (caused by issues with constants elsewhere), I decided against it.
>> I can fully understand this aspect. I'm not opposed to printk_once()
>> at all. My comment was purely towards the override.
>>
>>> If we had something which was a mix between WARN_ONCE() and a proper
>>> printk() explaining what was going on, then I would have used that.
>>> Maybe it's time to introduce one...
>> I don't think there's a need here - what you have in terms of info
>> put into the log is imo sufficient.
>
> Well - it needs to be sufficiently obvious to people who aren't you and
> me. There are also other areas in Xen which would benefit from changing
> their diagnostics to be as described.
I generally disagree: A log message of this kind needs to be detailed
enough to easily find where it originates in source. Then the source
there should have enough information. Things are different when a log
message implies an admin may be able to take some corrective action
without actually changing source code in any way. That's not the case
here afaict.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |