|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] timer: fix NR_CPUS=1 build with gcc13
On 13.09.2023 12:25, George Dunlap wrote:
> On Wed, Sep 13, 2023 at 11:05 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 13.09.2023 11:44, George Dunlap wrote:
>>> On Wed, Sep 13, 2023 at 8:32 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>
>>>> Gcc13 apparently infers from "if ( old_cpu < new_cpu )" that "new_cpu"
>>>> is >= 1, and then (on x86) complains about "per_cpu(timers, new_cpu)"
>>>> exceeding __per_cpu_offset[]'s bounds (being an array of 1 in such a
>>>> configuration). Make the code conditional upon there being at least 2
>>>> CPUs configured (otherwise there simply is nothing to migrate [to]).
>>>
>>> Hmm, without digging into it, migrate_timer() doesn't seem like very
>>> robust code: It doesn't check to make sure that new_cpu is valid, nor
>>> does it give the option of returning an error if anything fails.
>>
>> Question is - what do you expect the callers to do upon getting back
>> failure?
>
> [snip]
>
>>> Would it make more sense to add `||
>>> (new_cpu > CONFIG_NR_CPUS)` to the early-return conditional at the
>>> top of the first `for (; ; )` loop?
>>
>> But that would mean not doing what was requested without any indication
>> to the caller. An out-of-range CPU passed in is generally very likely
>> to result in a crash, I think.
>
> If it's only off by a little bit, there's a good chance it might just
> corrupt some other data, causing a crash further down the line, where
> it's not obvious what went wrong.
In general I would agree. but __per_cpu_offset[] is quite special in
the values it holds. The data immediately following it would therefore
also need to have unusual values within relatively narrow a range for
a crash to not occur right away.
> Generally speaking, passing an
> error up the stack, explicitly crashing, or explicitly doing nothing
> with a warning to the console are all better options.
I guess I'll go that route then, since ...
>>> I guess if we don't expect it ever to be called, it might be better to
>>> get rid of the code entirely; but maybe in that case we should add
>>> something like the following?
>>>
>>> ```
>>> #else
>>> WARN_ONCE("migrate_timer: Request to move to %u on a single-core
>>> system!", new_cpu);
>>> ASSERT_UNREACHABLE();
>>> #endif
>>> ```
>>
>> With the old_cpu == new_cpu case explicitly permitted (and that being
>> the only legal case when NR_CPUS=1, which arguably is an aspect which
>> makes gcc's diagnostic questionable), perhaps only
>>
>> #else
>> old_cpu = ...;
>> if ( old_cpu != TIMER_CPU_status_killed )
>> WARN_ON(new_cpu != old_cpu);
>> #endif
>>
>> (I'm afraid we have no WARN_ON_ONCE() yet, nor WARN_ONCE())?
>
> I think I was looking for `printk_once`.
>
> If there's no reasonable way to fail more gracefully (or no real point
> in making the effort to do so), what if we add the following to the
> top of the function? Does that make gcc13 happy?
>
> ```
> if ( new_cpu >= CONFIG_NR_CPUS )
> {
> printk_once(/* whatever */);
> ASSERT_UNREACHABLE();
> return;
> }
> ```
... this actually makes things worse (then the compiler complains about
old_cpu uses as array index), ...
> Or, if we feel like being passed an invalid cpu means the state is so
> bad it would be better to just crash and have done with it:
>
> ```
> BUG_ON(new_cpu >= CONFIG_NR_CPUS);
> ```
... and this, while it helps when then also done for old_cpu, seems too
hefty to me.
Just to mention it, 'asm volatile ( "" : "+g" (new_cpu) );' placed at
the right location also helps. That's effectively RELOC_HIDE(), which
we use to work around a gcc11 issue in the same area - see gcc11_wrap().
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |