
Re: [PATCH] timer: fix NR_CPUS=1 build with gcc13


  • To: George Dunlap <george.dunlap@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 14 Sep 2023 14:36:28 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Thu, 14 Sep 2023 12:36:46 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 13.09.2023 12:25, George Dunlap wrote:
> On Wed, Sep 13, 2023 at 11:05 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 13.09.2023 11:44, George Dunlap wrote:
>>> On Wed, Sep 13, 2023 at 8:32 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>
>>>> Gcc13 apparently infers from "if ( old_cpu < new_cpu )" that "new_cpu"
>>>> is >= 1, and then (on x86) complains about "per_cpu(timers, new_cpu)"
>>>> exceeding __per_cpu_offset[]'s bounds (being an array of 1 in such a
>>>> configuration). Make the code conditional upon there being at least 2
>>>> CPUs configured (otherwise there simply is nothing to migrate [to]).
>>>
>>> Hmm, without digging into it, migrate_timer() doesn't seem like very
>>> robust code: It doesn't check to make sure that new_cpu is valid, nor
>>> does it give the option of returning an error if anything fails.
>>
>> Question is - what do you expect the callers to do upon getting back
>> failure?
> 
> [snip]
> 
>>>  Would it make more sense to add `||
>>> (new_cpu > CONFIG_NR_CPUS)` to the early-return  conditional at the
>>> top of the first `for (; ; )` loop?
>>
>> But that would mean not doing what was requested without any indication
>> to the caller. An out-of-range CPU passed in is generally very likely
>> to result in a crash, I think.
> 
> If it's only off by a little bit, there's a good chance it might just
> corrupt some other data, causing a crash further down the line, where
> it's not obvious what went wrong.

In general I would agree, but __per_cpu_offset[] is quite special in
the values it holds. The data immediately following it would therefore
also need to have unusual values within a relatively narrow range for
a crash to not occur right away.

>  Generally speaking, passing an
> error up the stack, explicitly crashing, or explicitly doing nothing
> with a warning to the console are all better options.

I guess I'll go that route then, since ...

>>> I guess if we don't expect it ever to be called, it might be better to
>>> get rid of the code entirely; but maybe in that case we should add
>>> something like the following?
>>>
>>> ```
>>> #else
>>>     WARN_ONCE("migrate_timer: Request to move to %u on a single-core
>>> system!", new_cpu);
>>>     ASSERT_UNREACHABLE();
>>> #endif
>>> ```
>>
>> With the old_cpu == new_cpu case explicitly permitted (and that being
>> the only legal case when NR_CPUS=1, which arguably is an aspect which
>> makes gcc's diagnostic questionable), perhaps only
>>
>> #else
>>     old_cpu = ...;
>>     if ( old_cpu != TIMER_CPU_status_killed )
>>         WARN_ON(new_cpu != old_cpu);
>> #endif
>>
>> (I'm afraid we have no WARN_ON_ONCE() yet, nor WARN_ONCE())?
> 
> I think I was looking for `printk_once`.
> 
> If there's no reasonable way to fail more gracefully (or no real point
> in making the effort to do so), what if we add the following to the
> top of the function?  Does that make gcc13 happy?
> 
> ```
> if ( new_cpu >= CONFIG_NR_CPUS )
> {
>     printk_once(/* whatever */);
>     ASSERT_UNREACHABLE();
>     return;
> }
> ```

... this actually makes things worse (the compiler then complains about
old_cpu being used as an array index), ...

> Or, if we feel like being passed an invalid cpu means the state is so
> bad it would be better to just crash and have done with it:
> 
> ```
>   BUG_ON(new_cpu >= CONFIG_NR_CPUS);
> ```

... and this, while it helps once the same is also done for old_cpu,
seems too heavy-handed to me.

Just to mention it, 'asm volatile ( "" : "+g" (new_cpu) );' placed at
the right location also helps. That's effectively RELOC_HIDE(), which
we use to work around a gcc11 issue in the same area - see gcc11_wrap().
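
For illustration, a minimal sketch of how such an empty asm statement hides
a value from the optimizer. The names hide_value(), pick_offset() and
per_cpu_offset[] are hypothetical stand-ins, not the actual Xen code, and
the sketch assumes a compiler supporting GNU extended asm (gcc or clang):

```c
#define NR_CPUS 1  /* assumed single-CPU configuration */

/* Hypothetical stand-in for __per_cpu_offset[]; zero-initialized. */
static unsigned long per_cpu_offset[NR_CPUS];

/*
 * The empty asm claims to read and write idx ("+g"), so the compiler
 * must discard any range it inferred for it. No instruction is emitted
 * and the value is unchanged at run time.
 * (__asm__ __volatile__ is the same as asm volatile, spelled so that it
 * also builds in strict ISO C modes.)
 */
static inline unsigned int hide_value(unsigned int idx)
{
    __asm__ __volatile__ ( "" : "+g" (idx) );
    return idx;
}

static unsigned long pick_offset(unsigned int old_cpu, unsigned int new_cpu)
{
    if ( old_cpu < new_cpu )
    {
        /* Without the barrier the compiler may infer new_cpu >= 1 here. */
        new_cpu = hide_value(new_cpu);
        return per_cpu_offset[new_cpu];
    }

    return per_cpu_offset[old_cpu];
}
```

Note that this only affects what the compiler can prove at build time; it
does nothing to make a genuinely out-of-range index safe at run time, so
with NR_CPUS=1 the only valid call remains pick_offset(0, 0).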

Jan



 

