Xen project Mailing List

Re: [PATCH] timer: fix NR_CPUS=1 build with gcc13

To: George Dunlap <george.dunlap@xxxxxxxxx>

Date: Thu, 14 Sep 2023 14:36:28 +0200

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>

Delivery-date: Thu, 14 Sep 2023 12:36:46 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 13.09.2023 12:25, George Dunlap wrote:
> On Wed, Sep 13, 2023 at 11:05 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 13.09.2023 11:44, George Dunlap wrote:
>>> On Wed, Sep 13, 2023 at 8:32 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>>>
>>>> Gcc13 apparently infers from "if ( old_cpu < new_cpu )" that "new_cpu"
>>>> is >= 1, and then (on x86) complains about "per_cpu(timers, new_cpu)"
>>>> exceeding __per_cpu_offset[]'s bounds (being an array of 1 in such a
>>>> configuration). Make the code conditional upon there being at least 2
>>>> CPUs configured (otherwise there simply is nothing to migrate [to]).
>>>
>>> Hmm, without digging into it, migrate_timer() doesn't seem like very
>>> robust code: It doesn't check to make sure that new_cpu is valid, nor
>>> does it give the option of returning an error if anything fails.
>>
>> Question is - what do you expect the callers to do upon getting back
>> failure?
>
> [snip]
>
>>> Would it make more sense to add `||
>>> (new_cpu > CONFIG_NR_CPUS)` to the early-return conditional at the
>>> top of the first `for (; ; )` loop?
>>
>> But that would mean not doing what was requested without any indication
>> to the caller. An out-of-range CPU passed in is generally very likely
>> to result in a crash, I think.
>
> If it's only off by a little bit, there's a good chance it might just
> corrupt some other data, causing a crash further down the line, where
> it's not obvious what went wrong.

In general I would agree, but __per_cpu_offset[] is quite special in the
values it holds. The data immediately following it would therefore also
need to have unusual values within a relatively narrow range for a crash
not to occur right away.

> Generally speaking, passing an
> error up the stack, explicitly crashing, or explicitly doing nothing
> with a warning to the console are all better options.

I guess I'll go that route then, since ...
>>> I guess if we don't expect it ever to be called, it might be better to
>>> get rid of the code entirely; but maybe in that case we should add
>>> something like the following?
>>>
>>> ```
>>> #else
>>> WARN_ONCE("migrate_timer: Request to move to %u on a single-core system!",
>>>           new_cpu);
>>> ASSERT_UNREACHABLE();
>>> #endif
>>> ```
>>
>> With the old_cpu == new_cpu case explicitly permitted (and that being
>> the only legal case when NR_CPUS=1, which arguably is an aspect which
>> makes gcc's diagnostic questionable), perhaps only
>>
>> #else
>> old_cpu = ...;
>> if ( old_cpu != TIMER_CPU_status_killed )
>>     WARN_ON(new_cpu != old_cpu);
>> #endif
>>
>> (I'm afraid we have no WARN_ON_ONCE() yet, nor WARN_ONCE())?
>
> I think I was looking for `printk_once`.
>
> If there's no reasonable way to fail more gracefully (or no real point
> in making the effort to do so), what if we add the following to the
> top of the function? Does that make gcc13 happy?
>
> ```
> if ( new_cpu >= CONFIG_NR_CPUS )
> {
>     printk_once(/* whatever */);
>     ASSERT_UNREACHABLE();
>     return;
> }
> ```

... this actually makes things worse (the compiler then complains about
old_cpu's use as an array index), ...

> Or, if we feel like being passed an invalid cpu means the state is so
> bad it would be better to just crash and have done with it:
>
> ```
> BUG_ON(new_cpu >= CONFIG_NR_CPUS);
> ```

... and this, while it helps when also done for old_cpu, seems too
heavy-handed to me.

Just to mention it, 'asm volatile ( "" : "+g" (new_cpu) );' placed at the
right location also helps. That's effectively RELOC_HIDE(), which we use
to work around a gcc11 issue in the same area - see gcc11_wrap().

Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.