Xen project Mailing List

Re: [PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Wed, 29 May 2024 18:14:56 +0200

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Wed, 29 May 2024 16:15:18 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, May 29, 2024 at 05:49:48PM +0200, Jan Beulich wrote:
> On 29.05.2024 17:03, Roger Pau Monné wrote:
> > On Wed, May 29, 2024 at 03:35:04PM +0200, Jan Beulich wrote:
> >> On 29.05.2024 11:01, Roger Pau Monne wrote:
> >>> Due to the current rwlock logic, if the CPU calling get_cpu_maps() does so from
> >>> a cpu_hotplug_{begin,done}() region the function will still return success,
> >>> because a CPU taking the rwlock in read mode after having taken it in write
> >>> mode is allowed. Such behavior however defeats the purpose of get_cpu_maps(),
> >>> as it should always return false when called with a CPU hot{,un}plug operation
> >>> is in progress.
> >>
> >> I'm not sure I can agree with this. The CPU doing said operation ought to be
> >> aware of what it is itself doing. And all other CPUs will get back false from
> >> get_cpu_maps().
> >
> > Well, the CPU is aware in the context of cpu_{up,down}(), but not in
> > the interrupts that might be handled while that operation is in
> > progress, see below for a concrete example.
> >
> >>> Otherwise the logic in send_IPI_mask() for example is wrong,
> >>> as it could decide to use the shorthand even when a CPU operation is in
> >>> progress.
> >>
> >> It's also not becoming clear what's wrong there: As long as a CPU isn't
> >> offline enough to not be in cpu_online_map anymore, it may well need to still
> >> be the target of IPIs, and targeting it with a shorthand then is still fine.
> >
> > The issue is in the online path: there's a window where the CPU is
> > online (and the lapic active), but cpu_online_map hasn't been updated
> > yet. A specific example would be time_calibration() being executed on
> > the CPU that is running cpu_up(). That could result in a shorthand
> > IPI being used, but the mask in r.cpu_calibration_map not containing
> > the CPU that's being brought up online because it's not yet added to
> > cpu_online_map. Then the number of CPUs actually running
> > time_calibration_rendezvous_fn won't match the weight of the cpumask
> > in r.cpu_calibration_map.
>
> I see, but maybe only partly. Prior to the CPU having its bit set in
> cpu_online_map, can it really take interrupts already? Shouldn't it be
> running with IRQs off until later, thus preventing it from making it
> into the rendezvous function in the first place? But yes, I can see
> how the IRQ (IPI) then being delivered later (once IRQs are enabled)
> might cause problems, too.

The interrupt will get set in IRR and handled when interrupts are
enabled.

> Plus, with how the rendezvous function is invoked (via
> on_selected_cpus() with the mask copied from cpu_online_map), the
> first check in smp_call_function_interrupt() ought to prevent the
> function from being called on the CPU being onlined. A problem would
> arise though if the IPI arrived later and call_data was already
> (partly or fully) overwritten with the next request.

Yeah, there's a small window where the fields in call_data are out of
sync.

> >> In any event this would again affect only the CPU leading the CPU operation,
> >> which should clearly know at which point(s) it is okay to send IPIs. Are we
> >> actually sending any IPIs from within CPU-online or CPU-offline paths?
> >
> > Yes, I've seen the time rendezvous happening while in the middle of a
> > hotplug operation, and the CPU coordinating the rendezvous being the
> > one doing the CPU hotplug operation, so get_cpu_maps() returning true.
>
> Right, yet together with ...
>
> >> Together with the earlier paragraph the critical window would be between the
> >> CPU being taken off of cpu_online_map and the CPU actually going "dead" (i.e.
> >> on x86: its LAPIC becoming unresponsive to other than INIT/SIPI). And even
> >> then the question would be what bad, if any, would happen to that CPU if an
> >> IPI was still targeted at it by way of using the shorthand. I'm pretty sure
> >> it runs with IRQs off at that time, so no ordinary IRQ could be delivered.
> >>
> >>> Adjust the logic in get_cpu_maps() to return false when the CPUs lock is
> >>> already hold in write mode by the current CPU, as read_trylock() would
> >>> otherwise return true.
> >>>
> >>> Fixes: 868a01021c6f ('rwlock: allow recursive read locking when already
> >>> locked in write mode')
> >>
> >> I'm puzzled by this as well: Prior to that and the change referenced by its
> >> Fixes: tag, recursive spin locks were used. For the purposes here that's the
> >> same as permitting read locking even when the write lock is already held by
> >> the local CPU.
> >
> > I see, so the Fixes should be:
> >
> > x86/smp: use APIC ALLBUT destination shorthand when possible
> >
> > Instead, which is the commit that started using get_cpu_maps() in
> > send_IPI_mask().
>
> ... this I then wonder whether it's really only the condition in
> send_IPI_mask() which needs further amending, rather than fiddling with
> get_cpu_maps().

That's the other option, but I have the impression it's more fragile to
adjust the condition in send_IPI_mask() than to fiddle with
get_cpu_maps().

However, if that's the preference I can adjust.

Thanks, Roger.
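
For readers following the thread, the locking behaviour under discussion can be sketched with a toy, single-threaded model. This is only an illustration under stated assumptions, not Xen's rwlock or its get_cpu_maps() implementation: every toy_* name is hypothetical, the lock is modelled as a plain struct with a writer CPU id and a reader count, and the "new" variant reflects just the idea described in the patch (return false when the calling CPU already holds the lock in write mode).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NO_OWNER (-1)

/* Toy model of a rwlock that allows recursive read locking by the writer.
 * Hypothetical and single-threaded; NOT Xen's rwlock implementation. */
struct toy_rwlock {
    int writer;           /* CPU id holding write mode, or TOY_NO_OWNER */
    unsigned int readers; /* number of read-mode acquisitions */
};

/* Write mode can be taken only when nobody holds the lock. */
static bool toy_write_trylock(struct toy_rwlock *l, int cpu)
{
    if (l->writer != TOY_NO_OWNER || l->readers != 0)
        return false;
    l->writer = cpu;
    return true;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
    l->writer = TOY_NO_OWNER;
}

/* Read mode succeeds when there is no writer, or when the writer is the
 * caller itself (the recursive case discussed in the thread). */
static bool toy_read_trylock(struct toy_rwlock *l, int cpu)
{
    if (l->writer != TOY_NO_OWNER && l->writer != cpu)
        return false;
    l->readers++;
    return true;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
    l->readers--;
}

/* Behaviour described as current: a bare read trylock, so the CPU running
 * the hotplug operation still gets "true". */
static bool toy_get_cpu_maps_old(struct toy_rwlock *l, int cpu)
{
    return toy_read_trylock(l, cpu);
}

/* Behaviour described as proposed: refuse when the caller already holds the
 * lock in write mode, i.e. a hotplug operation is underway on this CPU. */
static bool toy_get_cpu_maps_new(struct toy_rwlock *l, int cpu)
{
    if (l->writer == cpu)
        return false;
    return toy_read_trylock(l, cpu);
}
```

In this model, with CPU 0 holding the lock in write mode (standing in for a cpu_hotplug_begin() region), toy_get_cpu_maps_old() returns true for CPU 0 while toy_get_cpu_maps_new() returns false; both return false for any other CPU. The alternative raised above, amending the condition in send_IPI_mask() instead, is not modelled here.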
