|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway
On Wed, May 29, 2024 at 05:49:48PM +0200, Jan Beulich wrote:
> On 29.05.2024 17:03, Roger Pau Monné wrote:
> > On Wed, May 29, 2024 at 03:35:04PM +0200, Jan Beulich wrote:
> >> On 29.05.2024 11:01, Roger Pau Monne wrote:
> >>> Due to the current rwlock logic, if the CPU calling get_cpu_maps() does
> >>> so from
> >>> a cpu_hotplug_{begin,done}() region the function will still return
> >>> success,
> >>> because a CPU taking the rwlock in read mode after having taken it in
> >>> write
> >>> mode is allowed. Such behavior however defeats the purpose of
> >>> get_cpu_maps(),
> >>> as it should always return false when called with a CPU hot{,un}plug
> >>> operation
> >>> is in progress.
> >>
> >> I'm not sure I can agree with this. The CPU doing said operation ought to
> >> be
> >> aware of what it is itself doing. And all other CPUs will get back false
> >> from
> >> get_cpu_maps().
> >
> > Well, the CPU is aware in the context of cpu_{up,down}(), but not in
> > the interrupts that might be handled while that operation is in
> > progress, see below for a concrete example.
> >
> >>> Otherwise the logic in send_IPI_mask() for example is wrong,
> >>> as it could decide to use the shorthand even when a CPU operation is in
> >>> progress.
> >>
> >> It's also not becoming clear what's wrong there: As long as a CPU isn't
> >> offline enough to not be in cpu_online_map anymore, it may well need to
> >> still
> >> be the target of IPIs, and targeting it with a shorthand then is still
> >> fine.
> >
> > The issue is in the online path: there's a window where the CPU is
> > online (and the lapic active), but cpu_online_map hasn't been updated
> > yet. A specific example would be time_calibration() being executed on
> > the CPU that is running cpu_up(). That could result in a shorthand
> > IPI being used, but the mask in r.cpu_calibration_map not containing
> > the CPU that's being brought up online because it's not yet added to
> > cpu_online_map. Then the number of CPUs actually running
> > time_calibration_rendezvous_fn won't match the weight of the cpumask
> > in r.cpu_calibration_map.
>
> I see, but maybe only partly. Prior to the CPU having its bit set in
> cpu_online_map, can it really take interrupts already? Shouldn't it be
> running with IRQs off until later, thus preventing it from making it
> into the rendezvous function in the first place? But yes, I can see
> how the IRQ (IPI) then being delivered later (once IRQs are enabled)
> might cause problems, too.
The interrupt will get set in IRR and handled when interrupts are
enabled.
>
> Plus, with how the rendezvous function is invoked (via
> on_selected_cpus() with the mask copied from cpu_online_map), the
> first check in smp_call_function_interrupt() ought to prevent the
> function from being called on the CPU being onlined. A problem would
> arise though if the IPI arrived later and call_data was already
> (partly or fully) overwritten with the next request.
Yeah, there's a small window where the fields in call_data are out of
sync.
> >> In any event this would again affect only the CPU leading the CPU
> >> operation,
> >> which should clearly know at which point(s) it is okay to send IPIs. Are we
> >> actually sending any IPIs from within CPU-online or CPU-offline paths?
> >
> > Yes, I've seen the time rendezvous happening while in the middle of a
> > hotplug operation, and the CPU coordinating the rendezvous being the
> > one doing the CPU hotplug operation, so get_cpu_maps() returning true.
>
> Right, yet together with ...
>
> >> Together with the earlier paragraph the critical window would be between
> >> the
> >> CPU being taken off of cpu_online_map and the CPU actually going "dead"
> >> (i.e.
> >> on x86: its LAPIC becoming unresponsive to other than INIT/SIPI). And even
> >> then the question would be what bad, if any, would happen to that CPU if an
> >> IPI was still targeted at it by way of using the shorthand. I'm pretty sure
> >> it runs with IRQs off at that time, so no ordinary IRQ could be delivered.
> >>
> >>> Adjust the logic in get_cpu_maps() to return false when the CPUs lock is
> >>> already hold in write mode by the current CPU, as read_trylock() would
> >>> otherwise return true.
> >>>
> >>> Fixes: 868a01021c6f ('rwlock: allow recursive read locking when already
> >>> locked in write mode')
> >>
> >> I'm puzzled by this as well: Prior to that and the change referenced by its
> >> Fixes: tag, recursive spin locks were used. For the purposes here that's
> >> the
> >> same as permitting read locking even when the write lock is already held by
> >> the local CPU.
> >
> > I see, so the Fixes should be:
> >
> > x86/smp: use APIC ALLBUT destination shorthand when possible
> >
> > Instead, which is the commit that started using get_cpu_maps() in
> > send_IPI_mask().
>
> ... this I then wonder whether it's really only the condition in
> send_IPI_mask() which needs further amending, rather than fiddling with
> get_cpu_maps().
That the other option, but I have impression it's more fragile to
adjust the condition in send_IPI_mask() rather than fiddle with
get_cpu_maps().
However if that's the preference I can adjust.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |