
Re: [PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway



On Fri, May 31, 2024 at 10:33:58AM +0200, Jan Beulich wrote:
> On 31.05.2024 09:31, Roger Pau Monné wrote:
> > On Fri, May 31, 2024 at 09:02:20AM +0200, Jan Beulich wrote:
> >> On 29.05.2024 18:14, Roger Pau Monné wrote:
> >>> On Wed, May 29, 2024 at 05:49:48PM +0200, Jan Beulich wrote:
> >>>> On 29.05.2024 17:03, Roger Pau Monné wrote:
> >>>>> On Wed, May 29, 2024 at 03:35:04PM +0200, Jan Beulich wrote:
> >>>>>> On 29.05.2024 11:01, Roger Pau Monne wrote:
> >>>>>>> Due to the current rwlock logic, if the CPU calling get_cpu_maps() 
> >>>>>>> does so from
> >>>>>>> a cpu_hotplug_{begin,done}() region the function will still return 
> >>>>>>> success,
> >>>>>>> because a CPU taking the rwlock in read mode after having taken it in 
> >>>>>>> write
> >>>>>>> mode is allowed.  Such behavior however defeats the purpose of 
> >>>>>>> get_cpu_maps(),
> >>>>>>> as it should always return false when called while a CPU hot{,un}plug
> >>>>>>> operation is in progress.
> >>>>>>
> >>>>>> I'm not sure I can agree with this. The CPU doing said operation ought 
> >>>>>> to be
> >>>>>> aware of what it is itself doing. And all other CPUs will get back 
> >>>>>> false from
> >>>>>> get_cpu_maps().
> >>>>>
> >>>>> Well, the CPU is aware in the context of cpu_{up,down}(), but not in
> >>>>> the interrupts that might be handled while that operation is in
> >>>>> progress, see below for a concrete example.
> >>>>>
> >>>>>>>  Otherwise the logic in send_IPI_mask() for example is wrong,
> >>>>>>> as it could decide to use the shorthand even when a CPU operation is 
> >>>>>>> in
> >>>>>>> progress.
> >>>>>>
> >>>>>> It's also not becoming clear what's wrong there: As long as a CPU isn't
> >>>>>> offline enough to not be in cpu_online_map anymore, it may well need 
> >>>>>> to still
> >>>>>> be the target of IPIs, and targeting it with a shorthand then is still 
> >>>>>> fine.
> >>>>>
> >>>>> The issue is in the online path: there's a window where the CPU is
> >>>>> online (and the lapic active), but cpu_online_map hasn't been updated
> >>>>> yet.  A specific example would be time_calibration() being executed on
> >>>>> the CPU that is running cpu_up().  That could result in a shorthand
> >>>>> IPI being used, but the mask in r.cpu_calibration_map not containing
> >>>>> the CPU that's being brought online, because it's not yet added to
> >>>>> cpu_online_map.  Then the number of CPUs actually running
> >>>>> time_calibration_rendezvous_fn won't match the weight of the cpumask
> >>>>> in r.cpu_calibration_map.
> >>>>
> >>>> I see, but maybe only partly. Prior to the CPU having its bit set in
> >>>> cpu_online_map, can it really take interrupts already? Shouldn't it be
> >>>> running with IRQs off until later, thus preventing it from making it
> >>>> into the rendezvous function in the first place? But yes, I can see
> >>>> how the IRQ (IPI) then being delivered later (once IRQs are enabled)
> >>>> might cause problems, too.
> >>>
> >>> The interrupt will get set in IRR and handled when interrupts are
> >>> enabled.
> >>>
> >>>>
> >>>> Plus, with how the rendezvous function is invoked (via
> >>>> on_selected_cpus() with the mask copied from cpu_online_map), the
> >>>> first check in smp_call_function_interrupt() ought to prevent the
> >>>> function from being called on the CPU being onlined. A problem would
> >>>> arise though if the IPI arrived later and call_data was already
> >>>> (partly or fully) overwritten with the next request.
> >>>
> >>> Yeah, there's a small window where the fields in call_data are out of
> >>> sync.
> >>>
> >>>>>> In any event this would again affect only the CPU leading the CPU 
> >>>>>> operation,
> >>>>>> which should clearly know at which point(s) it is okay to send IPIs. 
> >>>>>> Are we
> >>>>>> actually sending any IPIs from within CPU-online or CPU-offline paths?
> >>>>>
> >>>>> Yes, I've seen the time rendezvous happening while in the middle of a
> >>>>> hotplug operation, and the CPU coordinating the rendezvous being the
> >>>>> one doing the CPU hotplug operation, so get_cpu_maps() returning true.
> >>>>
> >>>> Right, yet together with ...
> >>>>
> >>>>>> Together with the earlier paragraph the critical window would be 
> >>>>>> between the
> >>>>>> CPU being taken off of cpu_online_map and the CPU actually going 
> >>>>>> "dead" (i.e.
> >>>>>> on x86: its LAPIC becoming unresponsive to other than INIT/SIPI). And 
> >>>>>> even
> >>>>>> then the question would be what bad, if any, would happen to that CPU 
> >>>>>> if an
> >>>>>> IPI was still targeted at it by way of using the shorthand. I'm pretty 
> >>>>>> sure
> >>>>>> it runs with IRQs off at that time, so no ordinary IRQ could be 
> >>>>>> delivered.
> >>>>>>
> >>>>>>> Adjust the logic in get_cpu_maps() to return false when the CPUs lock
> >>>>>>> is already held in write mode by the current CPU, as read_trylock()
> >>>>>>> would otherwise return true.
> >>>>>>>
> >>>>>>> Fixes: 868a01021c6f ('rwlock: allow recursive read locking when 
> >>>>>>> already locked in write mode')
> >>>>>>
> >>>>>> I'm puzzled by this as well: Prior to that and the change referenced 
> >>>>>> by its
> >>>>>> Fixes: tag, recursive spin locks were used. For the purposes here 
> >>>>>> that's the
> >>>>>> same as permitting read locking even when the write lock is already 
> >>>>>> held by
> >>>>>> the local CPU.
> >>>>>
> >>>>> I see, so the Fixes should be:
> >>>>>
> >>>>> x86/smp: use APIC ALLBUT destination shorthand when possible
> >>>>>
> >>>>> Instead, which is the commit that started using get_cpu_maps() in
> >>>>> send_IPI_mask().
> >>>>
> >>>> ... this I then wonder whether it's really only the condition in
> >>>> send_IPI_mask() which needs further amending, rather than fiddling with
> >>>> get_cpu_maps().
> >>>
> >>> That's the other option, but I have the impression it's more fragile
> >>> to adjust the condition in send_IPI_mask() than to fiddle with
> >>> get_cpu_maps().
> >>>
> >>> However if that's the preference I can adjust.
> >>
> >> I guess we need input from other REST maintainers here then. The two of
> >> us clearly disagree on
> >> what use of get_cpu_maps() is meant to guarantee. And I deem fiddling with
> >> common code here more risky (and more intrusive - the other change would be
> >> a single-line code change afaict, plus extending the related comment).
> > 
> > How do you envision that other change to be done?  Adding an extra
> > variable and toggling it in cpu_hotplug_{begin,done}() to signal
> > whether a CPU hotplug is in progress?
> 
> I was thinking of an is-write-locked-by-me check on cpu_add_remove_lock.

Oh, so basically open-coding in send_IPI_mask() what I proposed here as a
change to get_cpu_maps().  Unless anyone else expresses interest in my
current proposal I'll switch to that.

Thanks, Roger.
