[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 31 May 2024 10:33:58 +0200
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 31 May 2024 08:34:17 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 31.05.2024 09:31, Roger Pau Monné wrote:
> On Fri, May 31, 2024 at 09:02:20AM +0200, Jan Beulich wrote:
>> On 29.05.2024 18:14, Roger Pau Monné wrote:
>>> On Wed, May 29, 2024 at 05:49:48PM +0200, Jan Beulich wrote:
>>>> On 29.05.2024 17:03, Roger Pau Monné wrote:
>>>>> On Wed, May 29, 2024 at 03:35:04PM +0200, Jan Beulich wrote:
>>>>>> On 29.05.2024 11:01, Roger Pau Monne wrote:
>>>>>>> Due to the current rwlock logic, if the CPU calling get_cpu_maps() does 
>>>>>>> so from
>>>>>>> a cpu_hotplug_{begin,done}() region the function will still return 
>>>>>>> success,
>>>>>>> because a CPU taking the rwlock in read mode after having taken it in 
>>>>>>> write
>>>>>>> mode is allowed.  Such behavior however defeats the purpose of 
>>>>>>> get_cpu_maps(),
>>>>>>> as it should always return false when called with a CPU hot{,un}plug 
>>>>>>> operation
>>>>>>> is in progress.
>>>>>>
>>>>>> I'm not sure I can agree with this. The CPU doing said operation ought 
>>>>>> to be
>>>>>> aware of what it is itself doing. And all other CPUs will get back false 
>>>>>> from
>>>>>> get_cpu_maps().
>>>>>
>>>>> Well, the CPU is aware in the context of cpu_{up,down}(), but not in
>>>>> the interrupts that might be handled while that operation is in
>>>>> progress, see below for a concrete example.
>>>>>
>>>>>>>  Otherwise the logic in send_IPI_mask() for example is wrong,
>>>>>>> as it could decide to use the shorthand even when a CPU operation is in
>>>>>>> progress.
>>>>>>
>>>>>> It's also not becoming clear what's wrong there: As long as a CPU isn't
>>>>>> offline enough to not be in cpu_online_map anymore, it may well need to 
>>>>>> still
>>>>>> be the target of IPIs, and targeting it with a shorthand then is still 
>>>>>> fine.
>>>>>
>>>>> The issue is in the online path: there's a window where the CPU is
>>>>> online (and the lapic active), but cpu_online_map hasn't been updated
>>>>> yet.  A specific example would be time_calibration() being executed on
>>>>> the CPU that is running cpu_up().  That could result in a shorthand
>>>>> IPI being used, but the mask in r.cpu_calibration_map not containing
>>>>> the CPU that's being brought up online because it's not yet added to
>>>>> cpu_online_map.  Then the number of CPUs actually running
>>>>> time_calibration_rendezvous_fn won't match the weight of the cpumask
>>>>> in r.cpu_calibration_map.
>>>>
>>>> I see, but maybe only partly. Prior to the CPU having its bit set in
>>>> cpu_online_map, can it really take interrupts already? Shouldn't it be
>>>> running with IRQs off until later, thus preventing it from making it
>>>> into the rendezvous function in the first place? But yes, I can see
>>>> how the IRQ (IPI) then being delivered later (once IRQs are enabled)
>>>> might cause problems, too.
>>>
>>> The interrupt will get set in IRR and handled when interrupts are
>>> enabled.
>>>
>>>>
>>>> Plus, with how the rendezvous function is invoked (via
>>>> on_selected_cpus() with the mask copied from cpu_online_map), the
>>>> first check in smp_call_function_interrupt() ought to prevent the
>>>> function from being called on the CPU being onlined. A problem would
>>>> arise though if the IPI arrived later and call_data was already
>>>> (partly or fully) overwritten with the next request.
>>>
>>> Yeah, there's a small window where the fields in call_data are out of
>>> sync.
>>>
>>>>>> In any event this would again affect only the CPU leading the CPU 
>>>>>> operation,
>>>>>> which should clearly know at which point(s) it is okay to send IPIs. Are 
>>>>>> we
>>>>>> actually sending any IPIs from within CPU-online or CPU-offline paths?
>>>>>
>>>>> Yes, I've seen the time rendezvous happening while in the middle of a
>>>>> hotplug operation, and the CPU coordinating the rendezvous being the
>>>>> one doing the CPU hotplug operation, so get_cpu_maps() returning true.
>>>>
>>>> Right, yet together with ...
>>>>
>>>>>> Together with the earlier paragraph the critical window would be between 
>>>>>> the
>>>>>> CPU being taken off of cpu_online_map and the CPU actually going "dead" 
>>>>>> (i.e.
>>>>>> on x86: its LAPIC becoming unresponsive to other than INIT/SIPI). And 
>>>>>> even
>>>>>> then the question would be what bad, if any, would happen to that CPU if 
>>>>>> an
>>>>>> IPI was still targeted at it by way of using the shorthand. I'm pretty 
>>>>>> sure
>>>>>> it runs with IRQs off at that time, so no ordinary IRQ could be 
>>>>>> delivered.
>>>>>>
>>>>>>> Adjust the logic in get_cpu_maps() to return false when the CPUs lock is
>>>>>>> already hold in write mode by the current CPU, as read_trylock() would
>>>>>>> otherwise return true.
>>>>>>>
>>>>>>> Fixes: 868a01021c6f ('rwlock: allow recursive read locking when already 
>>>>>>> locked in write mode')
>>>>>>
>>>>>> I'm puzzled by this as well: Prior to that and the change referenced by 
>>>>>> its
>>>>>> Fixes: tag, recursive spin locks were used. For the purposes here that's 
>>>>>> the
>>>>>> same as permitting read locking even when the write lock is already held 
>>>>>> by
>>>>>> the local CPU.
>>>>>
>>>>> I see, so the Fixes should be:
>>>>>
>>>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>>>
>>>>> Instead, which is the commit that started using get_cpu_maps() in
>>>>> send_IPI_mask().
>>>>
>>>> ... this I then wonder whether it's really only the condition in
>>>> send_IPI_mask() which needs further amending, rather than fiddling with
>>>> get_cpu_maps().
>>>
>>> That the other option, but I have impression it's more fragile to
>>> adjust the condition in send_IPI_mask() rather than fiddle with
>>> get_cpu_maps().
>>>
>>> However if that's the preference I can adjust.
>>
>> I guess we need other REST input here then. The two of us clearly disagree on
>> what use of get_cpu_maps() is meant to guarantee. And I deem fiddling with
>> common code here more risky (and more intrusive - the other change would be
>> a single-line code change afaict, plus extending the related comment).
> 
> How do you envision that other change to be done?  Adding an extra
> variable and toggling it in cpu_hotplug_{begin,done}() to signal
> whether a CPU hotplug is in progress?

I was thinking of an is-write-locked-by-me check on cpu_add_remove_lock.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.