Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 19.02.2024 16:20, Oleksii wrote:
> On Mon, 2024-02-19 at 15:12 +0100, Jan Beulich wrote:
>> On 19.02.2024 15:00, Oleksii wrote:
>>> On Sun, 2024-02-18 at 19:00 +0000, Julien Grall wrote:
>>>> On 05/02/2024 15:32, Oleksii Kurochko wrote:
>>>>> --- /dev/null
>>>>> +++ b/xen/arch/riscv/include/asm/cmpxchg.h
>>>>> @@ -0,0 +1,237 @@
>>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>>> +/* Copyright (C) 2014 Regents of the University of California */
>>>>> +
>>>>> +#ifndef _ASM_RISCV_CMPXCHG_H
>>>>> +#define _ASM_RISCV_CMPXCHG_H
>>>>> +
>>>>> +#include <xen/compiler.h>
>>>>> +#include <xen/lib.h>
>>>>> +
>>>>> +#include <asm/fence.h>
>>>>> +#include <asm/io.h>
>>>>> +#include <asm/system.h>
>>>>> +
>>>>> +#define ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1)))
>>>>> +
>>>>> +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, acquire_barrier) \
>>>>> +({ \
>>>>> +    asm volatile( \
>>>>> +        release_barrier \
>>>>> +        " amoswap" sfx " %0, %2, %1\n" \
>>>>> +        acquire_barrier \
>>>>> +        : "=r" (ret), "+A" (*ptr) \
>>>>> +        : "r" (new) \
>>>>> +        : "memory" ); \
>>>>> +})
>>>>> +
>>>>> +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, acquire_barrier) \
>>>>> +({ \
>>>>> +    uint32_t *ptr_32b_aligned = (uint32_t *)ALIGN_DOWN((unsigned long)ptr, 4); \
>>>>> +    uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - sizeof(*ptr))) * BITS_PER_BYTE; \
>>>>> +    uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \
>>>>> +    uint8_t mask_h = mask_l + mask_size - 1; \
>>>>> +    unsigned long mask = GENMASK(mask_h, mask_l); \
>>>>> +    unsigned long new_ = (unsigned long)(new) << mask_l; \
>>>>> +    unsigned long ret_; \
>>>>> +    unsigned long rc; \
>>>>> +    \
>>>>> +    asm volatile( \
>>>>> +        release_barrier \
>>>>> +        "0: lr.d %0, %2\n" \
>>>>
>>>> I was going to ask why this is lr.d rather than lr.w. But I see Jan
>>>> already asked. I agree with him that it should probably be a lr.w and
>>>> ...
>>>>
>>>>> +        " and %1, %0, %z4\n" \
>>>>> +        " or %1, %1, %z3\n" \
>>>>> +        " sc.d %1, %1, %2\n" \
>>>>
>>>> ... respectively sc.w. The same applies for cmpxchg.
>>>
>>> I agree that it would be better, and my initial attempt was to handle
>>> 4-byte or 8-byte boundary crossing during 2-byte access:
>>>
>>> 0 1 2 3 4 5 6 7 8
>>> X X X 1 1 X X X X
>>>
>>> In this case, if I align address 3 to address 0 and then read 4 bytes
>>> instead of 8 bytes, I will not process the byte at address 4. This was
>>> the reason why I started to read 8 bytes.
>>>
>>> I also acknowledge that there could be an issue in the following case:
>>>
>>> X 4094 4095 4096
>>> X 1    1    X
>>>
>>> In this situation, when I read 8 bytes, there could be an issue where
>>> the new page (which starts at 4096) will not be mapped. It seems
>>> correct in this case to check that the variable is within one page and
>>> read 4 bytes instead of 8.
>>>
>>> One more thing I am uncertain about: if we change everything to read
>>> 4 bytes with 4-byte alignment, what should be done with the first case?
>>> Should we panic? (I am not sure if this is an option.)
>>
>> Counter question (raised elsewhere already): What if a 4-byte access
>> crosses a word / cache line / page boundary? Ideally exactly the same
>> would happen for a 2-byte access crossing a respective boundary.
>> (Which you can achieve relatively easily by masking off only address
>> bit 1, keeping address bit 0 unaltered.)
> But if we align down on a 4-byte boundary and then access 4 bytes, we
> can't cross a boundary. I agree that the algorithm is not correct, as
> it can ignore some values in certain situations. For example:
>
> 0 1 2 3 4 5 6 7 8
> X X X 1 1 X X X X
>
> In this case, the value at address 4 won't be updated.
>
> I agree that introducing a new macro to check whether a variable crosses
> a boundary is necessary, or, as an option, we can check that addr is
> 2-byte aligned:
>
> #define CHECK_BOUNDARY_CROSSING(start, end, boundary_size) \
>     ASSERT((start / boundary_size) != (end / boundary_size))
>
> Then, it is necessary to check:
>
> CHECK_BOUNDARY_CROSSING(start, end, 4)
> CHECK_BOUNDARY_CROSSING(start, end, PAGE_SIZE)
>
> But why do we need to check the cache line boundary? In the case of the
> cache, the question will only be about performance, but it should still
> work, shouldn't it?

You don't need to check for any of these boundaries. You can simply
leverage what the hardware does for misaligned accesses. See the various
other replies I've sent - I thought things should have become pretty much
crystal clear by now: For 1-byte accesses you access the containing word,
by clearing the low two bits. For 2-byte accesses you also access the
containing word, by clearing only bit 1 (which then naturally leaves no
bit that needs clearing for the projected [but not necessary] case of
handling a 4-byte access). If the resulting 4-byte access is then still
misaligned, it'll fault just as a non-emulated 4-byte access would. And
you don't need to care about any of the boundaries, not at words, not at
cache lines, and not at pages.

Jan