Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 19.02.2024 15:00, Oleksii wrote:
> On Sun, 2024-02-18 at 19:00 +0000, Julien Grall wrote:
>> On 05/02/2024 15:32, Oleksii Kurochko wrote:
>>> --- /dev/null
>>> +++ b/xen/arch/riscv/include/asm/cmpxchg.h
>>> @@ -0,0 +1,237 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/* Copyright (C) 2014 Regents of the University of California */
>>> +
>>> +#ifndef _ASM_RISCV_CMPXCHG_H
>>> +#define _ASM_RISCV_CMPXCHG_H
>>> +
>>> +#include <xen/compiler.h>
>>> +#include <xen/lib.h>
>>> +
>>> +#include <asm/fence.h>
>>> +#include <asm/io.h>
>>> +#include <asm/system.h>
>>> +
>>> +#define ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1)))
>>> +
>>> +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, acquire_barrier) \
>>> +({ \
>>> +    asm volatile( \
>>> +        release_barrier \
>>> +        " amoswap" sfx " %0, %2, %1\n" \
>>> +        acquire_barrier \
>>> +        : "=r" (ret), "+A" (*ptr) \
>>> +        : "r" (new) \
>>> +        : "memory" ); \
>>> +})
>>> +
>>> +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, acquire_barrier) \
>>> +({ \
>>> +    uint32_t *ptr_32b_aligned = (uint32_t *)ALIGN_DOWN((unsigned long)ptr, 4); \
>>> +    uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - sizeof(*ptr))) * BITS_PER_BYTE; \
>>> +    uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \
>>> +    uint8_t mask_h = mask_l + mask_size - 1; \
>>> +    unsigned long mask = GENMASK(mask_h, mask_l); \
>>> +    unsigned long new_ = (unsigned long)(new) << mask_l; \
>>> +    unsigned long ret_; \
>>> +    unsigned long rc; \
>>> +    \
>>> +    asm volatile( \
>>> +        release_barrier \
>>> +        "0: lr.d %0, %2\n" \
>>
>> I was going to ask why this is lr.d rather than lr.w. But I see Jan
>> already asked. I agree with him that it should probably be a lr.w and
>> ...
>>
>>> +        " and  %1, %0, %z4\n" \
>>> +        " or   %1, %1, %z3\n" \
>>> +        " sc.d %1, %1, %2\n" \
>>
>> ... respectively sc.w. The same applies for cmpxchg.
>
> I agree that it would be better, and my initial attempt was to handle
> 4-byte or 8-byte boundary crossing during a 2-byte access:
>
>   0 1 2 3 4 5 6 7 8
>   X X X 1 1 X X X X
>
> In this case, if I align address 3 down to address 0 and then read 4
> bytes instead of 8 bytes, I will not process the byte at address 4.
> This was the reason why I started to read 8 bytes.
>
> I also acknowledge that there could be an issue in the following case:
>
>   X 4094 4095 4096
>   X    1    1    X
>
> In this situation, when I read 8 bytes, there could be an issue where
> the new page (which starts at 4096) will not be mapped. It seems
> correct in this case to check that the variable is within one page and
> read 4 bytes instead of 8.
>
> One more thing I am uncertain about is: if we change everything to
> read 4 bytes with 4-byte alignment, what should be done with the first
> case? Should we panic? (I am not sure if this is an option.)

Counter question (raised elsewhere already): What if a 4-byte access
crosses a word / cache line / page boundary? Ideally exactly the same
would happen for a 2-byte access crossing a respective boundary. (Which
you can achieve relatively easily by masking off only address bit 1,
keeping address bit 0 unaltered.)

> Should we perform the operation twice for addresses 0x0 and 0x4?

That wouldn't be atomic then.

Jan