Re: [PATCH] x86: prefer RDTSCP in rdtsc_ordered()
On 30.09.2024 18:40, Andrew Cooper wrote:
> On 30/09/2024 4:08 pm, Jan Beulich wrote:
>> If available, its use is supposed to be cheaper than LFENCE+RDTSC, and
>> is virtually guaranteed to be cheaper than MFENCE+RDTSC.
>>
>> Unlike in rdtsc() use 64-bit local variables, eliminating the need for
>
> I'd drop this reference to rdtsc() seeing as you adjust it in a parallel
> patch.
Already done, now that the other commit has gone in. When I wrote this,
I hadn't yet decided whether to make that other adjustment as well.
>> --- a/xen/arch/x86/include/asm/msr.h
>> +++ b/xen/arch/x86/include/asm/msr.h
>> @@ -108,18 +108,30 @@ static inline uint64_t rdtsc(void)
>>
>> static inline uint64_t rdtsc_ordered(void)
>> {
>> - /*
>> - * The RDTSC instruction is not ordered relative to memory access.
>> - * The Intel SDM and the AMD APM are both vague on this point, but
>> - * empirically an RDTSC instruction can be speculatively executed
>> - * before prior loads. An RDTSC immediately after an appropriate
>> - * barrier appears to be ordered as a normal load, that is, it
>> - * provides the same ordering guarantees as reading from a global
>> - * memory location that some other imaginary CPU is updating
>> - * continuously with a time stamp.
>> - */
>> - alternative("lfence", "mfence", X86_FEATURE_MFENCE_RDTSC);
>> - return rdtsc();
>> + uint64_t low, high, aux;
>> +
>> + /*
>> + * The RDTSC instruction is not ordered relative to memory access.
>> + * The Intel SDM and the AMD APM are both vague on this point, but
>> + * empirically an RDTSC instruction can be speculatively executed
>> + * before prior loads.
>
> This part of the comment is stale now. For RDTSC, AMD state:
>
> "This instruction is not serializing. Therefore, there is no guarantee
> that all instructions have completed at the time the time-stamp counter
> is read."
>
> and for RDTSCP:
>
> "Unlike the RDTSC instruction, RDTSCP forces all older instructions to
> retire before reading the time-stamp counter."
>
> i.e. it's dispatch serialising, given our new post-Spectre terminology.
I don't read that as truly "dispatch serializing"; both Intel and AMD
leave open whether subsequent insns would also be affected, or whether
those could pass the RDTSCP. Either form is fine for our purposes here
aiui.
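
To illustrate the distinction (a user-space sketch only, not Xen code): the
helper names below are made up, the intrinsics are the GCC/Clang ones from
<x86intrin.h>, and the CPU is assumed to support RDTSCP. The first form
relies only on RDTSCP waiting for older instructions; the second adds the
trailing LFENCE that the SDM text quoted below suggests when younger
instructions must not start before the counter is read.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp(), _mm_lfence(); GCC/Clang, x86 only */

/*
 * Hypothetical helper: RDTSCP waits for older instructions, but younger
 * instructions may still begin executing before the counter is read.
 */
static inline uint64_t tsc_after_prior(void)
{
    unsigned int aux;            /* receives IA32_TSC_AUX; unused here */

    return __rdtscp(&aux);
}

/*
 * Hypothetical helper: as above, plus an LFENCE afterwards so that younger
 * instructions do not begin until the RDTSCP has completed.
 */
static inline uint64_t tsc_fenced_both_sides(void)
{
    unsigned int aux;
    uint64_t tsc = __rdtscp(&aux);

    _mm_lfence();
    return tsc;
}
```

Either helper would do for a plain "timestamp after prior work" use; the
second only matters when later code must not be sampled early.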
> Intel OTOH have much more extensive information. For RDTSC:
>
> The RDTSC instruction is not a serializing instruction. It does not
> necessarily wait until all previous instructions have been executed
> before reading the counter. Similarly, subsequent instructions may begin
> execution before the read operation is performed. The following items
> may guide software seeking to order executions of RDTSC:
>
> •If software requires RDTSC to be executed only after all previous
> instructions have executed and all previous loads are globally visible,
> it can execute LFENCE immediately before RDTSC.
>
> •If software requires RDTSC to be executed only after all previous
> instructions have executed and all previous loads and stores are
> globally visible, it can execute the sequence MFENCE;LFENCE immediately
> before RDTSC.
>
> •If software requires RDTSC to be executed prior to execution of any
> subsequent instruction (including any memory accesses), it can execute
> the sequence LFENCE immediately after RDTSC.
>
> Similarly, for RDTSCP:
>
> The RDTSCP instruction is not a serializing instruction, but it does
> wait until all previous instructions have executed and all previous
> loads are globally visible. But it does not wait for previous stores to
> be globally visible, and subsequent instructions may begin execution
> before the read operation is performed. The following items may guide
> software seeking to order executions of RDTSCP:
>
> •If software requires RDTSCP to be executed only after all previous
> stores are globally visible, it can execute MFENCE immediately before
> RDTSCP.
>
> •If software requires RDTSCP to be executed prior to execution of any
> subsequent instruction (including any memory accesses), it can execute
> LFENCE immediately after RDTSCP.
>
>
>
> I'd delete most of the paragraph, and just state the recommendation to
> use LFENCE.
I was in fact wondering whether to. I'll send a v2 with updated (and
shortened) commentary. I think I will want to keep mentioning MFENCE
there though, ...
> In truth, X86_FEATURE_MFENCE_RDTSC is useless now that we unilaterally
> activate LFENCE_DISPATCH on CPUs where it's optional. Linux went as far
> as removing the case entirely, because if you're running under a
> hypervisor which hasn't set LFENCE_DISPATCH, then the misbehaviour of
> lfence;rdtsc is the least of your problems.
... despite this (orthogonal) observation. We can independently decide
whether to drop MFENCE_RDTSC.
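
For readers following along, the selection being discussed can be sketched
in user space roughly as follows. This is an assumption-laden illustration,
not the patch: Xen patches the instruction sequence via alternatives rather
than branching at run time, the function names are hypothetical, and the
intrinsics and __get_cpuid() are the GCC/Clang ones. The MFENCE variant is
left out on purpose.

```c
#include <cpuid.h>       /* __get_cpuid(); GCC/Clang */
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(), __rdtscp(), _mm_lfence() */

/* RDTSCP availability is advertised in CPUID.80000001H:EDX[27]. */
static int have_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;                /* extended leaf not available */

    return (edx >> 27) & 1;
}

/* Hypothetical stand-in for rdtsc_ordered(): prefer RDTSCP if present. */
static uint64_t tsc_ordered(void)
{
    if (have_rdtscp()) {
        unsigned int aux;        /* IA32_TSC_AUX value, ignored */

        return __rdtscp(&aux);
    }

    /* Fallback: LFENCE immediately before RDTSC. */
    _mm_lfence();
    return __rdtsc();
}
```

Real code would cache the CPUID result (or patch the call site) instead of
querying it on every read.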
Jan