| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
 Re: [PATCH v4 01/17] perf: Protect perf_guest_cbs with RCU
 
To: Sean Christopherson <seanjc@xxxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>, Will Deacon <will@xxxxxxxxxx>, Mark Rutland <mark.rutland@xxxxxxx>, Russell King <linux@xxxxxxxxxxxxxxx>, Marc Zyngier <maz@xxxxxxxxxx>, Catalin Marinas <catalin.marinas@xxxxxxx>, Guo Ren <guoren@xxxxxxxxxx>, Nick Hu <nickhu@xxxxxxxxxxxxx>, Greentime Hu <green.hu@xxxxxxxxx>, Vincent Chen <deanbo422@xxxxxxxxx>, Paul Walmsley <paul.walmsley@xxxxxxxxxx>, Palmer Dabbelt <palmer@xxxxxxxxxxx>, Albert Ou <aou@xxxxxxxxxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, x86@xxxxxxxxxx, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>From: Paolo Bonzini <pbonzini@xxxxxxxxxx>Date: Thu, 11 Nov 2021 08:26:58 +0100Authentication-results: relay.mimecast.com;	auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pbonzini@xxxxxxxxxxCc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>, Jiri Olsa <jolsa@xxxxxxxxxx>, Namhyung Kim <namhyung@xxxxxxxxxx>, James Morse <james.morse@xxxxxxx>, Alexandru Elisei <alexandru.elisei@xxxxxxx>, Suzuki K Poulose <suzuki.poulose@xxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>, Wanpeng Li <wanpengli@xxxxxxxxxxx>, Jim Mattson <jmattson@xxxxxxxxxx>, Joerg Roedel <joro@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, linux-perf-users@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, linux-arm-kernel@xxxxxxxxxxxxxxxxxxx, kvmarm@xxxxxxxxxxxxxxxxxxxxx, linux-csky@xxxxxxxxxxxxxxx, linux-riscv@xxxxxxxxxxxxxxxxxxx, kvm@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, Artem Kashkanov <artem.kashkanov@xxxxxxxxx>, Like Xu <like.xu.linux@xxxxxxxxx>, Like Xu <like.xu@xxxxxxxxxxxxxxx>, Zhu Lingshan <lingshan.zhu@xxxxxxxxx>Delivery-date: Thu, 11 Nov 2021 07:27:35 +0000List-id: Xen developer discussion <xen-devel.lists.xenproject.org> 
 
On 11/11/21 03:07, Sean Christopherson wrote:
 
Protect perf_guest_cbs with RCU to fix multiple possible errors.  Luckily,
all paths that read perf_guest_cbs already require RCU protection, e.g. to
protect the callback chains, so only the direct perf_guest_cbs touchpoints
need to be modified.
Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure
perf_guest_cbs isn't reloaded between a !NULL check and a dereference.
Fixed via the READ_ONCE() in rcu_dereference().
Bug #2 is that on weakly-ordered architectures, updates to the callbacks
themselves are not guaranteed to be visible before the pointer is made
visible to readers.  Fixed by the smp_store_release() in
rcu_assign_pointer() when the new pointer is non-NULL.
Bug #3 is that, because the callbacks are global, it's possible for
readers to run in parallel with an unregisters, and thus a module
implementing the callbacks can be unloaded while readers are in flight,
resulting in a use-after-free.  Fixed by a synchronize_rcu() call when
unregistering callbacks.
Bug #1 escaped notice because it's extremely unlikely a compiler will
reload perf_guest_cbs in this sequence.  perf_guest_cbs does get reloaded
for future derefs, e.g. for ->is_user_mode(), but the ->is_in_guest()
guard all but guarantees the consumer will win the race, e.g. to nullify
perf_guest_cbs, KVM has to completely exit the guest and teardown down
all VMs before KVM start its module unload / unregister sequence.  This
also makes it all but impossible to encounter bug #3.
Bug #2 has not been a problem because all architectures that register
callbacks are strongly ordered and/or have a static set of callbacks.
But with help, unloading kvm_intel can trigger bug #1 e.g. wrapping
perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming
kvm_intel module load/unload leads to:
   BUG: kernel NULL pointer dereference, address: 0000000000000000
   #PF: supervisor read access in kernel mode
   #PF: error_code(0x0000) - not-present page
   PGD 0 P4D 0
   Oops: 0000 [#1] PREEMPT SMP
   CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
   RIP: 0010:perf_misc_flags+0x1c/0x70
   Call Trace:
    perf_prepare_sample+0x53/0x6b0
    perf_event_output_forward+0x67/0x160
    __perf_event_overflow+0x52/0xf0
    handle_pmi_common+0x207/0x300
    intel_pmu_handle_irq+0xcf/0x410
    perf_event_nmi_handler+0x28/0x50
    nmi_handle+0xc7/0x260
    default_do_nmi+0x6b/0x170
    exc_nmi+0x103/0x130
    asm_exc_nmi+0x76/0xbf
Fixes: 39447b386c84 ("perf: Enhance perf to allow for guest statistic collection 
from host")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
 
Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
One nit:
 
  EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs) 
  {
-       perf_guest_cbs = NULL;
+       if (WARN_ON_ONCE(rcu_access_pointer(perf_guest_cbs) != cbs))
+               return -EINVAL;
+
+       rcu_assign_pointer(perf_guest_cbs, NULL);
+       synchronize_rcu();
 
This technically could be RCU_INIT_POINTER but it's not worth a respin.
There are dozens of other occurrences, and if somebody wanted they
could use Coccinelle to fix all of them.
Paolo
 
 |