[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Tamas K Lengyel <tamas.lengyel@xxxxxxxxx>
  • From: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
  • Date: Fri, 16 Sep 2022 17:35:12 -0400
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=emhWqIyC7PTSoMg/ElSrz0uYX6lmAr6CSIuPDXtAY+0=; b=UIVHpgfFra5d85bbZ35P10sHgxVYusGy7XriQjd9L05Fi9uuCJ+2dPxoPbDbORByYkORzYSRguqwNJN/WI8GCsql07mxUwdngEFmq7uw+wJgELBbkc6vd2Qt9DNizkLAiO5u2uJ3l+fC04+C4/YsqEyWHWwqdjq1QlMCBtnpsvdEyEDL1/gwAkDQA3D+am1CDRdnKj3OAI0Xysb0LDYq9al2AAZeFupCGtenjCF7gsb4vdHgnUt07IQYoV2Lw3Dn9cKKHZUd7H9vzLGU/vbWhxFS5SxSvdykka+z+xfon6joWGU6fmSe/ngo0LQ6q1t4ZD6iMJa0cUzvh1SmTju+Dw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=cNry4+EKkKSOKiDLSUOje3tphoMzaMCdRTo1M7NSCD5Okk0rawcqBYzmD61IJO37wJyQTcjWmD/XUpUOtVOS2kktDO+d6VvVbeQNs08E2QHBsKZJ4RNHOoBMJiUFSIn1i80UNWWG6FrNarF997UbA0P6UxM37gzpSYnwt90ixbuJxIpdWi3fZVPL2XHT/TqdF02TZXOKYIVHE/vgCX0LHyf32ZdSuzBBClKlsiu3kFbXd0P6g25OXX+ixsEMY+iwSTp7xZk8zgTmCQj7u3tkyCnPc73t/3qN94fZvmCXuy/CPjAPyLuiKZ4ItgG1XMyuYPu/KSPp+WP1+8DrhFk4dQ==
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 16 Sep 2022 21:35:31 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 9/16/22 8:52 AM, Jan Beulich wrote:
On 15.09.2022 16:01, Tamas K Lengyel wrote:
While experimenting with the vPMU subsystem an ASSERT failure was
observed in vmx_find_msr because the vcpu_runnable state was true.

The root cause of the bug appears to be the fact that the vPMU subsystem
doesn't save its state on context_switch. The vpmu_load function will attempt
to gather the PMU state if its still loaded two different ways:
     1. if the current pcpu is not where the vcpu ran before doing a remote save
     2. if the current pcpu had another vcpu active before doing a local save

However, in case the prev vcpu is being rescheduled on another pcpu its state
has already changed and vcpu_runnable is returning true, thus #2 will trip the
ASSERT. The only way to avoid this race condition is to make sure the
prev vcpu is paused while being checked and its context saved. Once the prev
vcpu is resumed and does #1 it will find its state already saved.
While I consider this explanation plausible, I'm worried:

--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
          vpmu = vcpu_vpmu(prev);
/* Someone ran here before us */
+        vcpu_pause(prev);
          vpmu_save_force(prev);
          vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+        vcpu_unpause(prev);
vpmu = vcpu_vpmu(v);
      }
We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
to actually be de-scheduled. Even with IRQs on this is already a
relatively heavy operation (also including its impact on the remote
side). Additionally the function is called from context_switch(), and
I'm unsure of the usability of vcpu_pause() on such a path. In
particular: Is there a risk of two CPUs doing this mutually to one
another? If so, is deadlocking excluded?

Hence at the very least I think the description wants extending, to
discuss the safety of the change.

Boris - any chance you could comment here? Iirc that's code you did
introduce.


Is the assertion in vmx_find_msr() really needs to be for runnable vcpu or can it be a 
check on whether vcpu is actually running (e.g. RUNSTATE_running)? I think call chain 
vpmu_load()->..->*_vpmu_save()->...->vmx_find_msr() is the only one where we 
are doing it for non-current vcpu. If we can guarantee that run state is set after 
vpmu_load() call (maybe it is already, I haven't checked) then testing the state may avoid 
the assertion.


-boris




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.