
Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load


  • To: Tamas K Lengyel <tamas.lengyel@xxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 16 Sep 2022 14:52:56 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 16 Sep 2022 12:53:24 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 15.09.2022 16:01, Tamas K Lengyel wrote:
> While experimenting with the vPMU subsystem an ASSERT failure was
> observed in vmx_find_msr because the vcpu_runnable state was true.
> 
> The root cause of the bug appears to be the fact that the vPMU subsystem
> doesn't save its state on context_switch. The vpmu_load function will attempt
> to gather the PMU state if it's still loaded, in two different ways:
>     1. if the current pcpu is not where the vcpu last ran, by doing a
>        remote save
>     2. if the current pcpu had another vcpu active before, by doing a
>        local save
> 
> However, if the prev vcpu is being rescheduled on another pcpu, its state
> has already changed and vcpu_runnable returns true, so #2 will trip the
> ASSERT. The only way to avoid this race condition is to make sure the
> prev vcpu is paused while it is being checked and its context saved. Once
> the prev vcpu is resumed and does #1, it will find its state already saved.
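
For readers without the tree to hand, the two save paths described above sit
in vpmu_load() roughly as below (paraphrased from memory and heavily trimmed,
so see xen/arch/x86/cpu/vpmu.c for the real code):

    int vpmu_load(struct vcpu *v, bool_t from_guest)
    {
        struct vpmu_struct *vpmu = vcpu_vpmu(v);
        int pcpu = smp_processor_id();
        struct vcpu *prev = NULL;

        /* #1: v last ran elsewhere - have that pcpu save v's state. */
        if ( vpmu->last_pcpu != pcpu )
        {
            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
                             vpmu_save_force, (void *)v, 1);
            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
        }

        /* Prevent a forced context save from a remote CPU. */
        local_irq_disable();

        prev = per_cpu(last_vcpu, pcpu);

        /*
         * #2: another vcpu ran here last - save its state locally.  This
         * is the vpmu_save_force() which can reach the vcpu_runnable()
         * ASSERT in vmx_find_msr() when prev is concurrently being
         * scheduled on another pcpu.
         */
        if ( prev != v && prev )
        {
            vpmu = vcpu_vpmu(prev);
            vpmu_save_force(prev);
            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);

            vpmu = vcpu_vpmu(v);
        }

        local_irq_enable();

        /* ... actual loading of v's context follows ... */
    }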

While I consider this explanation plausible, I'm worried:

> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>          vpmu = vcpu_vpmu(prev);
>  
>          /* Someone ran here before us */
> +        vcpu_pause(prev);
>          vpmu_save_force(prev);
>          vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> +        vcpu_unpause(prev);
>  
>          vpmu = vcpu_vpmu(v);
>      }

We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
to actually be de-scheduled. Even with IRQs on this is already a
relatively heavy operation (also including its impact on the remote
side). Additionally the function is called from context_switch(), and
I'm unsure of the usability of vcpu_pause() on such a path. In
particular: Is there a risk of two CPUs doing this mutually to one
another? If so, is deadlocking excluded?
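
The sort of interaction I have in mind, reduced to a stand-alone toy (plain
C with pthreads, not Xen code; all names are invented for the illustration):
two "pCPUs" each sit in the middle of a context switch and pause the vcpu
which last ran locally, which is exactly the vcpu the other pCPU is currently
switching in.  Each pause can only complete once the other pCPU finishes its
switch, so neither ever does.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    static atomic_bool vcpu_scheduled[2] = { true, true };

    /* Stand-in for vcpu_pause(): wait for the vcpu to be de-scheduled. */
    static void toy_vcpu_pause(int vcpu)
    {
        while ( atomic_load(&vcpu_scheduled[vcpu]) )
            ; /* the real thing likewise waits on progress of a remote CPU */
    }

    static void *toy_context_switch(void *arg)
    {
        int cpu = (int)(long)arg;
        int prev = cpu ^ 1;          /* vcpu that last ran on this pCPU */

        printf("pCPU%d: pausing vcpu%d to save its PMU state\n", cpu, prev);
        toy_vcpu_pause(prev);        /* never returns: circular wait */

        atomic_store(&vcpu_scheduled[cpu], false);
        printf("pCPU%d: switch complete\n", cpu); /* never reached */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];

        pthread_create(&t[0], NULL, toy_context_switch, (void *)0L);
        pthread_create(&t[1], NULL, toy_context_switch, (void *)1L);

        sleep(1);
        printf("both pCPUs still blocked -> mutual wait, i.e. deadlock\n");
        return 0;
    }

(Compile with -pthread; the "switch complete" messages never appear because
both threads are stuck in toy_vcpu_pause().)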

Hence at the very least I think the description wants extending, to
discuss the safety of the change.

Boris - any chance you could comment here? IIRC that's code you
introduced.

Jan
