
Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load


  • To: Tamas K Lengyel <tamas.k.lengyel@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 19 Sep 2022 15:21:41 +0200
  • Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Tamas K Lengyel <tamas.lengyel@xxxxxxxxx>
  • Delivery-date: Mon, 19 Sep 2022 13:21:56 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 19.09.2022 14:25, Tamas K Lengyel wrote:
> On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>>
>> On 16.09.2022 23:35, Boris Ostrovsky wrote:
>>>
>>> On 9/16/22 8:52 AM, Jan Beulich wrote:
>>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
>>>>> While experimenting with the vPMU subsystem an ASSERT failure was
>>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
>>>>>
>>>>> The root cause of the bug appears to be the fact that the vPMU subsystem
>>>>> doesn't save its state on context_switch.

For the further reply below - is this actually true? What is the
vpmu_switch_from() (resolving to vpmu_save()) doing then early
in context_switch()?
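
For reference, the path I have in mind, heavily abridged (a sketch
from memory, not verbatim from the tree; names as in the tree):

    /* xen/arch/x86/domain.c, context_switch(), early, i.e. before
     * prev can be picked up by another pCPU: */
    if ( prev != next )
        vpmu_switch_from(prev);

    /* asm/vpmu.h (approximately): */
    static inline void vpmu_switch_from(struct vcpu *prev)
    {
        if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
            vpmu_save(prev);
    }

If that save reliably runs there, prev's state should already be
flushed before it can migrate.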

>>>>> The vpmu_load function will attempt
>>>>> to gather the PMU state if it's still loaded, in two different
>>>>> ways:
>>>>>      1. if the current pcpu is not where the vcpu ran before,
>>>>>         by doing a remote save
>>>>>      2. if the current pcpu had another vcpu active before, by
>>>>>         doing a local save
>>>>>
>>>>> However, in case the prev vcpu is being rescheduled on another
>>>>> pcpu, its state has already changed and vcpu_runnable is
>>>>> returning true, thus #2 will trip the ASSERT. The only way to
>>>>> avoid this race condition is to make sure the prev vcpu is
>>>>> paused while being checked and its context saved. Once the prev
>>>>> vcpu is resumed and does #1 it will find its state already
>>>>> saved.
>>>> While I consider this explanation plausible, I'm worried:
>>>>
>>>>> --- a/xen/arch/x86/cpu/vpmu.c
>>>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>>>>>           vpmu = vcpu_vpmu(prev);
>>>>>
>>>>>           /* Someone ran here before us */
>>>>> +        vcpu_pause(prev);
>>>>>           vpmu_save_force(prev);
>>>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>>>>> +        vcpu_unpause(prev);
>>>>>
>>>>>           vpmu = vcpu_vpmu(v);
>>>>>       }
>>>> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
>>>> to actually be de-scheduled. Even with IRQs on this is already a
>>>> relatively heavy operation (also including its impact on the remote
>>>> side). Additionally the function is called from context_switch(), and
>>>> I'm unsure of the usability of vcpu_pause() on such a path. In
>>>> particular: Is there a risk of two CPUs doing this mutually to one
>>>> another? If so, is deadlocking excluded?
>>>>
>>>> Hence at the very least I think the description wants extending, to
>>>> discuss the safety of the change.
>>>>
>>>> Boris - any chance you could comment here? Iirc that's code you did
>>>> introduce.
>>>
>>>
>>> Does the assertion in vmx_find_msr() really need to be for a
>>> runnable vcpu, or can it be a check on whether the vcpu is
>>> actually running (e.g. RUNSTATE_running)?
>>
>> You cannot safely check for "running", as "runnable" may transition
>> to/from "running" behind your back.
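
To illustrate the window (hypothetical snippet, not actual code):

    /* Racy by construction: nothing holds the runstate steady
     * between the check and the use. */
    if ( v->runstate.state == RUNSTATE_running )
        /* v may be de-scheduled right here, or a merely runnable v
         * may meanwhile have been picked up and become running */
        touch_remote_pmu_context(v);   /* hypothetical helper */
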
> 
> The more I look at this the more I think the only sensible solution is
> to have the vPMU state be saved on vmexit for all vCPUs.

Do you really mean vmexit? It would suffice if state was reliably
saved during context-switch-out, wouldn't it? At that point the
vCPU can't be resumed on another pCPU, yet.
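
Something along these lines, perhaps (a sketch only, reusing the
existing vpmu helpers; the hook name is made up):

    /* Hypothetical switch-out hook: runs while prev still owns this
     * pCPU, i.e. before prev can be loaded anywhere else. */
    static void vpmu_ctxt_switch_out(struct vcpu *prev)
    {
        struct vpmu_struct *vpmu = vcpu_vpmu(prev);

        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
        {
            vpmu_save(prev);                  /* flush to save area */
            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
        }
        /* From here on prev may migrate freely; vpmu_load() would
         * never need to reach across to a remote pCPU. */
    }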

> That way all
> the figuring out of where and when a context needs saving during
> scheduling goes away. Yes, it adds a bit of overhead for cases where
> the vCPU will resume on the same pCPU and the context save could
> have been skipped,

If you really mean vmexit, then I'm inclined to say that's more
than just "a bit of overhead". I'd agree if you really meant
context-switch-out, but as said further up it looks to me as if
that was already occurring. Apparently I'm overlooking something
crucial ...

Jan

> but it makes it possible for the vCPU to be resumed on any pCPU
> without evidently fragile checks that may potentially lead to
> deadlocks (TBH I don't know whether that's a real concern at the
> moment, because the current setup is very hard to reason about). We
> can still keep track of whether the context needs reloading from the
> saved context, and skip that if we know the state is still active.
> Any objection to that change in light of these issues?
> 
> Tamas
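
A minimal sketch of the skip-reload idea from the quoted paragraph
above, assuming state were then always saved on switch-out (field
and helper names follow the existing vpmu code; the logic is
illustrative only):

    /* In vpmu_load(): */
    if ( vpmu->last_pcpu == smp_processor_id() &&
         this_cpu(last_vcpu) == v )
    {
        /* This pCPU's counters still hold our values: skip the
         * MSR reload. */
        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
        return 0;
    }
    /* Otherwise restore from the save area; with unconditional
     * saving on switch-out no remote, still-loaded state can
     * exist, so the racy remote-save path disappears. */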




 

