[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 0/3] Lockless SMP function call and TLB flushing


  • To: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 2 Apr 2026 13:57:00 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 02 Apr 2026 11:57:23 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02/04/2026 12:57 pm, Ross Lagerwall wrote:
> On 4/2/26 9:49 AM, Jan Beulich wrote:
>> On 02.04.2026 10:40, Ross Lagerwall wrote:
>>> On 4/2/26 7:09 AM, Jan Beulich wrote:
>>>> On 01.04.2026 18:35, Ross Lagerwall wrote:
>>>>> We have observed that the TLB flush lock can be a point of
>>>>> contention for certain workloads, e.g. migrating 10 VMs off a
>>>>> host during a host evacuation.
>>>>>
>>>>> Performance numbers:
>>>>>
>>>>> I wrote a synthetic benchmark to measure the performance. The
>>>>> benchmark has one or more CPUs in Xen calling on_selected_cpus()
>>>>> with between 1 and 64 CPUs in the selected mask. The executed
>>>>> function simply delays for 500 microseconds.
>>>>>
>>>>> The table below shows the % change in execution time of
>>>>> on_selected_cpus():
>>>>>
>>>>>                    1 thread   2 threads   4 threads
>>>>> 1 CPU in mask        0.02      -35.23      -51.18
>>>>> 2 CPUs in mask       0.01      -47.20      -69.27
>>>>> 4 CPUs in mask      -0.02      -42.40      -66.55
>>>>> 8 CPUs in mask      -0.03      -47.82      -68.39
>>>>> 16 CPUs in mask      0.12      -41.95      -58.26
>>>>> 32 CPUs in mask      0.02      -25.43      -39.35
>>>>> 64 CPUs in mask      0.00      -24.70      -37.83
>>>>>
>>>>> With 1 thread (i.e. no contention), there is no regression in
>>>>> execution time. With multiple threads, as expected there is a
>>>>> significant improvement in execution time.
>>>>>
>>>>> As a more practical benchmark to simulate host evacuation, I
>>>>> measured the memory dirtying rate across 10 VMs after enabling
>>>>> log dirty (on an AMD system, so without PML). The rate increased
>>>>> by 16% with this patch series, even after the recent deferred TLB
>>>>> flush changes.
>>>>
>>>> Is this a positive thing though? In the context of some related
>>>> work something similar was mentioned iirc, accompanied by stating
>>>> that this is actually problematic. A guest in log-dirty mode
>>>> generally wants to be making progress, but also wants to be
>>>> throttled enough to limit re-dirtying, such that subsequent
>>>> iterations (in particular the final one) of page contents
>>>> migration won't have to process overly many pages a 2nd time.
>>>
>>> In the context of a real migration, both the process copying the
>>> pages out of the guest and the guest itself will be hitting the TLB
>>> flush lock, so reducing that bottleneck may increase throughput on
>>> both sides. Whether or not the overall migration time increases or
>>> decreases depends on many factors (number of migrations in
>>> parallel, the rate the guest is dirtying memory, the line speed of
>>> the NIC, whether PML is used, ...), which is why I measured a more
>>> controlled scenario to demonstrate the change.
>>>
>>> IMO throttling of a guest during a migration should be something
>>> intentional and controlled by userspace policy rather than a side
>>> effect of some internal global locks.
>>
>> I definitely agree here, but side effects going away may make it
>> necessary to add such explicit throttling.
>>
>
> Explicit throttling is much more important for the already existing
> case of Intel systems with PML. With log dirty enabled, a VM on an Intel
> system can dirty memory an order of magnitude faster than an AMD system
> without PML.
>
> As an aside, for the same test an Intel machine without PML is still a
> lot faster than AMD so there is probably something to improve in this
> area for AMD machines. 

AMD have PML on the way. 
https://docs.amd.com/v/u/en-US/69208_1.00_AMD64_PML_PUB

There is a mis-step in how support for Intel's PML was implemented,
meaning that draining a vCPU's PML buffer is extraordinarily expensive
even when there's no action to take.  (Specifically, the remote VMCS
acquire.)

A better option is this:  When logdirty is active, any VMExit will drain
the PML buffer into the logdirty bitmap before processing the main exit
reason.  This way, you drain all the PML buffers by just IPI-ing the
domain dirty mask.
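As a rough illustration of the idea (every structure and function name
here is hypothetical, not Xen's actual API), folding the per-vCPU PML
buffer into the dirty bitmap at VMExit might look something like:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PML_BUFFER_SIZE 512   /* Intel's PML buffer holds 512 GPA entries */
#define PAGE_SHIFT      12

/* Hypothetical stand-ins for the real per-vCPU and per-domain state. */
struct vcpu_pml {
    uint64_t buffer[PML_BUFFER_SIZE]; /* guest-physical addresses of dirtied pages */
    unsigned int index;               /* number of valid entries */
};

struct domain_logdirty {
    unsigned long bitmap[1024];       /* 1 bit per guest page frame */
};

static void set_dirty(struct domain_logdirty *ld, uint64_t gfn)
{
    ld->bitmap[gfn / (8 * sizeof(unsigned long))] |=
        1UL << (gfn % (8 * sizeof(unsigned long)));
}

/*
 * Called early on every VMExit while log-dirty is active: fold the
 * vCPU's own PML entries into the domain's dirty bitmap, then reset
 * the buffer.  Because each vCPU drains its own buffer locally, no
 * remote VMCS acquire is needed; IPI-ing the domain's dirty mask is
 * enough to flush every buffer.
 */
static void pml_drain(struct vcpu_pml *pml, struct domain_logdirty *ld)
{
    for ( unsigned int i = 0; i < pml->index; i++ )
        set_dirty(ld, pml->buffer[i] >> PAGE_SHIFT);
    pml->index = 0;
}
```

This is only a sketch of the control flow being proposed, not a claim
about how the real VMCS fields or the logdirty infrastructure are laid
out.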

~Andrew
