[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 2/6] xen/x86: restore (fix) xen_set_pte_init() behavior

  • To: Juergen Gross <jgross@xxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 30 Sep 2021 14:35:22 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=eWRvbp7ta3owYpcdKj7PO7T0r2HXZ7zs9WVfAD35Uvw=; b=YsImSKohjx7Tx5XUvZCvBF0KUB/FrYqV17rGkQVV+9k75XwtAkVl8oe8qcsDUAD0lvstAt1+cYkDhiTXIoYMHE0DnpWyPbXLceIXXlEi0QICl3TcgypmmsBapKE59xmxL/anh/vyg+M0BjWzHKOrgvll6/aMGFQLIyEYnl7dw5tt0y/rd8xPZQI/GwpYH6WK2E7BLRFMCOnsMvRps7GY0/EMEmflRe7xvtpwfqbUb0Vc9MeFe+lAtlwTkSwR/4uxHmbgoZvgRIZjhb3tYtq9STA4tIjUjf9s+SJCOsrclV63PQ48ZXwH/Myqx2KVt1iZhJ0XmcF0nv6BARaYO4wCbA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=gHkFibb5LYshIVpl0Jp9XnQZEGUsmW7OWVWVt0BujZRFHmCG+Bc50cgQqtzeRjcbiYLLLyPp0v7NQqVfrtBiNOqdMjnNKS6XkdwG0hx0DPY9PKuUk97H+nWKvlabCeMYG0nO6Cv3dX40QvfHKH3weYlGrh0gofHGLkrCYCtRRdQpVpZyWSlxBSEjB/JHVZyboqadYFIGEYsDJIR/N9YdBetu/2osTnEew84x+VfLMq3lcMVRoDoEK+7uwwI/2+I3z7XD5GHGfSHJTOznViGOtsnJKSWNmsn7GF/fhTZiPrN19YwZsuiqmddGoZr/i4zUXs4sWaQ8xhoRPc6WEvFh8w==
  • Authentication-results: lists.xenproject.org; dkim=none (message not signed) header.d=none;lists.xenproject.org; dmarc=none action=none header.from=suse.com;
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, lkml <linux-kernel@xxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 30 Sep 2021 12:35:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Commit f7c90c2aa400 ("x86/xen: don't write ptes directly in 32-bit PV
guests") needlessly (and heavily) penalized 64-bit guests here: The
majority of the early page table updates is to writable pages (which get
converted to r/o only after all the writes are done), in particular
those involved in building the direct map (which consists of all 4k
mappings in PV). On my test system this accounts for almost 16 million
hypercalls when each could simply have been a plain memory write.

Switch back to using native_set_pte(), except for updates of early
ioremap tables (where a suitable accessor exists to recognize them).
With 32-bit PV support gone, this doesn't need to be further
conditionalized (albeit backports thereof may need adjustment).

To avoid a fair number (almost 256k on my test system) of trap-and-
emulate cases appearing as a result, switch the hook in

Finally commit d6b186c1e2d8 ("x86/xen: avoid m2p lookup when setting
early page table entries") inserted a function ahead of
xen_set_pte_init(), separating it from its comment (which may have been
part of the reason why the performance regression wasn't anticipated /
recognized while codeing / reviewing the change mentioned further up).
Move the function up and adjust that comment to describe the new

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1194,6 +1194,13 @@ static void __init xen_pagetable_p2m_set
 static void __init xen_pagetable_init(void)
+       /*
+        * The majority of further PTE writes is to pagetables already
+        * announced as such to Xen. Hence it is more efficient to use
+        * hypercalls for these updates.
+        */
+       pv_ops.mmu.set_pte = __xen_set_pte;
@@ -1422,10 +1429,18 @@ static void xen_pgd_free(struct mm_struc
  * Many of these PTE updates are done on unpinned and writable pages
  * and doing a hypercall for these is unnecessary and expensive.  At
- * this point it is not possible to tell if a page is pinned or not,
- * so always write the PTE directly and rely on Xen trapping and
+ * this point it is rarely possible to tell if a page is pinned, so
+ * mostly write the PTE directly and rely on Xen trapping and
  * emulating any updates as necessary.
+static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
+       if (unlikely(is_early_ioremap_ptep(ptep)))
+               __xen_set_pte(ptep, pte);
+       else
+               native_set_pte(ptep, pte);
 __visible pte_t xen_make_pte_init(pteval_t pte)
        unsigned long pfn;
@@ -1447,11 +1462,6 @@ __visible pte_t xen_make_pte_init(pteval
-static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
-       __xen_set_pte(ptep, pte);
 /* Early in boot, while setting up the initial pagetable, assume
    everything is pinned. */
 static void __init xen_alloc_pte_init(struct mm_struct *mm, unsigned long pfn)



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.