In order for a to-be-introduced ERMS form of memcpy() to not regress
boot performance on certain systems when video output is active, we
first need to arrange for avoiding further dependency on firmware
setting up MTRRs in a way we can actually further modify. On many
systems, due to the continuously growing amounts of installed memory,
MTRRs get configured with at least one huge WB range, and with MMIO
ranges below 4Gb then forced to UC via overlapping MTRRs. mtrr_add(), as
it is today, can't deal with such a setup. Hence on such systems we
presently leave the frame buffer mapped UC, leading to significantly
reduced performance when using REP STOSB / REP MOVSB.
On post-PentiumII hardware (i.e. any that's capable of running 64-bit
code), an effective memory type of WC can be achieved without MTRRs, by
simply referencing the respective PAT entry from the PTEs. While this
will leave the switch to ERMS forms of memset() and memcpy() with
largely unchanged performance, the change here on its own improves
performance on affected systems quite significantly: Measuring just the
individual affected memcpy() invocations yielded a speedup by a factor
of over 250 on my initial (Skylake) test system. memset() isn't getting
improved by as much there, but still by a factor of about 20.
While adding {__,}PAGE_HYPERVISOR_WC, also add {__,}PAGE_HYPERVISOR_WT
to, at the very least, make clear what PTE flags this memory type uses.
Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
---
v2: Mark ioremap_wc() __init.
---
TBD: If the VGA range is WC in the fixed range MTRRs, reusing the low
1st Mb mapping (like ioremap() does) would be an option.
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5881,6 +5881,20 @@ void __iomem *ioremap(paddr_t pa, size_t
return (void __force __iomem *)va;
}
+void __iomem *__init ioremap_wc(paddr_t pa, size_t len)
+{
+ mfn_t mfn = _mfn(PFN_DOWN(pa));
+ unsigned int offs = pa & (PAGE_SIZE - 1);
+ unsigned int nr = PFN_UP(offs + len);
+ void *va;
+
+ WARN_ON(page_is_ram_type(mfn_x(mfn), RAM_TYPE_CONVENTIONAL));
+
+ va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_WC, VMAP_DEFAULT);
+
+ return (void __force __iomem *)(va + offs);
+}