
Re: [RFC] Avoid dom0/HVM performance penalty from MSR access tightening



I appreciate your interest; apologies for not replying right away. I've been
digging deeper in order to have a more meaningful response.

I had attempted to instrument the MSR reads, but saw only a small number of
reads being blocked by the code change. The notable ones appear to be those
listed below; the others seem fairly harmless:

0x00000034      MSR_SMI_COUNT
0x0000019a      IA32_CLOCK_MODULATION/MSR_IA32_THERM_CONTROL MSR
0x000003f8      MSR_PKG_C3_RESIDENCY
0x000003f9      MSR_PKG_C6_RESIDENCY
0x000003fa      MSR_PKG_C7_RESIDENCY
0x00000606      MSR_RAPL_POWER_UNIT
0x0000060d      MSR_PKG_C2_RESIDENCY
0x00000611      MSR_PKG_ENERGY_STATUS
0x00000619      MSR_DRAM_ENERGY_STATUS
0x00000630      MSR_PKG_C8_RESIDENCY
0x00000631      MSR_PKG_C9_RESIDENCY
0x00000632      MSR_PKG_C10_RESIDENCY
0x00000639      MSR_PP0_ENERGY_STATUS
0x00000641      MSR_PP1_ENERGY_STATUS
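
For anyone wanting to reproduce the probe from dom0, something along these
lines should work -- a hypothetical sketch using the Linux msr driver (it
needs "modprobe msr" and root); a read that Xen blocks with #GP surfaces as
pread() failing with EIO:

/* Hypothetical probe of the MSRs listed above via the Linux msr
 * driver ("modprobe msr", run as root in dom0).  A read blocked by
 * Xen with #GP shows up as pread() failing with EIO. */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static const uint32_t msrs[] = {
        0x034, 0x19a, 0x3f8, 0x3f9, 0x3fa, 0x606, 0x60d,
        0x611, 0x619, 0x630, 0x631, 0x632, 0x639, 0x641,
    };
    int fd = open("/dev/cpu/0/msr", O_RDONLY);

    if (fd < 0) {
        perror("open /dev/cpu/0/msr");
        return 1;
    }
    for (size_t i = 0; i < sizeof(msrs) / sizeof(msrs[0]); i++) {
        uint64_t val;

        if (pread(fd, &val, sizeof(val), msrs[i]) == sizeof(val))
            printf("0x%08x = 0x%016llx\n", msrs[i],
                   (unsigned long long)val);
        else
            printf("0x%08x blocked (%s)\n", msrs[i], strerror(errno));
    }
    close(fd);
    return 0;
}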

As for my test program, it is just a crude loop compiled with "gcc -O3"; it
normally takes about 10 seconds to execute:
/* Counts ~2^32 iterations: i starts at 1, wraps around, and the loop
 * exits when it reaches 0 again; volatile keeps -O3 from deleting it. */
int main()
{
    for (volatile int i = 1; i != 0; ++i) {}
    return 0;
}

The relative changes in execution time of the test program and in HVM guest
startup time (the period during which the "qemu" process is busy) agreed
completely.  I also observed the same changes for the test program under a
PVH guest.

Thus, it seemed like the CPU was somehow operating at a different frequency
than expected, rather than faults consuming the execution time.

-- (after a lot more investigation) --

Further instrumentation showed that the
IA32_CLOCK_MODULATION/MSR_IA32_THERM_CONTROL MSR initially had the value
0x10, which appears to be invalid both according to the Intel Software
Developer's Manual and according to what I think I'm seeing in the ACPI tables.
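
For reference, per the SDM this MSR's layout is: bit 4 = on-demand
clock-modulation enable, bits 3:1 = duty-cycle encoding (encoding 0 is
reserved), so 0x10 means "throttling enabled with a reserved duty cycle".
A minimal decoder, assuming that layout:

/* Minimal decoder for IA32_CLOCK_MODULATION (0x19a), assuming the
 * layout in the Intel SDM: bit 4 enables on-demand clock modulation,
 * bits 3:1 encode the duty cycle (encoding 0 is reserved; bit 0
 * extends the field on CPUs with the extended-modulation feature). */
#include <stdint.h>
#include <stdio.h>

static void decode_clock_mod(uint64_t val)
{
    unsigned int enable = (val >> 4) & 1;
    unsigned int duty = (val >> 1) & 7;

    printf("raw=0x%02llx enable=%u duty_encoding=%u%s\n",
           (unsigned long long)val, enable, duty,
           (enable && !duty) ? "  <-- reserved/invalid" : "");
}

int main(void)
{
    decode_clock_mod(0x10);   /* the BIOS-left value seen here */
    decode_clock_mod(0x00);   /* the value dom0 Linux wrote back */
    return 0;
}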

In dom0 Linux 5.2.38, this value seems to have caused
acpi_processor_get_throttling_ptc() to see an invalid result from
acpi_get_throttling_state() and consequently to execute
__acpi_processor_set_throttling(), which wrote the MSR with a value of zero
and had the side effect of disabling throttling (restoring normal
performance).  (This all happened as the CPUs were being detected.)
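
My understanding of the flow in drivers/acpi/processor_throttling.c is
roughly the following -- a simplified paraphrase with stubbed MSR accessors,
not the literal kernel code:

/* Rough paraphrase of the dom0 Linux behaviour described above, with
 * stubbed MSR accessors; not the literal kernel code. */
#include <stdint.h>
#include <stdio.h>

static uint64_t msr_0x19a = 0x10;          /* value left by the BIOS */

static uint64_t rdmsr_stub(void) { return msr_0x19a; }
static void wrmsr_stub(uint64_t v) { msr_0x19a = v; }

/* Maps the raw MSR value to a T-state index, or -1 if it matches no
 * valid state -- which is what happens for the reserved value 0x10. */
static int get_throttling_state(uint64_t raw)
{
    return raw == 0 ? 0 : -1;              /* only T0 modelled here */
}

int main(void)
{
    if (get_throttling_state(rdmsr_stub()) == -1) {
        /* Invalid state read back: force T0 by writing 0, which as a
         * side effect un-throttles the CPU.  When Xen blocks the read,
         * the invalid value is never seen, this branch is never taken,
         * and the core stays throttled. */
        wrmsr_stub(0);
    }
    printf("MSR 0x19a is now 0x%llx\n", (unsigned long long)msr_0x19a);
    return 0;
}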

When the unknown MSR reads are blocked, the call to
__acpi_processor_set_throttling() did not occur, since the MSR read no longer
returned the invalid value -- thus the CPU remained in a throttled state.

So far, this seems to explain the dom0 performance issues I saw.

The domU observation was related.  In some of my testing, dom0 was limited
(via the Xen command line) to a small number of cores so that the others could
be dedicated to other domains.  When a domU VM was launched on the cores not
used by dom0, their throttling MSR remained at the original value, resulting
in low performance, since dom0 never had a chance to rewrite it.  Thus, I saw
different domU behavior depending on the number of cores allocated to dom0.


-- summary --

In desperation, I ended up resetting the BIOS settings to defaults, and
mysteriously the issue no longer occurs.  I'm not sure what could have gone
wrong before, as the original settings were not far from the defaults.  It
seems my issues stemmed from the server's BIOS setting the throttling MSR to
an invalid value, but the exercise illuminated some unusual behaviors under
Xen...

It seems to me there are a few findings from venturing down this rabbit hole
that may be useful to the Xen developers:

1) For conditions in which MSRs are writeable from PV guests (such as dom0),
they should probably be readable as well; it looks like MSR_IA32_THERM_CONTROL
is currently one of a small number of "unreadable" but writeable MSRs.
Otherwise, seemingly valid read-(check/modify)-write operations will behave
incorrectly under Xen (see the sketch after this list).

2) As Xen controls CPU frequency and C-states, might there be benefit in
extending it to manage clock modulation / throttling as well?  (I wasn't
expecting dom0 to be able to influence this!)

3) Perhaps PV domains (such as dom0) should not be allowed to modify such MSRs
at all, since doing so can have unintended effects depending on how CPU pools
and dom0 are managed?
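
To illustrate point 1, the failing pattern is an ordinary read-modify-write,
e.g. clearing the clock-modulation enable bit while preserving the other
bits.  The rdmsr()/wrmsr() declarations below are stand-ins for the
privileged instructions that Xen traps and emulates; this is a compile-only
sketch, not something to run outside ring 0:

/* Sketch of point 1: a seemingly valid read-(check/modify)-write on
 * MSR_IA32_THERM_CONTROL.  If the read faults (or gets zero-filled)
 * while the write succeeds, the value written is derived from garbage
 * rather than from the register's actual contents. */
#include <stdint.h>

#define MSR_IA32_THERM_CONTROL 0x19a
#define CLOCK_MOD_ENABLE (1ULL << 4)

extern uint64_t rdmsr(uint32_t msr);           /* now raises #GP  */
extern void wrmsr(uint32_t msr, uint64_t v);   /* still permitted */

void disable_clock_modulation(void)
{
    uint64_t val = rdmsr(MSR_IA32_THERM_CONTROL);

    /* Only correct if the read returned the register's real value. */
    if (val & CLOCK_MOD_ENABLE)
        wrmsr(MSR_IA32_THERM_CONTROL, val & ~CLOCK_MOD_ENABLE);
}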

Regards,

-Alex



On Thu, 2022-02-10 at 18:27 +0000, Andrew Cooper wrote:
> On 10/02/2022 17:27, Alex Olson wrote:
> > I'm seeing strange performance issues under Xen on a Supermicro server with
> > a Xeon D-1541 CPU caused by an MSR-related commit.
> > 
> > Commit 322ec7c89f6640ee2a99d1040b6f786cf04872cf 'x86/pv: disallow access
> > to unknown MSRs' surprisingly introduces a severe performance penalty
> > where dom0 has about 1/8th of its normal CPU performance.  Even when
> > 'xenpm' is used to select the performance governor and operate the CPU
> > at maximum frequency, actual CPU performance is still 1/2 of normal
> > (likewise when using "cpufreq=xen,performance").
> > 
> > The patch below fixes it but I don't fully understand why.
> > 
> > Basically, when *reads* of MSR_IA32_THERM_CONTROL are blocked, dom0 and
> > guests (pinned to other CPUs) see the performance issues.
> > 
> > For benchmarking purposes, I built a small C program that runs a "for
> > loop" for 4 billion iterations, and timed its execution.  In dom0, the
> > performance issues also cause HVM guest startup time to go from 9-10
> > seconds to almost 80 seconds.
> > 
> > I assumed Xen was managing CPU frequency and thus blocking related MSR
> > access by dom0 (or any other domain). However, clearly something else
> > is happening and I don't understand why.
> > 
> > I initially attempted to copy the same logic as the write MSR case. This
> > was effective at fixing the dom0 performance issue, but still left other
> > domains running at 1/2 speed. Hence, the change below has no access control.
> > 
> > 
> > If anyone has any insight as to what is really happening, I would be all
> > ears, as I am unsure whether the change below is a proper solution.
> 
> Well that's especially entertaining...
> 
> So your patch edits pv/emul-priv-op.c#read_msr(), so is only changing
> the behaviour for PV dom0.
> 
> What exactly is your small C program doing?
> 
> 
> The change that that patch made was to turn a read which previously
> succeeded into a #GP fault.
> 
> The reads were already bogus, even if they appeared to work before.
> When dom0 is scheduled around, it no longer knows which CPU's MSR it is
> actually reading, so at best, the data being read is racy with respect to
> which CPU you're instantaneously scheduled on.
> 
> 
> At a guess, something in Linux is doing something especially dumb when
> given #GP and is falling into a tight loop of trying to read the MSR.
> Do you happen to know which of those two is the dominant factor?
> 
> ~Andrew
