Xen project Mailing List

Re: [Xen-devel] [PATCH] x86/AMD: Add support for AMD's OSVW feature in guests

To: "Boris Ostrovsky" <boris.ostrovsky@xxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Wed, 18 Jan 2012 09:50:42 +0000

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 18 Jan 2012 09:51:30 +0000

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

>>> On 17.01.12 at 18:54, Boris Ostrovsky <boris.ostrovsky@xxxxxxx> wrote: > On 01/17/12 03:26, Jan Beulich wrote: >>>>> On 16.01.12 at 22:11, Boris Ostrovsky<boris.ostrovsky@xxxxxxx> wrote: >>> --- a/tools/libxc/xc_cpuid_x86.c Mon Jan 09 16:01:44 2012 +0100 >>> +++ b/tools/libxc/xc_cpuid_x86.c Mon Jan 16 22:08:09 2012 +0100 >>> @@ -108,6 +108,7 @@ static void amd_xc_cpuid_policy( >>> bitmaskof(X86_FEATURE_SSE4A) | >>> bitmaskof(X86_FEATURE_MISALIGNSSE) | >>> bitmaskof(X86_FEATURE_3DNOWPREFETCH) | >>> + bitmaskof(X86_FEATURE_OSVW) | >> >> Indentation. >> >> Also, is this really meaningful to PV guests? > > Does amd_xc_cpuid_policy() get called for PV guests? I thought this is > HVM path only. In that case I simply misplaced the comment. > xc_cpuid_pv_policy() is where clear_bit(X86_FEATURE_OSVW, regs[2]) was > removed to handle PV. > > I believe this (i.e. OSVW changes) is meaningful for PV. Taking erratum > 400 as an example -- we don't need a Linux PV guest reading an MSR > before going to idle (in amd_e400_idle()). It is bogus in the first place if a pv guest does so - after all, not doing stuff like this is the nature of being pv. >> And valid for pre-Fam10? > > No, it indeed shouldn't be set in those cases. > > I could set the bit conditionally, based on host family but the trouble > is I don't necessarily know what family the guest will be. Or is this > known at this point? I think basing this on the host family is a good first shot. I would hope others could comment on how to properly deal with a dependency like this... >>> --- a/xen/arch/x86/cpu/amd.c Mon Jan 09 16:01:44 2012 +0100 >>> +++ b/xen/arch/x86/cpu/amd.c Mon Jan 16 22:08:09 2012 +0100 >>> @@ -32,6 +32,13 @@ >>> static char opt_famrev[14]; >>> string_param("cpuid_mask_cpu", opt_famrev); >>> >>> +/* >>> + * Set osvw_len to higher number when updated Revision Guides >>> + * are published and we know what the new status bits are >>> + */ >> >> This is ugly, as it requires permanent attention. The value to start >> with should really be 64 (beyond which other code adjustments are >> going to be necessary anyway). > > I went back and forth on this. The reason I settled on 4 is because we > obviously don't know what errata the higher bits will cover and because > it is (at least theoretically) possible that a guest shouldn't see the > same status bits as the host. I think allowing the guest to know of (not) worked around errata is even desirable (aiui in the worst case it'll apply a redundant fix). >>> + vcpu->arch.amd.osvw.status = osvw_status& ~(6ULL); >>> + >>> + /* >>> + * By increasing VCPU's osvw.length to 3 we are telling the guest that >>> + * all osvw.status bits inside that length, including bit 0 (which is >>> + * reserved for erratum 298), are valid. However, if host processor's >>> + * osvw_len is 0 then osvw_status[0] carries no information. We need to >>> + * be conservative here and therefore we tell the guest that erratum > 298 >>> + * is present (because we really don't know). >>> + */ >>> + if (osvw_length == 0&& boot_cpu_data.x86 == 0x10) >> >> Why do you check the family here? If osvw_length can only ever be >> zero on Fam10, then the first check is sufficient. If osvw_length can >> be zero on other than Fam10 (no matter whether we bail early older >> CPUs), then the check is actually wrong. > > 10h is the only family affected by this erratum. What is "this erratum" here? My comment was to suggest that either of the two conditions can likely be eliminated for being redundant. >>> --- a/xen/arch/x86/traps.c Mon Jan 09 16:01:44 2012 +0100 >>> +++ b/xen/arch/x86/traps.c Mon Jan 16 22:08:09 2012 +0100 >>> @@ -2542,6 +2542,15 @@ static int emulate_privileged_op(struct >>> if ( wrmsr_safe(regs->ecx, msr_content) != 0 ) >>> goto fail; >>> break; >>> + case MSR_AMD_OSVW_ID_LENGTH: >>> + case MSR_AMD_OSVW_STATUS: >>> + if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) { >>> + if (!boot_cpu_has(X86_FEATURE_OSVW)) >>> + goto fail; >>> + else >> >> Bogus "else" after a "goto". And I wonder, provided there is some >> point in doing all this for PV guests in the first place, whether this >> shouldn't be kept getting handled by the default case. > > > If we go into default case we will emit "Domain attempted WRMSR ..." > message in the log. I was trying to avoid that since writing to these > registers is not really an error, it just gets ignored. The documentation says that both MSRs are read/write, so issuing a message here appears to be The Right Thing (tm). Kernels shouldn't be doing this in general, and for PV kernels imo it is definitely a bug. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.