[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 8 Mar 2021 13:11:24 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ez0HUnktigCDQNibsVl6hn7fBtBNUbK8UenV0QrkbLU=; b=Bh6b3U7AtAjWCoRyNcxpUcopYSnGFdDE0hW10eqLrBh8Yg7HtRBJAyBESk80D8EO95bbqwONQv8gJ+5cwB9RVc5k8X2c3KJDs//V8iw67JnDsiq37aYw9BliJu5q1TmhN6aPD4669cmc9P84zaCs2/DvYAdu5pjhWSN2Woo3jnWMkpk3MfJVtX95fJNyeGXh4VT5GyRbgRrPSXT+NIBzE/eyRwUBozUyrc3l16NDYj2P3OY/O0fwoGMTS/6GyYNYzJwT1S/7kZWAe+XezPISEaojxwqdSiKfXEoW/4JKlDdCSH+cZLF+dzZmX3E2aYVeajcD7dR9mAUk7WkflAwWRA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=lK5ZANy62KuvDIHOVQ8azEL051IBIekthbu4pmuGSmYe1x/Rn6nm5ckxpkLz+Bt8gKp5lV73Y1WBi9t+A7/nHDL5inXUzzjUAYDWOe7oYr8mPmgGjSOEY+KDnwbxinM5/wM5sKjBNzvmNZOvt8AXctOzNr8HLYq2atERjaSN7fenOITT34Z18kf0uM3PXyNZ8XfCZNPZg13BIyw02aym4ZJwMumlILm2HX8JVYRXt/m9sRPBFux+U2TIeER51hiuSfRu7Itp225/Lqrm/3s8Otl56SvmjlarkZw+DWKke77gjd8urRson+/kdh9N2/nB0R+925BkFY6AidGujXOnCA==
  • Authentication-results: esa6.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>
  • Delivery-date: Mon, 08 Mar 2021 12:12:18 +0000
  • Ironport-sdr: YGsrTk6/ewiDiJ9DFKPlh3Sg1xST+M/ZGGXi8yu5SUTBaRGvHZk4yA/vxWifWN7t49IitxP42n mt5+JcXyCiOxTKu1lChIlx+pG/UBKtyXQaJH/RPnVvqR8o8qilCVTCAO9hsg5QPgceX1Jpz9ob KqcORo0dWngMUmi4RJRkUC7VnHul/y67ozpRgK5nvGefO7XUKXs51bY5xhCPDhmBADgPy/OO6K RcqwBDnN5IPUp8l3ZVs+cyKSaaSkNLg/YNNuTyY/d8P4RpeIbmRJemY/xr8pwkhpNJ5tHMooAi XeY=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
> On 08.03.2021 09:56, Roger Pau Monné wrote:
> > On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> >> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
> >> handler early enough to cover for example the rdmsrl_safe() of
> >> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
> >> of MSR_K7_HWCR later in the same function). The respective change
> >> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
> >> backported to 4.14, but no further - presumably since it wasn't really
> >> easy because of other dependencies.
> >>
> >> Therefore, to prevent our change in the handling of guest MSR accesses
> >> to render PV Linux 4.13 and older unusable on at least AMD systems, make
> >> the raising of #GP on this paths conditional upon the guest having
> >> installed a handler, provided of course the MSR can be read in the first
> >> place (we would have raised #GP in that case even before). Producing
> >> zero for reads isn't necessarily correct and may trip code trying to
> >> detect presence of MSRs early, but since such detection logic won't work
> >> without a #GP handler anyway, this ought to be a fair workaround.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> >> ---
> >> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
> >>     messages (in debug builds). Don't alter WRMSR behavior.
> >> ---
> >> I'm not convinced we can get away without also making the WRMSR path
> >> somewhat more permissive again, e.g. tolerating attempts to set bits
> >> which are already set. But of course this would require keeping in sync
> >> for which MSRs we "fake" reads, as then a kernel attempt to set a bit
> >> may also appear as an attempt to clear others (because of the zero value
> >> that we gave it for the read).
> > 
> > The above approach seems dangerous, as it could allow a guest to
> > figure out the value of the underlying MSR by probing whether values
> > trigger a #GP?
> 
> Perhaps, yes. But what do you do? There's potentially a huge value
> range to probe ...
> 
> > I think we want to do something similar to what we do on HVM in 4.14
> > and previous versions: ignore writes as long as the rdmsr to the
> > target MSR succeeds, regardless of the value.
> 
> Which, as said elsewhere, has its own downsides - writable MSRs don't
> need to also be readable. See e.g. AMD's proposed PARTIAL_{FS,GS}_LOAD
> MSRs.

Yes, but it's IMO the lesser of two evils, I think we should avoid any
kind of behavior that depends on the underlying MSR value, just to be
on the safe side.

> >> --- a/xen/arch/x86/pv/emul-priv-op.c
> >> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
> >>      struct vcpu *curr = current;
> >>      const struct domain *currd = curr->domain;
> >>      const struct cpuid_policy *cp = currd->arch.cpuid;
> >> -    bool vpmu_msr = false;
> >> +    bool vpmu_msr = false, warn = false;
> >>      int ret;
> >>  
> >>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> >> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
> >>          if ( ret == X86EMUL_EXCEPTION )
> >>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> >>  
> >> -        return ret;
> >> +        goto done;
> >>      }
> >>  
> >>      switch ( reg )
> >> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
> >>          }
> >>          /* fall through */
> >>      default:
> >> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> >> +        warn = true;
> >>          break;
> >>  
> >>      normal:
> >> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
> >>          return X86EMUL_OKAY;
> >>      }
> >>  
> >> -    return X86EMUL_UNHANDLEABLE;
> >> + done:
> > 
> > Won't this handling be better placed in the 'default' switch case
> > above?
> 
> No - see the "goto done" added near the top of the function.

Yes, I'm not sure of that. If guest_rdmsr returns anything different
than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
and hence we shouldn't check whether the #GP handler is set or not.

This is not the behavior of older Xen versions, so I'm unsure whether
we should introduce a policy that's even less strict than the previous
one in regard to whether a #GP is injected or not.

I know injecting a #GP when the handler is not set is not helpful for
the guest, but we should limit the workaround to kind of restoring the
previous behavior for making buggy guests happy, not expanding it
anymore.

> >> +    if ( ret != X86EMUL_OKAY && 
> >> !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
> >> +         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )
> > 
> > We didn't used to care about explicitly blocking the reserved MSR
> > range, do we really wan to do it now?
> > 
> > I'm not sure I see an issue with that, but given that we are trying to
> > bring back something similar to the previous behavior, I would avoid
> > adding new conditions.
> 
> What I'm particularly trying to avoid here is to allow
> information from an underlying hypervisor to "shine through",
> even if it's only information as to whether a certain MSR is
> readable. It should be solely our own selection which MSRs in
> this range a guest is able to (appear to) read.
> 
> Plus of course by avoiding the rdmsr_safe() in this case we
> also avoid the unnecessary (debug only) log message associated
> with such attempted reads.

OK, I think that's fine.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.