[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/vpmu_intel: Fix hypervisor crash by catching wrmsr fault



>>> On 20.04.17 at 19:49, <mohit.gambhir@xxxxxxxxxx> wrote:
> This patch changes wrmsrl() calls to write to MSR_P6_EVTSEL register in the
> VPMU to wrmsr_safe(). There are known (and possibly some unknown) cases where
> setting certain bits in MSR_P6_EVTSEL reg. can cause a General Protection
> fault on some machines. Unless we catch this fault when it happens, it will
> result in a hypervisor crash.
> 
> For instance, setting Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results
> in a General Protection Fault on Broadwell machines and thus causes the
> hypervisor to crash.

I can't seem to find indication of this in the SDM. Is this removal of a
valid bit from an architectural MSR documented anywhere?

> @@ -356,7 +356,9 @@ static inline void __core2_vpmu_load(struct vcpu *v)
>      for ( i = 0; i < arch_pmc_cnt; i++ )
>      {
>          wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
> -        wrmsrl(MSR_P6_EVNTSEL(i), xen_pmu_cntr_pair[i].control);
> +        if ( wrmsr_safe(MSR_P6_EVNTSEL(i), xen_pmu_cntr_pair[i].control) ==
> +                -EFAULT)
> +            return -EFAULT;

If the function already returns an error code, please use it to avoid
a latent bug when in the future it may also return other than -EFAULT.

> @@ -369,6 +371,7 @@ static inline void __core2_vpmu_load(struct vcpu *v)
>          core2_vpmu_cxt->global_ovf_ctrl = 0;
>          wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
>      }
> +    return 0;

Blank line above the main return statement of a function please.

> @@ -461,9 +464,8 @@ static int core2_vpmu_load(struct vcpu *v, bool_t 
> from_guest)
>  
>      vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
>  
> -    __core2_vpmu_load(v);
> +    return __core2_vpmu_load(v);
>  
> -    return 0;
>  }

Please also remove the now stray blank line.

> @@ -538,7 +540,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int 
> *type, int *index)
>      /* Do the lazy load staff. */
>      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>      {
> -        __core2_vpmu_load(current);
> +        if ( __core2_vpmu_load(current) )
> +            return 0;

Nothing you need to change, but the change here point out that
the function really needs switching to returning bool.

> @@ -719,8 +722,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, 
> uint64_t msr_content,
>          }
>      }
>  
> -    if ( type != MSR_TYPE_GLOBAL )
> -        wrmsrl(msr, msr_content);
> +    if ( type != MSR_TYPE_GLOBAL)
> +    {
> +        if ( wrmsr_safe(msr, msr_content) == -EFAULT )
> +            return -EFAULT;

See the earlier similar comment.

Further a more general question: Why do you do this only for
MSR_P6_EVNTSEL*, but not consistently for all MSRs which are
being written with (partially) guest controlled values? I'd namely
expect all wrmsrl() to be switched over, unless it is clear that
we're dealing with an erratum here and we therefore can
assume that normal guarantees for architectural MSRs apply to
everything else.

And finally, in the description you talk about forwarding the
fault - I can see this being the case for core2_vpmu_do_wrmsr(),
but how does this match up with __core2_vpmu_load() and its
callers? Namely you may report an error there after already
having updated some MSRs. Leaving the overall set of MSRs in
an inconsistent state is not very likely to be a good idea.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.