Re: [Xen-devel] [PATCH v2 08/12] x86/vmce: enable injecting LMCE to guest on Intel host
On 03/20/17 10:25 -0600, Jan Beulich wrote:
> >>> On 17.03.17 at 07:46, <haozhong.zhang@xxxxxxxxx> wrote:
> > @@ -88,18 +89,31 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
> > goto vmce_failed;
> > }
> >
> > - if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
> > - global->mc_vcpuid == XEN_MC_VCPUID_INVALID)
> > + mc_vcpuid = global->mc_vcpuid;
> > + if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
> > + (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> > + (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
> > + !(d->vcpu[mc_vcpuid]->arch.vmce.lmce_enabled) ||
> > + /*
> > + * The following check serves for MCE injection
> > + * test, i.e. xen-mceinj. xen-mceinj may specify
> > + * the target domain (i.e. bank->mc_domid) and
> > + * target CPU, but it's hard for xen-mceinj to
> > + * ensure, when Xen prepares the actual
> > + * injection in this function, vCPU currently
> > + * running on the target CPU belongs to the
> > + * target domain. If such inconsistency does
> > + * happen, fallback to broadcast.
> > + */
> > + global->mc_domid != bank->mc_domid)))
>
> Thinking about this another time, I don't think we want hackery
> like this for a test utility. Instead I think the test utility wants to
> pin the vCPU on the pCPU it wants to deliver the LMCE on.
>
I agree we should not introduce hackery only for test purposes.
However, on second thought, I think we still need this check, but
it should be lifted to the outermost level, i.e.
    if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
        global->mc_domid != bank->mc_domid ||    /* <== here */
        (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
         (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
          !(d->vcpu[mc_vcpuid]->arch.vmce.lmce_enabled))))
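To make the reordered condition easier to check, here is a minimal, self-contained sketch of it as a predicate. Everything except the condition's shape is an assumption: the structs are simplified, hypothetical stand-ins (Xen actually reaches the flag via d->vcpu[i]->arch.vmce.lmce_enabled), must_broadcast() and the enum are illustrative names, and XEN_MC_VCPUID_INVALID uses an assumed value rather than the real Xen header.

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins, not the real Xen headers. */
#define XEN_MC_VCPUID_INVALID 0xffffu      /* assumed value */
#define MCG_STATUS_LMCE       (1ULL << 3)  /* LMCE_S, bit 3 of IA32_MCG_STATUS */

enum vendor { VENDOR_INTEL, VENDOR_AMD };  /* hypothetical */

/* Simplified, hypothetical layouts of the fields the condition reads. */
struct vcpu      { bool lmce_enabled; };
struct domain    { struct vcpu *vcpu; };
struct mc_global { uint16_t mc_domid; unsigned int mc_vcpuid; uint64_t mc_gstatus; };
struct mc_bank   { uint16_t mc_domid; };

/*
 * Returns true when the vMCE must be broadcast to all vCPUs of d and
 * false when it may be injected as an LMCE into vCPU mc_vcpuid only.
 */
static bool must_broadcast(enum vendor vendor, const struct mc_global *global,
                           const struct mc_bank *bank, const struct domain *d)
{
    unsigned int mc_vcpuid = global->mc_vcpuid;

    return mc_vcpuid == XEN_MC_VCPUID_INVALID ||
           /* Lifted check: the recorded vCPU may belong to another domain. */
           global->mc_domid != bank->mc_domid ||
           (vendor == VENDOR_INTEL &&
            (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
             !d->vcpu[mc_vcpuid].lmce_enabled));
}
```

For example, when the recorded domain ID differs from bank->mc_domid, must_broadcast() returns true before d->vcpu[] is ever indexed, because || evaluates left to right and stops at the first true operand.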
MC# might not be raised at the exact moment when, e.g., the bad
memory cell is accessed, so the current domain ID and vCPU ID recorded
in global->mc_{domid, vcpuid} by mca_init_global() may be imprecise
(e.g. the domain that accessed the bad memory has been scheduled out,
and MC# arrives while another domain is running). If such imprecision
happens when handling Intel LMCE or AMD MCE, mc_memerr_dhandler()
cannot figure out the exact vCPU that is affected
(mc_memerr_dhandler() is not called by the current AMD MCE handling,
but it is intended to be common code).
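Worse, if the imprecise global->mc_vcpuid (whose value is held in
variable mc_vcpuid) is larger than the maximum vCPU ID of the affected
domain (indicated by variable 'd'), the check
    !(d->vcpu[mc_vcpuid]->arch.vmce.lmce_enabled)
is definitely wrong: it indexes d->vcpu[] beyond its end. With the
domain ID check lifted in front of it, the || short-circuits and this
check is not evaluated when the recorded domain differs.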
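The effect of the ordering can be sketched with a small model. This is a hypothetical illustration, not Xen code: the domain holds a plain array of max_vcpus flags, lmce_enabled() is a checked stand-in that counts out-of-range lookups instead of reading past the array (which in the real code would be undefined behaviour), and only the two relevant terms of each condition are kept (the INVALID, vendor and MCG_STATUS_LMCE terms are omitted).

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one lmce_enabled flag per vCPU, max_vcpus entries. */
struct domain {
    unsigned int max_vcpus;
    const bool *lmce_enabled;
};

/* Counts lookups with a vCPU ID past the end of the flag array. */
static unsigned int oob_reads;

/*
 * Checked stand-in for d->vcpu[vcpuid]->arch.vmce.lmce_enabled: it records
 * and refuses an out-of-range index, whereas the real expression would
 * read past the end of d->vcpu[].
 */
static bool lmce_enabled(const struct domain *d, unsigned int vcpuid)
{
    if (vcpuid >= d->max_vcpus) {
        oob_reads++;
        return false;
    }
    return d->lmce_enabled[vcpuid];
}

/* v2 patch order: the vCPU is looked up before the domain IDs are compared. */
static bool broadcast_v2_order(const struct domain *d, uint16_t mc_domid,
                               unsigned int mc_vcpuid, uint16_t bank_domid)
{
    return !lmce_enabled(d, mc_vcpuid) || mc_domid != bank_domid;
}

/* Proposed order: a domain ID mismatch short-circuits the lookup. */
static bool broadcast_lifted_order(const struct domain *d, uint16_t mc_domid,
                                   unsigned int mc_vcpuid, uint16_t bank_domid)
{
    return mc_domid != bank_domid || !lmce_enabled(d, mc_vcpuid);
}
```

With a two-vCPU domain 1 and an MC# recorded as vCPU 7 of domain 3, both orders end up broadcasting, but only the v2 order performs the out-of-range lookup first.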
Haozhong
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel