Re: [Xen-devel] [PATCH v2 08/12] x86/vmce: enable injecting LMCE to guest on Intel host
On 03/20/17 10:25 -0600, Jan Beulich wrote:
> >>> On 17.03.17 at 07:46, <haozhong.zhang@xxxxxxxxx> wrote:
> > @@ -88,18 +89,31 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
> >                      goto vmce_failed;
> >                  }
> >
> > -                if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
> > -                    global->mc_vcpuid == XEN_MC_VCPUID_INVALID)
> > +                mc_vcpuid = global->mc_vcpuid;
> > +                if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
> > +                    (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> > +                     (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
> > +                      !(d->vcpu[mc_vcpuid]->arch.vmce.lmce_enabled) ||
> > +                      /*
> > +                       * The following check serves for MCE injection
> > +                       * test, i.e. xen-mceinj. xen-mceinj may specify
> > +                       * the target domain (i.e. bank->mc_domid) and
> > +                       * target CPU, but it's hard for xen-mceinj to
> > +                       * ensure, when Xen prepares the actual
> > +                       * injection in this function, vCPU currently
> > +                       * running on the target CPU belongs to the
> > +                       * target domain. If such inconsistency does
> > +                       * happen, fallback to broadcast.
> > +                       */
> > +                      global->mc_domid != bank->mc_domid)))
>
> Thinking about this another time, I don't think we want hackery
> like this for a test utility. Instead I think the test utility wants to
> pin the vCPU on the pCPU it wants to deliver the LMCE on.
>

I agree we should not introduce hackery only for test purposes.
However, on second thought, I think we still need this check, but it
should be lifted to the outermost level, i.e.

    if (mc_vcpuid == XEN_MC_VCPUID_INVALID ||
        global->mc_domid != bank->mc_domid ||      <== here
        (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
         (!(global->mc_gstatus & MCG_STATUS_LMCE) ||
          !(d->vcpu[mc_vcpuid]->arch.vmce.lmce_enabled))))

MC# might not be raised at the exact moment that, e.g., the bad memory
cell is accessed, so the current domain id and vcpu id recorded in
global->mc_{domid, vcpuid} by mca_init_global() are probably not
precise (e.g. the domain that accessed the bad memory was scheduled
out, and MC# arrives while another domain is running).
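
For illustration only, the lifted condition can be sketched as a
standalone function outside of Xen. Everything prefixed with sketch_ or
SKETCH_ below is a hypothetical stand-in invented for this example (the
structures, the enum, and the constant values are assumptions, not
Xen's definitions), and the vcpu array is simplified to an array of
structures rather than pointers. The sketch returns true when the
handler would fall back to broadcast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for this sketch only; not copied from Xen's headers. */
#define SKETCH_VCPUID_INVALID  0xffffu
#define SKETCH_MCG_STATUS_LMCE (1ULL << 3)

enum sketch_vendor { SKETCH_VENDOR_INTEL, SKETCH_VENDOR_AMD };

/* Hypothetical, heavily reduced models of a vCPU and a domain. */
struct sketch_vcpu {
    bool lmce_enabled;
};

struct sketch_domain {
    struct sketch_vcpu *vcpu;   /* simplified: array of structs */
};

/* The ids recorded when MC# was taken, plus the id found in the bank. */
struct sketch_mc_record {
    uint16_t global_domid;      /* models global->mc_domid */
    uint16_t global_vcpuid;     /* models global->mc_vcpuid */
    uint64_t gstatus;           /* models global->mc_gstatus */
    uint16_t bank_domid;        /* models bank->mc_domid */
};

/*
 * Return true if the error must be broadcast, false if it may be
 * delivered to the single vCPU named by r->global_vcpuid.
 *
 * The domain id comparison sits at the outermost level. Because ||
 * short-circuits, d->vcpu[] is never indexed when the recorded domain
 * does not match the domain found in the bank.
 */
static bool sketch_must_broadcast(enum sketch_vendor vendor,
                                  const struct sketch_domain *d,
                                  const struct sketch_mc_record *r)
{
    uint16_t mc_vcpuid;

    assert(d != NULL && r != NULL);
    mc_vcpuid = r->global_vcpuid;

    return mc_vcpuid == SKETCH_VCPUID_INVALID ||
           r->global_domid != r->bank_domid ||
           (vendor == SKETCH_VENDOR_INTEL &&
            (!(r->gstatus & SKETCH_MCG_STATUS_LMCE) ||
             !d->vcpu[mc_vcpuid].lmce_enabled));
}
```

With this ordering, a recorded vcpu id that belongs to some other
domain is rejected by the domain id comparison before it is ever used
as an index into the affected domain's vcpu array.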

If such imprecision does happen when handling Intel LMCE or AMD MCE, we
cannot figure out in mc_memerr_dhandler() (though it is not called in
the current AMD MCE handling, it is intended to be the common code) the
exact vcpu that is affected. Even worse, if the imprecise
global->mc_vcpuid (whose value is held in the variable mc_vcpuid) is
larger than the maximum vcpu id of the affected domain (indicated by
the variable 'd'), the check
!(d->vcpu[mc_vcpuid]->arch.vmce.lmce_enabled) is definitely wrong,
because it indexes d->vcpu[] out of range.

Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel