[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 09/19] x86/vmce: fill MSR_IA32_MCG_STATUS on all vcpus in broadcast case



On 02/17/17 03:21 -0700, Jan Beulich wrote:
> >>> On 17.02.17 at 07:39, <haozhong.zhang@xxxxxxxxx> wrote:
> > --- a/xen/arch/x86/cpu/mcheck/mcaction.c
> > +++ b/xen/arch/x86/cpu/mcheck/mcaction.c
> > @@ -88,21 +88,21 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
> >                      goto vmce_failed;
> >                  }
> >  
> > +                if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> > +                    vmce_vcpuid = VMCE_INJECT_BROADCAST;
> > +                else
> > +                    vmce_vcpuid = global->mc_vcpuid;
> > +
> >                  bank->mc_addr = gfn << PAGE_SHIFT |
> >                    (bank->mc_addr & (PAGE_SIZE -1 ));
> > -                if ( fill_vmsr_data(bank, d,
> > -                      global->mc_gstatus) == -1 )
> > +                if ( fill_vmsr_data(bank, d, global->mc_gstatus,
> > +                                    vmce_vcpuid == VMCE_INJECT_BROADCAST) 
> > == -1 )
> 
> You're changing the return values of function to be proper -E...
> values, so you can't validly compare against -1 here anymore.
>

will fix

> >  int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
> > -                   uint64_t gstatus)
> > +                   uint64_t gstatus, bool broadcast)
> >  {
> >      struct vcpu *v = d->vcpu[0];
> > +    int ret;
> >  
> > -    if ( mc_bank->mc_domid != (uint16_t)~0 )
> > -    {
> > -        if ( v->arch.vmce.mcg_status & MCG_STATUS_MCIP )
> > -        {
> > -            mce_printk(MCE_QUIET, "MCE: guest has not handled previous"
> > -                       " vMCE yet!\n");
> > -            return -1;
> > -        }
> > -
> > -        spin_lock(&v->arch.vmce.lock);
> > +    if ( mc_bank->mc_domid == (uint16_t)~0 )
> > +        return -EINVAL;
> 
> Returning -EINVAL here is a behavioral change which, if intended,
> needs justification in the commit message.
>

It's intended. "ASSERT(d)" in its only caller mc_memerr_dhandler()
implies an invalid domain id should be treated as an error. I'll
mention this in the commit message.

> > -        v->arch.vmce.mcg_status = gstatus;
> > -        /*
> > -         * 1. Skip bank 0 to avoid 'bank 0 quirk' of old processors
> > -         * 2. Filter MCi_STATUS MSCOD model specific error code to guest
> > -         */
> > -        v->arch.vmce.bank[1].mci_status = mc_bank->mc_status &
> > -                                              MCi_STATUS_MSCOD_MASK;
> > -        v->arch.vmce.bank[1].mci_addr = mc_bank->mc_addr;
> > -        v->arch.vmce.bank[1].mci_misc = mc_bank->mc_misc;
> > +    ret = vcpu_fill_mc_msrs(v, gstatus, mc_bank->mc_status,
> > +                            mc_bank->mc_addr, mc_bank->mc_misc);
> > +    if ( ret || !broadcast )
> > +        goto out;
> >  
> > -        spin_unlock(&v->arch.vmce.lock);
> > +    for_each_vcpu ( d, v )
> > +    {
> > +        if ( v == d->vcpu[0] )
> 
> Comparing v->vcpu_id with zero would be a cheaper check as it
> seems.
>

will change

> > +            continue;
> > +        ret = vcpu_fill_mc_msrs(v, MCG_STATUS_MCIP | MCG_STATUS_RIPV,
> > +                                0, 0, 0);
> 
> What guarantees RIP to be valid, and why all the zeros? Perhaps
> I'm lacking some understanding of how real hardware handles the
> broadcasting, but it would help if you left an explaining comment
> here.
>

I just inject the less severity error on vcpus other than vcpu0. As
long as vMCE with the actual error information is injected to vcpu0,
the guest MCE handler should be able to decide whether it can recover
or not. If it can, vMCE injected on other vcpus should not offer
conflict information. If it cannot, MC MSRs on other vcpus will not
matter in fact as they represent a less severe error.

MSR_IA32_MCG_STATUS = MCG_STATUS_MCIP | MCG_STATUS_RIPV and all banks
being zero (invalid) is one of the combinations satisfying above
requirement.

> > +        if ( ret )
> > +            break;
> 
> Wouldn't you better, on a best effort basis, try to update the
> remaining vCPU-s anyway (making sure you don't lose the error
> indicator to be handed back to your caller).
>

sure, will change to update all vcpus

> >      }
> >  
> > -    return 0;
> > + out:
> > +    return ret;
> 
> If at the destination of a goto all you do is return, please avoid
> using goto.
>
will change

Thanks,
Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.