[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] Xen/MCE: adjust for future new vMCE model



>>> On 05.07.12 at 11:20, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
> Jan Beulich wrote:
>>>>> On 05.07.12 at 05:30, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>> wrote: @@ -68,7 +74,7 @@ 
>>> 
>>>  int vmce_restore_vcpu(struct vcpu *v, uint64_t caps)  {
>>> -    if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT & ~MCG_CTL_P )
>>> +    if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT )
>>>      {
>>>          dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA
>>>                  capabilities" " %#" PRIx64 " for d%d:v%u
>>>          (supported: %#Lx)\n", @@ -77,6 +83,12 @@ return -EPERM;
>>>      }
>>> 
>>> +    if ( (caps & MCG_CAP_COUNT) != GUEST_BANK_NUM ) +    {
>>> +        dprintk(XENLOG_G_ERR, "Bank number inconsistency\n"); +    
>>> return -EPERM; +    }
>>> +
>>>      v->arch.mcg_cap = caps;
>>>      return 0;
>>>  }
>> 
>> These two changes, as pointed out before, need some further
>> thought and tweaking: As I said earlier, we are already shipping
>> the code in its prior form, so outright rejecting MCG_CTL_P set
>> and non-default bank counts is not a proper solution. Warning
>> about them being in an unsupported state is certainly acceptable.
>> 
>> And I think the guest visible MCG_CAP value also shouldn't
>> change across migration, so tolerating (but ignoring) a higher
>> bank count seems necessary. Not sure what to do when the
>> bank count is lower (0 or 1) - for 1, all communication to the
>> guest should probably go through bank 0, while 0 should
>> probably disable vMCE  for that vCPU.
>> 
>> Further I just noticed that you don't touch fill_vmsr_data() at
>> all (sending patches created with diff's -p option or equivalent
>> helps spotting where individual changes belong), yet that
>> function uses the hardware bank number to fill the struct
>> bank_entry. With the intended concept, the "bank" member
>> of that structure can probably go away altogether.
>> 
>> Jan
> 
> 
> Seems things become a little bit chaos, let's jump out from details, make 
> agreement first about what's the general purpose of this middle-work patch.
> 
> ============
> 1. current xen vmce status
>   1.1) current xen vmce has 2 kinds of bugs:
>     * functional bug: it actually doesn't work correctly for guest;
>     * migration bug: partly solved by c/s 24887;
>   1.2) c/s 24887 not in formal xen release version, it's a temporary fix 
> (but unfortunately has been backported to SELS11 sp2)
> ============
> 2. target of this middle-work patch
>   2.1) it's not used to address functional bug
>   2.3) it does minimal work, just in order not to bring trouble/dirty to 
> future new vMCE, so it would
>     * remove MCG_CTL --> otherwise new vMCE have to add useless MCG_CTL_P and 
> MCG_CTL, w/ potential model specific issue;
>     * stick all 1's to MCi_CTL --> to avoid semantic issue;
>     * set MCG_CTL_P=0 and 2 banks at MCG_CAP --> otherwise dirty code at new 
> vMCE, like bank number issue;
> ============
> 
> I think in this middle-work patch, we don't need constrained by c/s 24887:
> 1. c/s 24887 not in formal xen release
> 2. the benefit of keep compatible w/ 24887 is minor:
>   * currently all xen version have migration bug, 3.x -> 4.0 -> 4.1 -> 4.2
>   * keep compatible w/ 24887 only benefit one case --> migration from SLES11 
> sp2 to xen 4.2 (for both SLES11 sp2 and xen 4.2, vMCE actually doesn't 
> functionally work fine to guest)
> 3. the price/hurt to future vMCE is high (to keep compatible w/ 24887)

Why do you consider the price high? I think it would require pretty
minor changes.

> Considering above, I prefer an outright cleanup in xen4.2, not constrained 
> by c/s 24887 and not bring trouble/dirty to future vMCE.

Whether or not is was appropriate for us to backport this change
early is unclear, but given that back at that time I had already
pointed out the problems and asked for work to be done to clean
it up, and given that it has taken over four months to get going
with this, I don't think you would suggest the alternative of us
having left the bug unfixed for this entire time period.

So unless the price to pay is unreasonably high, I'd favor getting
this taken care of rather than ignored.

If I can find time to work on your last version of the patch towards
the direction I have in mind tomorrow, I'll do so; I'll be gone for
two weeks afterwards (and be mostly invisible for another one),
so wouldn't be able to look at this again presumably until the
beginning of August.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.