|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] Xen/MCE: adjust for future new vMCE model
Jan Beulich wrote:
>>>> On 05.07.12 at 11:20, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>> Jan Beulich wrote:
>>>>>> On 05.07.12 at 05:30, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>>> wrote: @@ -68,7 +74,7 @@
>>>>
>>>> int vmce_restore_vcpu(struct vcpu *v, uint64_t caps) {
>>>> - if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT & ~MCG_CTL_P )
>>>> + if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT )
>>>> {
>>>> dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA
>>>> capabilities" " %#" PRIx64 " for d%d:v%u
>>>> (supported: %#Lx)\n", @@ -77,6 +83,12 @@ return -EPERM;
>>>> }
>>>>
>>>> + if ( (caps & MCG_CAP_COUNT) != GUEST_BANK_NUM ) + {
>>>> + dprintk(XENLOG_G_ERR, "Bank number inconsistency\n"); +
>>>> return -EPERM; + } +
>>>> v->arch.mcg_cap = caps;
>>>> return 0;
>>>> }
>>>
>>> These two changes, as pointed out before, need some further
>>> thought and tweaking: As I said earlier, we are already shipping
>>> the code in its prior form, so outright rejecting MCG_CTL_P set
>>> and non-default bank counts is not a proper solution. Warning
>>> about them being in an unsupported state is certainly acceptable.
>>>
>>> And I think the guest visible MCG_CAP value also shouldn't
>>> change across migration, so tolerating (but ignoring) a higher
>>> bank count seems necessary. Not sure what to do when the
>>> bank count is lower (0 or 1) - for 1, all communication to the
>>> guest should probably go through bank 0, while 0 should
>>> probably disable vMCE for that vCPU.
>>>
>>> Further I just noticed that you don't touch fill_vmsr_data() at
>>> all (sending patches created with diff's -p option or equivalent
>>> helps spotting where individual changes belong), yet that
>>> function uses the hardware bank number to fill the struct
>>> bank_entry. With the intended concept, the "bank" member
>>> of that structure can probably go away altogether.
>>>
>>> Jan
>>
>>
>> Seems things become a little bit chaos, let's jump out from details,
>> make agreement first about what's the general purpose of this
>> middle-work patch.
>>
>> ============
>> 1. current xen vmce status
>> 1.1) current xen vmce has 2 kinds of bugs:
>> * functional bug: it actually doesn't work correctly for guest;
>> * migration bug: partly solved by c/s 24887;
>> 1.2) c/s 24887 not in formal xen release version, it's a temporary
>> fix (but unfortunately has been backported to SELS11 sp2)
>> ============
>> 2. target of this middle-work patch
>> 2.1) it's not used to address functional bug
>> 2.3) it does minimal work, just in order not to bring
>> trouble/dirty to future new vMCE, so it would * remove MCG_CTL
>> --> otherwise new vMCE have to add useless MCG_CTL_P and MCG_CTL, w/
>> potential model specific issue;
>> * stick all 1's to MCi_CTL --> to avoid semantic issue;
>> * set MCG_CTL_P=0 and 2 banks at MCG_CAP --> otherwise dirty
>> code at new vMCE, like bank number issue; ============
>>
>> I think in this middle-work patch, we don't need constrained by c/s
>> 24887:
>> 1. c/s 24887 not in formal xen release
>> 2. the benefit of keep compatible w/ 24887 is minor:
>> * currently all xen version have migration bug, 3.x -> 4.0 -> 4.1
>> -> 4.2
>> * keep compatible w/ 24887 only benefit one case --> migration
>> from SLES11 sp2 to xen 4.2 (for both SLES11 sp2 and xen 4.2, vMCE
>> actually doesn't functionally work fine to guest)
>> 3. the price/hurt to future vMCE is high (to keep compatible w/
>> 24887)
>
> Why do you consider the price high? I think it would require pretty
> minor changes.
No, what I meant is not coding price. I mean the price that have to add dirty
and useless change to the new vMCE is high. Just curious why we cannot simply
get rid of c/s 24887? after all it only benefit SLES11 sp2.
>
>> Considering above, I prefer an outright cleanup in xen4.2, not
>> constrained by c/s 24887 and not bring trouble/dirty to future vMCE.
>
> Whether or not is was appropriate for us to backport this change
> early is unclear, but given that back at that time I had already
> pointed out the problems and asked for work to be done to clean
> it up, and given that it has taken over four months to get going
> with this, I don't think you would suggest the alternative of us
> having left the bug unfixed for this entire time period.
Jan, I'm not challenge why you backport to SLES11 sp2. If there is anything
wrong, I agree it's my fault. Currently what I concern is if we can do an
outright cleanup at xen4.2 so that future vMCE could be simple and clean.
>
> So unless the price to pay is unreasonably high, I'd favor getting
> this taken care of rather than ignored.
If so, why we need this middle-work patch? It could just keep current status at
xen 4.2, then start 'dirty' new vMCE implement at xen4.3 --> and the 'dirty'
inherit from c/s 24887 which from code point of view would be totally removed
at new vMCE.
Thanks,
Jinsong
>
> If I can find time to work on your last version of the patch towards
> the direction I have in mind tomorrow, I'll do so; I'll be gone for
> two weeks afterwards (and be mostly invisible for another one),
> so wouldn't be able to look at this again presumably until the
> beginning of August.
>
> Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |