
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN



Keir Fraser wrote:
On 16/02/2009 14:18, "Christoph Egger" <Christoph.Egger@xxxxxxx> wrote:

IMO, any design change should be discussed first and not changed
silently, since this will confuse everyone and no one will know
what is the right thing to do in Xen and in Dom0, and this
in turn will lead to error-prone, unmaintainable code in both
Xen and in Dom0.

I certainly think we should have a shared approach for x86 machine-check
handling, rather than completely different architectures for AMD and Intel.
Fortunately Sun are an interested and active third party regarding this
feature. I'll be interested in their opinion.

 -- Keir


Today is a holiday here in the US, so I have only taken a superficial look at the patches.

However, my initial impression is that I share Christoph's concern. I like the original design, where the hypervisor deals with low-level information collection and passes it on to dom0, which can then make a high-level decision and instruct the hypervisor to take high-level action via a hypercall. The hypervisor does the actual MSR reads and writes; dom0 only acts on the values provided via hypercalls.

We added the physcpuinfo hypercall to stay in this framework: get physical information needed for analysis, but don't access any registers directly.
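
To make the split concrete, here is a rough sketch in C of the division of labour I have in mind. It is not taken from the Xen or dom0 sources: the struct, the enum, and the function name dom0_decide are made up for illustration, and the policy itself is only an example. The only real detail is the position of the VAL, UC and ADDRV bits in the x86 machine-check bank status register. The point is that dom0 receives a copy of the register values and returns a high-level action, without ever touching an MSR itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in the x86 IA32_MCi_STATUS register. */
#define MC_STATUS_VAL   (1ULL << 63)  /* bank holds a valid error */
#define MC_STATUS_UC    (1ULL << 61)  /* error was not corrected */
#define MC_STATUS_ADDRV (1ULL << 58)  /* address register is valid */

/* Hypothetical telemetry record. The hypervisor fills it in after reading
 * the MSRs; dom0 only ever sees this copy. */
struct mc_telemetry {
    unsigned int phys_cpu;  /* physical CPU that reported the error */
    unsigned int bank;      /* machine-check bank number */
    uint64_t status;        /* copy of the bank status register */
    uint64_t addr;          /* copy of the bank address register */
};

/* Hypothetical high-level actions dom0 could request via a hypercall. */
enum mc_action {
    MC_ACTION_NONE,
    MC_ACTION_LOG,
    MC_ACTION_OFFLINE_PAGE,
    MC_ACTION_OFFLINE_CPU
};

/* dom0 side: decide purely from the values handed over by the hypervisor.
 * The policy below is illustrative only. */
enum mc_action dom0_decide(const struct mc_telemetry *t)
{
    assert(t != NULL);

    if (!(t->status & MC_STATUS_VAL))
        return MC_ACTION_NONE;          /* nothing logged in this bank */
    if (!(t->status & MC_STATUS_UC))
        return MC_ACTION_LOG;           /* corrected error: just record it */
    if (t->status & MC_STATUS_ADDRV)
        return MC_ACTION_OFFLINE_PAGE;  /* uncorrected, address known */
    return MC_ACTION_OFFLINE_CPU;       /* uncorrected, no usable address */
}
```

In this picture the hypervisor would build one struct mc_telemetry per reporting bank, hand the records to dom0, and then carry out whatever action dom0 asks for.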

It seems that these new patches blur this distinction, especially the virtualized MSR reads/writes. I am not sure what added value they have, except for being able to run an unmodified MCA handler. However, I think that any active MCA decision making should be centralized, and that centralized place would be dom0. Dom0 is already very much aware of the hypervisor, so I don't see the advantage of having an unmodified MCA handler there (our MCA handlers are virtually unmodified; it's just that the part where the telemetry is collected is inside Xen for the dom0 case).

I also agree that different behavior for AMD and Intel chips would not be good.

Perhaps the Intel folks can explain what the advantages of their approach are, and give some scenarios where their approach would be better? My first impression is that staying within the general framework as provided by Christoph's original work is the better option.

- Frank

