
RE: [Xen-devel] [RFC][PATCH 0/2] MCA support for Intel64

SUZUKI Kazuhiro <mailto:kaz@xxxxxxxxxxxxxx> wrote:
> Hi,
> Thank you for your comment and information.
> But I wonder where to start on these issues.
>> BTW, I checked your patch to mce-xen.c in the kernel; it seems there is not
>> much change to it. You will add more changes to it, right?
> Yes, I plan to implement memory offlining: if the impacted physical
> memory is detected by MCA, it will be offlined and never reused. Then
> Xen notifies Dom0/DomU. The Dom0/DomU handler should kill the affected
> process or take some other action.

Sorry for the slow response; last week was a PRC holiday.
What I mean is that dom0's MCA handler needs more changes. For example, it needs
to check who owns the affected memory.

As for memory offlining, we are also discussing it internally, and we can work
together to add the support. Below are some ideas; I hope to get your feedback:
1) Memory offlining can be used not only for MCA but also for other purposes,
such as memory PM and memory hot add/remove, so maybe we can implement it as a
generic feature (although I suspect MCE will be the first user).

2) I think there are two types of memory offlining requirement for MCA. For a
correctable error that happens on the same page multiple times, the page may
need to be offlined (AFAIK, Solaris has such a feature). In that situation the
page can still be accessed, so the hypervisor can replace it transparently to
the guest. For an uncorrectable error (i.e. one reported through MCE#), the
page may not be accessible any more (as in the data poisoning situation).

3) It may be difficult to offline all types of pages, so we need to categorize
page usage and support only some categories. Currently we categorize page usage
as: free pages, non-critical pages, and critical pages. Non-critical pages are
memory used as guest RAM; critical pages are pages for Xen's own use, including
Xen's data/code and pages used to control guests, such as p2m and EPT tables.
It will be much simpler in the first stage to support only free and
non-critical pages. Any ideas?

4) We also need to consider guests with assigned devices, so that the error is
not propagated to permanent storage.
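
A minimal sketch of how the policy in 2) and 3) might be expressed in C. All
names here (page_category, mca_error_type, offline_action, page_desc,
record_corrected_error, decide_offline_action) and the threshold value are
hypothetical and for illustration only; they are not taken from any Xen patch.
The sketch only decides what to do with a page; it does not perform the
offlining itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page usage categories, following the three classes in 3). */
enum page_category {
    PAGE_FREE,          /* not allocated to anyone */
    PAGE_NON_CRITICAL,  /* used as guest RAM */
    PAGE_CRITICAL       /* Xen code/data, p2m, EPT tables, ... */
};

/* The two error classes distinguished in 2). */
enum mca_error_type {
    MCA_CORRECTABLE,    /* repeated corrected errors; page is still readable */
    MCA_UNCORRECTABLE   /* reported through MCE#; contents may be poisoned */
};

/* Hypothetical outcomes of the offlining decision. */
enum offline_action {
    OFFLINE_RETIRE,          /* free page: just remove it from use */
    OFFLINE_MIGRATE_RETIRE,  /* copy to a new page transparently, then retire */
    OFFLINE_NOTIFY_RETIRE,   /* contents lost: notify the owner, then retire */
    OFFLINE_UNSUPPORTED      /* critical page: not handled in the first stage */
};

struct page_desc {
    enum page_category category;
    unsigned int corrected_errors;  /* corrected errors seen on this page */
};

/* Assumed threshold; a real value would need tuning. */
#define CE_OFFLINE_THRESHOLD 3u

/* Count one corrected error; return true once the page should be offlined. */
bool record_corrected_error(struct page_desc *pg)
{
    assert(pg != NULL);
    pg->corrected_errors++;
    return pg->corrected_errors >= CE_OFFLINE_THRESHOLD;
}

/* Pick an action from the page category and the error type. */
enum offline_action decide_offline_action(const struct page_desc *pg,
                                          enum mca_error_type err)
{
    assert(pg != NULL);
    switch (pg->category) {
    case PAGE_FREE:
        return OFFLINE_RETIRE;
    case PAGE_NON_CRITICAL:
        return err == MCA_CORRECTABLE ? OFFLINE_MIGRATE_RETIRE
                                      : OFFLINE_NOTIFY_RETIRE;
    case PAGE_CRITICAL:
    default:
        return OFFLINE_UNSUPPORTED;
    }
}
```

In this sketch a guest RAM page that hits the corrected-error threshold would
be migrated and retired, while the same page with an uncorrectable error would
be retired after notifying its owner.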

Yunhong Jiang

> Thanks,
> From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
> Subject: RE: [Xen-devel] [RFC][PATCH 0/2] MCA support for Intel64
> Date: Fri, 26 Sep 2008 22:35:55 +0800
>> Glad to know you are working on MCA too.
>> There has already been some discussion of it; I googled and found the link
>> http://article.gmane.org/gmane.comp.emulators.xen.devel/56284
>> You can check that thread.
>> Some points that need discussion include:
>> 1) How to detect the impacted components on the Xen side? As the
>> above mail thread discussed, maybe some improvement can be
>> added to check the context more precisely. For example, if the
>> stack tells us the MCE happened while Xen was running, even if
>> current is idle, we still consider Xen impacted. Also we can
>> check the page owner to decide the impacted guest. Personally I
>> don't think "current" is very helpful.
>> 2) How to split the effort between Xen and Dom0? Maybe we
>> can re-use dom0's MCE handler as much as possible, and Xen will
>> only do some initial containment, especially considering Linux
>> is also enhancing its MCE handler. Of course, another option is to
>> place all of the handling on the Xen side.
>> 3) We need to consider what will happen if multiple MCAs happen
>> on multiple CPUs simultaneously; maybe some monarch algorithm is needed.
>> I hope for your input on it.
>> BTW, I checked your patch to mce-xen.c in the kernel; it seems there is not
>> much change to it. You will add more changes to it, right?
>> Thanks
>> Yunhong Jiang
>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx <> wrote:
>>> Hi,
>>> I am interested in MCA/MCE on x86 systems. But I don't have an AMD
>>> machine, so I tried to port the MCA handler for AMD K8 to Intel P4;
>>> it is attached. Additionally, I attach a patch that adds MCA handler
>>> support for linux-2.6.18-xen/x86_64.
>>> Unfortunately, I don't know how to trigger MCA/MCE on an Intel P4 CPU and
>>> chipset, so I could not test with a real MCA, only in fake
>>> mode (just calling the function). Does anyone know how
>>> to test with a real MCA?
>>> [1/2] xen part: mca-support-for-intel-xen.patch
>>> [2/2] linux/x86_64 part: mca-support-for-intel-linux.patch
>>> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
>>> Thanks,
>>> KAZ
