
RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining



SUZUKI Kazuhiro <kaz@xxxxxxxxxxxxxx> wrote:
> Hi Yunhong and Shane,
> 
> Thank you for your comments and reviews.
> 
>> For page offlining, it may not be a good way to put the pages into an array;
>> a list would be better.
> 
> I modified to register offlined pages to a list instead of an array.
> 
>> I guess the code targets offlining domain pages only, right? How about free
>> pages and xen pages?
>> If so, there is no need for the check below when allocating free pages, since
>> the offlined pages will not be freed into heap()()().
>> If not, the following may have a bug.
> 
> Yes, I assumed that page offlining was only needed for domain pages. If xen
> pages are impacted, then it is enough to crash the hypervisor in the current
> implementation.

We have an internal patch for a similar purpose, covering page offlining caused
by #MC as well as other reasons. We can base our work on your patch if needed.

But I'm a bit confused by some changes in this patch.
In your previous version, if a page was marked as PGC_reserved, it would not be
allocated anymore. However, in this version, when a page is marked as
PGC_reserved, it is just not freed, so does that mean the page will never be
removed from its current list? That seems a bit hacky to me.

What I think page offlining should do is (a rough sketch follows below):
a) Mark the page as PGC_reserved (or another name like PGC_broken).
b) If it is free, go to step c; otherwise, wait until it is freed by its owner
and then go to step c.
c) Remove the page from the buddy system, move it to a special, separate list
(i.e. not in the heap[][][] anymore), and return the other pages of the buddy
to the allocator.
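
In code it could look roughly like the sketch below. This is only an
illustration of the idea: page_offlined_list, mark_page_broken() and
offline_free_page() are made-up names, and a real patch would also have to
split the buddy and give the healthy part back to heap[][][].

    /* Sketch only -- names are illustrative, not an actual patch. */
    static LIST_HEAD(page_offlined_list);   /* broken pages, outside heap[][][] */

    /* step a): tag the page so it is never handed out again */
    static void mark_page_broken(struct page_info *pg)
    {
        pg->count_info |= PGC_reserved;     /* or a new PGC_broken bit */
    }

    /* steps b)+c): once the page is free (immediately, or later when its
     * owner frees it), pull it out of the buddy lists for good */
    static void offline_free_page(struct page_info *pg)
    {
        list_del(&pg->list);                        /* out of heap(node, zone, j) */
        list_add(&pg->list, &page_offlined_list);   /* onto the separate list */
        printk(XENLOG_DEBUG "Page %p(%lx) offlined.\n",
               pg, page_to_maddr(pg));
        /* the remaining healthy 2^j - 1 pages of the buddy would be
         * returned to the allocator here */
    }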

One open question is in step b): if the page is owned by a guest, we could
replace it with a new page through the p2m table and would not need to wait
until it is freed. We didn't do that currently because it is a bit complex to
achieve, but the rough shape of it is sketched below.
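
All of the helper names in this sketch (alloc_replacement_page,
copy_page_contents, p2m_remap_gfn) are hypothetical, just to show the shape of
the idea; locking, TLB/p2m flushing and error paths are omitted.

    /* Hypothetical sketch of p2m-based replacement; the helper names are
     * not real Xen functions. */
    static int replace_broken_guest_page(struct domain *d, unsigned long gfn,
                                         struct page_info *broken_pg)
    {
        struct page_info *new_pg = alloc_replacement_page(d);

        if ( new_pg == NULL )
            return -ENOMEM;                 /* fall back to waiting for the free */

        /* salvage whatever is still readable from the broken frame */
        copy_page_contents(new_pg, broken_pg);

        /* retarget the guest's gfn to the new frame behind its back */
        p2m_remap_gfn(d, gfn, new_pg);

        /* take the broken frame out of circulation right away */
        broken_pg->count_info |= PGC_reserved;
        return 0;
    }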

What do you think about this?

Thanks
Yunhong Jiang

> 
> I attach an updated patch for the xen part, which also includes some bug fixes.
> 
> Thanks,
> KAZ
> 
> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
> 
> 
> From: "Wang, Shane" <shane.wang@xxxxxxxxx>
> Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
> Date: Tue, 16 Dec 2008 19:10:00 +0800
> 
>> For page offlining, it may not be a good way to put the pages into an array;
>> a list would be better.
>> 
>> +    pg->count_info |= PGC_reserved;
>> +    page_offlining[num_page_offlining++] = pg;
>> 
>> I guess the code targets offlining domain pages only, right? How about free
>> pages and xen pages?
>> If so, there is no need for the check below when allocating free pages, since
>> the offlined pages will not be freed into heap()()().
>> If not, the following may have a bug.
>> 
>> +                if ( !list_empty(&heap(node, zone, j)) ) {
>> +                    pg = list_entry(heap(node, zone, j).next, struct page_info, list);
>> +                    if (!(pg->count_info & PGC_reserved))
>> +                        goto found;
>> +                    else
>> +                        printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
>> +                               pg, page_to_maddr(pg));
>> +                }
>> 
>> If one free page (not pg) within the range pg to pg+(1U<<j) is offlined, the
>> range pg~pg+(1U<<j) risks being allocated together with that offlined page.
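
In other words, the check would have to look at the whole 2^j buddy, not just
its head page; something like the following (illustrative only, not from the
patch):

    /* Illustrative only: reject a free buddy of order j if ANY of its
     * 2^j pages has been offlined, not just the first one. */
    static int buddy_contains_offlined_page(struct page_info *pg, unsigned int j)
    {
        unsigned long i;

        for ( i = 0; i < (1UL << j); i++ )
            if ( pg[i].count_info & PGC_reserved )
                return 1;   /* an offlined page hides inside this buddy */

        return 0;
    }

(With the separate-list approach suggested above, this scan should not be
needed, since a broken page would never end up inside a free buddy in
heap[][][].)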
>> 
>> Shane
>> 
>> Jiang, Yunhong wrote:
>>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx <> wrote:
>>>> Hi all,
>>>> 
>>>> I had posted about MCA support for Intel64 before. It had only a
>>>> function to log the MCA error data received from the hypervisor.
>>>> 
>>>> http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00876.html
>>>> 
>>>> I attach patches that support not only error logging but also a Page
>>>> Offlining function. The page where an MCA occurs will be offlined and not
>>>> reused. A new flag 'PGC_reserved' was added to the page's count_info to
>>>> mark the impacted page.
>>>> 
>>>> I know that it would be better to implement page offlining for general
>>>> purposes, but in this first step I implemented it specialized for MCA.
>>> 
>>> Maybe the MCA page offline requirement is a bit different from normal page
>>> offlining, so taking it as a first step may be a good choice :)
>>>
>>> As for your current page_offlining, I'm not sure why the PGC_reserved
>>> page should not be freed. Also, with the following code, won't that mean
>>> heap(node, zone, j) can never be allocated from again? Maybe we could
>>> create a special list to hold all those pages and remove them from the
>>> heap list?
>>> 
>>> +                if ( !list_empty(&heap(node, zone, j)) ) {
>>> +                    pg = list_entry(heap(node, zone, j).next, struct page_info, list);
>>> +                    if (!(pg->count_info & PGC_reserved))
>>> +                        goto found;
>>> +                    else
>>> +                        printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
>>> +                               pg, page_to_maddr(pg));
>>> +                }
>>> 
>>> 
>>>> 
>>>> And I also implemented the MCA handler for Dom0, which supports shutting
>>>> down the remote domain where an MCA occurred. If the MCA occurred on a
>>>> DomU, Dom0 notifies the DomU. When the notification fails, Dom0 calls the
>>>> SCHEDOP_remote_shutdown hypercall.
>>>> 
>>>> [1/2] xen part: mca-support-with-page-offlining-xen.patch
>>> 
>>> We are not sure we really need to pass all #MC information to dom0 first
>>> and let dom0 notify the domU. Xen should know about everything, so it may
>>> have the knowledge to decide whether to inject a virtual #MC into the
>>> guest or not. Of course, this does not impact your patch.
>>> 
>>>> [2/2] linux/x86_64 part: mca-support-with-page-offlining-linux.patch
>>> 
>>> As for how to inject a virtual #MC into the guest (including dom0), I
>>> think we need to consider the following points:
>>>
>>> a) Benefit from reusing the guest's #MC handler. #MC handlers are well
>>> known to be difficult to test, and the native guest handler may have been
>>> tested more widely. Also, #MC handlers improve as time goes on, and
>>> reusing the guest's MCA handler shares those improvements with us.
>>> b) Maintaining the PV handler across different OS versions may not be so
>>> easy, especially as hardware improves and kernels gain better support for
>>> error handling/containment.
>>> c) The #MC handler may need some model-specific information to decide on
>>> the action, while the guest (not dom0) has virtualized CPUID information.
>>> d) The guest's MCA handler may require physical information about when the
>>> #MC happened, like the number of the CPU on which the #MC happened.
>>> e) For HVM domains, a PV handler will be difficult (consider Windows
>>> guests).
>>> 
>>> And we have several options for supporting virtual #MC to the guest:
>>> 
>>> Option 1 is what is currently implemented. A PV #MC handler is
>>> implemented in the guest. This PV handler gets MCA information from the
>>> Xen HV through a hypercall, including MCA MSR values and also some
>>> additional information, like which physical CPU the MCA happened on.
>>> Option 1 helps us with issue d), but we need to maintain a PV handler and
>>> can't benefit from the native handler. It also does not resolve issue c)
>>> very well.
>>> 
>>> Option 2: Xen will provide MCA MSR virtualization so that the guest's
>>> native #MC handler can run without changes. It can benefit from the guest
>>> #MC handler, but it will be difficult to get model-specific information,
>>> and it has no physical information.
>>> 
>>> Option 3 uses a PV #MC handler in the guest as in option 1, but the
>>> interface between Xen and the guest is abstract events, like offlining the
>>> offending page, terminating the current execution context, etc. This
>>> should be straightforward for Linux, but may be difficult for Windows and
>>> other OSes.
>>> 
>>> Currently we are considering option 2, providing MCA MSR virtualization
>>> to the guest, and dom0 can also benefit from such support (if the guest
>>> has a different CPUID from native, we will either keep the guest running
>>> or kill the guest, based on the error code). Of course, the current
>>> mechanism of passing MCA information from Xen to dom0 will still be
>>> useful, but it will be used for logging purposes or for Correctable
>>> Errors. What do you think about this?
>>> 
>>> Thanks
>>> Yunhong Jiang
>>> 
>>>> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
>>>> 
>>>> Thanks,
>>>> KAZ
>>>> 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel