
Re: [Xen-devel] [PATCH 0/2] MCA support with page offlining



Hi Wang,

Thank you very much for your comments.
I will post an updated patch later.

Thanks,
KAZ

From: "Wang, Shane" <shane.wang@xxxxxxxxx>
Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
Date: Mon, 22 Dec 2008 09:52:28 +0800

> Hi KAZ,
> 
> Regarding your latest patch, I have been thinking about the following code.
> 
> +     /* check whether this page has already been registered */
> +     list_for_each_entry(page, &page_offlining, list) {
> +             if (page == pg)
> +                     goto out;
> +     }
> 
> An x86_page_offlining_t entry e is put into the list, so should this be "if (e->page == pg)"?
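> 
> A minimal sketch of that corrected check, reusing the x86_page_offlining_t
> wrapper and the page_offlining list from the patch (untested):
> 
>      x86_page_offlining_t *e;
> 
>      /* walk the wrapper entries and compare the embedded page pointer */
>      list_for_each_entry(e, &page_offlining, list) {
>              if (e->page == pg)
>                      goto out;
>      }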
> 
> +
> +     /* check whether the page already has the 'reserved' attribute */
> +     if (pg->count_info & PGC_reserved) {
> +             printk(XENLOG_DEBUG "Page offlining: ( %lx ) failure.\n",
> +                    maddr);
> +             return 1;
> +     }
> +
> +     /* add attribute 'reserved' and register the page */
> +     get_page(pg, d);
> +     pg->count_info |= PGC_reserved;
> +
> +     e = xmalloc(x86_page_offlining_t);
> +     BUG_ON(e == NULL);
> +     e->page = pg;
> +     list_add(&e->list, &page_offlining);
> 
> Since the page struct is itself a list entry (it contains "struct list_head
> list"), you can take the page off the domain's page list and put it onto your
> page_offlining list. Certainly, the domain may still be using it; you can mark
> it with your PGC_reserved flag first and then do the move in the free function.
> I think this is what Jiang Yunhong suggested :)
> What do you think of these suggestions? PS: in that case,
> alloc_bitmap[]/heap[][][]/avail[] should be considered.
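> 
> A rough sketch of that idea, reusing the page's own 'list' field instead of a
> separate wrapper (illustrative only; it assumes the free path is the right
> place to divert the page and that the heap lock is held there):
> 
>      /* at offline time: only mark the page; the owner may still hold it */
>      pg->count_info |= PGC_reserved;
> 
>      /* later, in the free path, divert the page instead of freeing it */
>      if ( pg->count_info & PGC_reserved )
>      {
>          list_del(&pg->list);                  /* off the domain/heap list  */
>          list_add(&pg->list, &page_offlining); /* onto the offline list     */
>          return;                               /* never returns to the heap */
>      }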
> 
> For the code piece
> 
> +                if ( !list_empty(&heap(node, zone, j)) ) {
> +                    pg = list_entry(heap(node, zone, j).next, struct page_info, list);
> +                    if (!(pg->count_info & PGC_reserved))
> +                        goto found;
> +                    else
> +                        printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
> +                               pg, page_to_maddr(pg));
> +                }
> 
> This mechanism does not seem efficient enough: once the first page in
> heap(node, zone, j) is marked PGC_reserved, the remaining pages on that list
> can no longer be allocated even though they are not offlined, and it does not
> solve the potential bug I mentioned.
> Moreover, the original code (i.e. without your patch) just looks for a block of
> sufficient order and takes the first match, which is enough for its purpose. So
> I am wondering why, when the first list entry cannot meet the requirement, you
> do not enumerate heap(node, zone, j) to try the second one ... but instead
> directly return "not to be allocated"?
> 
> Thanks.
> Shane
> 
> -----Original Message-----
> From: SUZUKI Kazuhiro [mailto:kaz@xxxxxxxxxxxxxx] 
> Sent: December 22, 2008 8:42
> To: Jiang, Yunhong
> Cc: Wang, Shane; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH 0/2] MCA support with page offlining
> 
> Hi,
> 
> > But I'm a bit confused by some changes in this patch.
> 
> Sorry, I attached the wrong patch; this is the correct one.
> 
> Thanks,
> KAZ
> 
> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
> 
> 
> From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
> Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
> Date: Fri, 19 Dec 2008 18:48:11 +0800
> 
> > SUZUKI Kazuhiro <mailto:kaz@xxxxxxxxxxxxxx> wrote:
> > > Hi Yunhong and Shane,
> > > 
> > > Thank you for your comments and reviews.
> > > 
> > >> For page offlining, it may not be a good idea to put the pages into an
> > >> array; a list would be better.
> > > 
> > > I modified the patch to register offlined pages in a list instead of an array.
> > > 
> > >> I guess the code targets offlining domain pages only, right? What about
> > >> free pages and Xen pages? If so, there is no need for the following check
> > >> when allocating free pages, since offlined pages will never be freed back
> > >> into heap()()(). If not, the following may have a bug.
> > > 
> > > Yes, I assumed that page offlining was needed only for domain pages. If Xen
> > > pages are impacted, the current implementation simply crashes the
> > > hypervisor.
> > 
> > We have an internal patch for a similar purpose, covering page offlining
> > caused by #MC as well as other reasons. We can base our work on your patch if
> > needed.
> > 
> > But I'm a bit confused by some changes in this patch.
> > In your previous version, if a page was marked PGC_reserved, it would not be
> > allocated anymore. However, in this version, when a page is marked
> > PGC_reserved it is simply not freed, so does that mean the page is never
> > removed from its current list? That seems a bit hackish to me.
> > 
> > What I think we should do for page offlining is:
> > a) Mark the page as PGC_reserved (or another name like PGC_broken).
> > b) If it is free, go to step c; otherwise, wait until it is freed by the
> > owner and then go to step c.
> > c) Remove the page from the buddy system, move it to a special, separate list
> > (i.e. not in the heap[][][] anymore), and return the other pages of the block
> > to the buddy allocator.
> > 
> > There is some debate about step b: if the page is owned by a guest, we could
> > replace it with a new page through the p2m table and avoid waiting until it
> > is freed. We don't do that currently because it is a bit complex to achieve.
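> > 
> > A rough sketch of steps a)-c), purely illustrative (page_is_free() and
> > reclaim_broken_page() are hypothetical names, not from any actual patch):
> > 
> > static LIST_HEAD(broken_pages);          /* pages pulled out of the buddy   */
> > 
> > /* c) called from the free path once the owner releases the page */
> > void reclaim_broken_page(struct page_info *pg)
> > {
> >     list_del(&pg->list);                  /* out of heap[][][]               */
> >     list_add(&pg->list, &broken_pages);   /* onto the special, separate list */
> >     /* the remaining pages of the buddy block go back to the allocator */
> > }
> > 
> > void offline_page(struct page_info *pg)
> > {
> >     /* a) mark the page so nobody allocates it again */
> >     pg->count_info |= PGC_reserved;
> > 
> >     /* b) if it is still owned, just wait: the free path calls back later */
> >     if ( !page_is_free(pg) )
> >         return;
> > 
> >     reclaim_broken_page(pg);
> > }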
> > 
> > What is your opinion on this?
> > 
> > Thanks
> > Yunhong Jiang
> > 
> > > 
> > > I attach an updated patch for the Xen part, which also includes some bug fixes.
> > > 
> > > Thanks,
> > > KAZ
> > > 
> > > Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
> > > 
> > > 
> > > From: "Wang, Shane" <shane.wang@xxxxxxxxx>
> > > Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
> > > Date: Tue, 16 Dec 2008 19:10:00 +0800
> > > 
> > >> For page offlining, it may not be a good idea to put the pages into an
> > >> array; a list would be better.
> > >> 
> > >> +        pg->count_info |= PGC_reserved;
> > >> +        page_offlining[num_page_offlining++] = pg;
> > >> 
> > >> I guess the code targets offlining domain pages only, right? What about
> > >> free pages and Xen pages? If so, there is no need for the following check
> > >> when allocating free pages, since offlined pages will never be freed back
> > >> into heap()()(). If not, the following may have a bug.
> > >> 
> > >> +                if ( !list_empty(&heap(node, zone, j)) ) {
> > >> +                    pg = list_entry(heap(node, zone, j).next, struct page_info, list);
> > >> +                    if (!(pg->count_info & PGC_reserved))
> > >> +                        goto found;
> > >> +                    else
> > >> +                        printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
> > >> +                               pg, page_to_maddr(pg));
> > >> +                }
> > >> 
> > >> If a free page (other than pg) within the range pg to pg+(1U<<j) is
> > >> offlined, that range risks being allocated along with the offlined page.
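> > >> 
> > >> In other words, the allocation check would have to cover every page of the
> > >> order-j block, roughly like this (a code fragment only; 'i' and
> > >> 'skip_this_block' are illustrative):
> > >> 
> > >>     unsigned int i;
> > >> 
> > >>     /* reject the block if any page inside it is marked reserved */
> > >>     for ( i = 0; i < (1U << j); i++ )
> > >>         if ( (pg + i)->count_info & PGC_reserved )
> > >>             goto skip_this_block;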
> > >> 
> > >> Shane
> > >> 
> > >> Jiang, Yunhong wrote:
> > >>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx <> wrote:
> > >>>> Hi all,
> > >>>> 
> > >>>> I previously posted about MCA support for Intel64. That patch only had a
> > >>>> function to log the MCA error data received from the hypervisor.
> > >>>> 
> > >>>> http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00876.html
> > >>>> 
> > >>>> I attach patches that support not only error logging but also a page
> > >>>> offlining function. The page where an MCA occurs is offlined and never
> > >>>> reused. A new flag, 'PGC_reserved', was added to the page's count_info to
> > >>>> mark the impacted page.
> > >>>> 
> > >>>> I know it would be better to implement page offlining for general use,
> > >>>> but in this first step I implemented it specifically for MCA.
> > >>> 
> > >>> Maybe the requirements for MCA page offlining are a bit different from
> > >>> normal page offlining, so taking it as a first step may be a good choice :)
> > >>> 
> > >>> As for your current page_offlining, I'm not sure why the PGC_reserved page
> > >>> should not be freed. Also, with the following code, won't heap(node, zone,
> > >>> j) become unallocatable? Maybe we can create a special list to hold all
> > >>> those pages and remove them from the heap list?
> > >>> 
> > >>> +                if ( !list_empty(&heap(node, zone, j)) ) {
> > >>> +                    pg = list_entry(heap(node, zone, j).next, struct page_info, list);
> > >>> +                    if (!(pg->count_info & PGC_reserved))
> > >>> +                        goto found;
> > >>> +                    else
> > >>> +                        printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
> > >>> +                               pg, page_to_maddr(pg));
> > >>> +                }
> > >>> 
> > >>> 
> > >>>> 
> > >>>> I also implemented the MCA handler for Dom0, which supports shutting
> > >>>> down the remote domain where an MCA occurred. If the MCA occurred on a
> > >>>> DomU, Dom0 notifies the DomU. If the notification fails, Dom0 calls the
> > >>>> SCHEDOP_remote_shutdown hypercall.
> > >>>> 
> > >>>> [1/2] xen part: mca-support-with-page-offlining-xen.patch
> > >>> 
> > >>> We are not sure we really need to pass all #MC information to dom0 first
> > >>> and have dom0 notify the domU. Xen knows about everything, so it should
> > >>> have enough knowledge to decide whether or not to inject a virtual #MC
> > >>> into the guest. Of course, this does not impact your patch.
> > >>> 
> > >>>> [2/2] linux/x86_64 part: mca-support-with-page-offlining-linux.patch
> > >>> 
> > >>> As for how to inject a virtual #MC into a guest (including dom0), I think
> > >>> we need to consider the following points:
> > >>> 
> > >>> a) The benefit of reusing the guest's #MC handler. #MC handlers are well
> > >>> known to be difficult to test, and the native guest handler may have been
> > >>> tested more widely. Also, #MC handlers improve as time goes on, and
> > >>> reusing the guest's MCA handler lets us share in those improvements.
> > >>> b) Maintaining a PV handler for different OS versions may not be easy,
> > >>> especially as hardware improves and the kernel gains better support for
> > >>> error handling/containment.
> > >>> c) The #MC handler may need model-specific information to decide on an
> > >>> action, while a guest (other than dom0) sees virtualized CPUID information.
> > >>> d) The guest's MCA handler may require physical information about the #MC,
> > >>> like the number of the CPU on which it happened.
> > >>> e) For an HVM domain, a PV handler will be difficult (consider Windows
> > >>> guests).
> > >>> 
> > >>> We have several options for supporting a virtual #MC to the guest:
> > >>> 
> > >>> Option 1 is what is currently implemented. A PV #MC handler is implemented
> > >>> in the guest. This PV handler gets MCA information from the Xen hypervisor
> > >>> through a hypercall, including the MCA MSR values plus some additional
> > >>> information, like which physical CPU the MCA happened on. Option 1 helps
> > >>> with issue d), but we need to maintain a PV handler and cannot benefit
> > >>> from the native handler. It also does not resolve issue c) very well.
> > >>> 
> > >>> Option 2: Xen provides MCA MSR virtualization so that the guest's native
> > >>> #MC handler can run without changes. This benefits from the guest's #MC
> > >>> handler, but it will be difficult to get model-specific information, and
> > >>> there is no physical information.
> > >>> 
> > >>> Option 3 uses a PV #MC handler in the guest as in option 1, but the
> > >>> interface between Xen and the guest is abstract events, like "offline the
> > >>> offending page" or "terminate the current execution context". This should
> > >>> be straightforward for Linux, but may be difficult for Windows and other
> > >>> OSes.
> > >>> 
> > >>> Currently we are considering option 2, providing MCA MSR virtualization to
> > >>> the guest; dom0 can also benefit from such support (if the guest has a
> > >>> different CPUID than the native one, we will either keep the guest running
> > >>> or kill it based on the error code). Of course, the current mechanism of
> > >>> passing MCA information from Xen to dom0 will still be useful, but it will
> > >>> be used for logging purposes or for correctable errors. What do you think
> > >>> about this?
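> > >>> 
> > >>> As an illustrative-only sketch of option 2 (not from any actual patch):
> > >>> intercept a guest RDMSR of an MCA bank MSR and hand back values recorded
> > >>> at #MC time instead of the physical register contents. All names here are
> > >>> hypothetical.
> > >>> 
> > >>> #include <stdint.h>
> > >>> 
> > >>> #define VMCE_MSR_MC0_CTL  0x400         /* architectural base of MCi_CTL */
> > >>> #define VMCE_NR_BANKS     6
> > >>> 
> > >>> /* per-bank CTL, STATUS, ADDR, MISC captured when the #MC was handled */
> > >>> static uint64_t vbank[VMCE_NR_BANKS][4];
> > >>> 
> > >>> static int vmce_rdmsr(uint32_t msr, uint64_t *val)
> > >>> {
> > >>>     uint32_t idx;
> > >>> 
> > >>>     if ( msr < VMCE_MSR_MC0_CTL ||
> > >>>          msr >= VMCE_MSR_MC0_CTL + 4 * VMCE_NR_BANKS )
> > >>>         return 0;                       /* not an MCA bank MSR           */
> > >>> 
> > >>>     idx = msr - VMCE_MSR_MC0_CTL;
> > >>>     *val = vbank[idx / 4][idx % 4];     /* return the virtualized value  */
> > >>>     return 1;
> > >>> }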
> > >>> 
> > >>> Thanks
> > >>> Yunhong Jiang
> > >>> 
> > >>>> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
> > >>>> 
> > >>>> Thanks,
> > >>>> KAZ
> > >>>> 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

