
RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining



SUZUKI Kazuhiro <kaz@xxxxxxxxxxxxxx> wrote:
> Hi Yunhong and Shane,
> 
> Thank you for your comments and reviews.
> 
>> For page offlining, it may not be a good way to put the pages into an array;
>> a list would be better.
> 
> I modified to register offlined pages to a list instead of an array.
> 
>> I guess the code targets offlining domain pages only, right? How about free
>> pages and xen pages?
>> If so, there is no need for the check below when allocating free pages, since
>> the offlined pages will not be freed into heap()()().
>> If not, the following may have a bug.
> 
> Yes, I assumed that page offlining was only needed for domain pages. If xen
> pages are impacted, then it is enough to crash the hypervisor in the current
> implementation.

We have an internal patch for a similar purpose, covering page offlining caused
by #MC as well as other reasons. We can base our work on your patch if needed.

But I'm a bit confused by some changes in this patch.
In your previous version, if a page was marked as PGC_reserved, it would not be
allocated anymore. However, in this version, when a page is marked as
PGC_reserved, it is just not freed, so does that mean the page will never be
removed from its current list? That seems a bit hacky to me.

What I think page offlining should do is (a rough sketch follows below):
a) Mark the page as PGC_reserved (or another name like PGC_broken).
b) If it is free, go to step c; otherwise, wait until it is freed by its owner
and then go to step c.
c) Remove the page from the buddy system, move it to a special, separate list
(i.e. not in the heap[][][] anymore), and return the other pages of the buddy
to the allocator.
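
In code it could look roughly like the sketch below. This is only an
illustration of the idea: page_offlined_list, mark_page_broken() and
offline_free_page() are made-up names, and a real patch would also have to
split the buddy and give the healthy part back to heap[][][].

    /* Sketch only -- names are illustrative, not an actual patch. */
    static LIST_HEAD(page_offlined_list);   /* broken pages, outside heap[][][] */

    /* step a): tag the page so it is never handed out again */
    static void mark_page_broken(struct page_info *pg)
    {
        pg->count_info |= PGC_reserved;     /* or a new PGC_broken bit */
    }

    /* steps b)+c): once the page is free (immediately, or later when its
     * owner frees it), pull it out of the buddy lists for good */
    static void offline_free_page(struct page_info *pg)
    {
        list_del(&pg->list);                        /* out of heap(node, zone, j) */
        list_add(&pg->list, &page_offlined_list);   /* onto the separate list */
        printk(XENLOG_DEBUG "Page %p(%lx) offlined.\n",
               pg, page_to_maddr(pg));
        /* the remaining healthy 2^j - 1 pages of the buddy would be
         * returned to the allocator here */
    }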

One open question is in step b): if the page is owned by a guest, we could
replace it with a new page through the p2m table and would not need to wait
until it is freed. We didn't do that currently because it is a bit complex to
achieve, but the rough shape of it is sketched below.
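
All of the helper names in this sketch (alloc_replacement_page,
copy_page_contents, p2m_remap_gfn) are hypothetical, just to show the shape of
the idea; locking, TLB/p2m flushing and error paths are omitted.

    /* Hypothetical sketch of p2m-based replacement; the helper names are
     * not real Xen functions. */
    static int replace_broken_guest_page(struct domain *d, unsigned long gfn,
                                         struct page_info *broken_pg)
    {
        struct page_info *new_pg = alloc_replacement_page(d);

        if ( new_pg == NULL )
            return -ENOMEM;                 /* fall back to waiting for the free */

        /* salvage whatever is still readable from the broken frame */
        copy_page_contents(new_pg, broken_pg);

        /* retarget the guest's gfn to the new frame behind its back */
        p2m_remap_gfn(d, gfn, new_pg);

        /* take the broken frame out of circulation right away */
        broken_pg->count_info |= PGC_reserved;
        return 0;
    }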

What do you think about this?

Thanks
Yunhong Jiang

> 
> I attach an updated patch for the xen part, which also includes some bug fixes.
> 
> Thanks,
> KAZ
> 
> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
> 
> 
> From: "Wang, Shane" <shane.wang@xxxxxxxxx>
> Subject: RE: [Xen-devel] [PATCH 0/2] MCA support with page offlining
> Date: Tue, 16 Dec 2008 19:10:00 +0800
> 
>> For page offlining, it may not be a good way to put the pages into an array;
>> a list would be better.
>> 
>> +    pg->count_info |= PGC_reserved;
>> +    page_offlining[num_page_offlining++] = pg;
>> 
>> I guess the code targets offlining domain pages only, right? How about free
>> pages and xen pages?
>> If so, there is no need for the check below when allocating free pages, since
>> the offlined pages will not be freed into heap()()().
>> If not, the following may have a bug.
>> 
>> +                if ( !list_empty(&heap(node, zone, j)) ) {
>> +                    pg = list_entry(heap(node, zone, j).next, struct page_info, list);
>> +                    if (!(pg->count_info & PGC_reserved))
>> +                        goto found;
>> +                    else
>> +                        printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
>> +                               pg, page_to_maddr(pg));
>> +                }
>> 
>> If one free page (not pg) within the range pg to pg+(1U<<j) is offlined, the
>> range pg~pg+(1U<<j) risks being allocated together with that offlined page.
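
In other words, the check would have to look at the whole 2^j buddy, not just
its head page; something like the following (illustrative only, not from the
patch):

    /* Illustrative only: reject a free buddy of order j if ANY of its
     * 2^j pages has been offlined, not just the first one. */
    static int buddy_contains_offlined_page(struct page_info *pg, unsigned int j)
    {
        unsigned long i;

        for ( i = 0; i < (1UL << j); i++ )
            if ( pg[i].count_info & PGC_reserved )
                return 1;   /* an offlined page hides inside this buddy */

        return 0;
    }

(With the separate-list approach suggested above, this scan should not be
needed, since a broken page would never end up inside a free buddy in
heap[][][].)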
>> 
>> Shane
>> 
>> Jiang, Yunhong wrote:
>>> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx <> wrote:
>>>> Hi all,
>>>> 
>>>> I had posted about MCA support for Intel64 before. It had only a
>>>> function to log the MCA error data received from the hypervisor.
>>>> 
>>>> http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00876.html
>>>> 
>>>> I attach patches that support not only error logging but also a Page
>>>> Offlining function. The page where an MCA occurs will be offlined and not
>>>> reused. A new flag 'PGC_reserved' was added to the page's count_info to
>>>> mark the impacted page.
>>>> 
>>>> I know that it would be better to implement page offlining for general
>>>> purposes, but in this first step I implemented it specialized for MCA.
>>> 
>>> Maybe the MCA page offline requirement is a bit different from normal page
>>> offlining, so taking it as a first step may be a good choice :)
>>>
>>> As for your current page_offlining, I'm not sure why the PGC_reserved
>>> page should not be freed. Also, with the following code, won't that mean
>>> heap(node, zone, j) can never be allocated from again? Maybe we could
>>> create a special list to hold all those pages and remove them from the
>>> heap list?
>>> 
>>> +                if ( !list_empty(&heap(node, zone, j)) ) {
>>> +                    pg = list_entry(heap(node, zone, j).next, struct page_info, list);
>>> +                    if (!(pg->count_info & PGC_reserved))
>>> +                        goto found;
>>> +                    else
>>> +                        printk(XENLOG_DEBUG "Page %p(%lx) is not to be allocated.\n",
>>> +                               pg, page_to_maddr(pg));
>>> +                }
>>> 
>>> 
>>>> 
>>>> And I also implemented the MCA handler for Dom0, which supports shutting
>>>> down the remote domain where an MCA occurred. If the MCA occurred on a
>>>> DomU, Dom0 notifies the DomU. When the notification fails, Dom0 calls the
>>>> SCHEDOP_remote_shutdown hypercall.
>>>> 
>>>> [1/2] xen part: mca-support-with-page-offlining-xen.patch
>>> 
>>> We are not sure we really need to pass all #MC information to dom0 first
>>> and let dom0 notify the domU. Xen should know about everything, so it may
>>> have the knowledge to decide whether to inject a virtual #MC into the
>>> guest or not. Of course, this does not impact your patch.
>>> 
>>>> [2/2] linux/x86_64 part: mca-support-with-page-offlining-linux.patch
>>> 
>>> As for how to inject a virtual #MC into the guest (including dom0), I
>>> think we need to consider the following points:
>>>
>>> a) Benefit from reusing the guest's #MC handler. #MC handlers are well
>>> known to be difficult to test, and the native guest handler may have been
>>> tested more widely. Also, #MC handlers improve as time goes on, and
>>> reusing the guest's MCA handler shares those improvements with us.
>>> b) Maintaining the PV handler across different OS versions may not be so
>>> easy, especially as hardware improves and kernels gain better support for
>>> error handling/containment.
>>> c) The #MC handler may need some model-specific information to decide on
>>> the action, while the guest (not dom0) has virtualized CPUID information.
>>> d) The guest's MCA handler may require physical information about when the
>>> #MC happened, like the number of the CPU on which the #MC happened.
>>> e) For HVM domains, a PV handler will be difficult (consider Windows
>>> guests).
>>> 
>>> And we have several options for supporting virtual #MC to the guest:
>>> 
>>> Option 1 is what is currently implemented. A PV #MC handler is
>>> implemented in the guest. This PV handler gets MCA information from the
>>> Xen HV through a hypercall, including MCA MSR values and also some
>>> additional information, like which physical CPU the MCA happened on.
>>> Option 1 helps us with issue d), but we need to maintain a PV handler and
>>> can't benefit from the native handler. It also does not resolve issue c)
>>> very well.
>>> 
>>> Option 2: Xen will provide MCA MSR virtualization so that the guest's
>>> native #MC handler can run without changes. It can benefit from the guest
>>> #MC handler, but it will be difficult to get model-specific information,
>>> and it has no physical information.
>>> 
>>> Option 3 uses a PV #MC handler in the guest as in option 1, but the
>>> interface between Xen and the guest is abstract events, like offlining the
>>> offending page, terminating the current execution context, etc. This
>>> should be straightforward for Linux, but may be difficult for Windows and
>>> other OSes.
>>> 
>>> Currently we are considering option 2, providing MCA MSR virtualization
>>> to the guest, and dom0 can also benefit from such support (if the guest
>>> has a different CPUID from native, we will either keep the guest running
>>> or kill the guest, based on the error code). Of course, the current
>>> mechanism of passing MCA information from Xen to dom0 will still be
>>> useful, but it will be used for logging purposes or for Correctable
>>> Errors. What do you think about this?
>>> 
>>> Thanks
>>> Yunhong Jiang
>>> 
>>>> Signed-off-by: Kazuhiro Suzuki <kaz@xxxxxxxxxxxxxx>
>>>> 
>>>> Thanks,
>>>> KAZ
>>>> 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel