
[Xen-devel] Re: [RFC][PATCH] Basic support for page offline



At 03:54 -0500 on 09 Feb (1234151686), Jiang, Yunhong wrote:
> Hi, Tim, this patchset tries to support page offline requests. I want to
> get some initial feedback before more testing.

I haven't had a chance to read the patches in detail yet, but my initial
impression is that:

 - The general approach so far seems good (I suspect that your 2.3 stage
   below could also be done like 2.2, without a full live migration, but
   since that's not implemented yet that's fine).
 - It seems like a lot of code for what it does.  On the Xen side that's
   just a general impression since I'm not familiar with the bits of the 
   heap allocators that you're changing.  In libxc you seem to have 
   duplicated parts of the save/restore code -- better to make those 
   routines externally visible to the rest of libxc and call them 
   from your new function.
 - Like all systems code everywhere, it needs more comments. :)  You've
   introduced some generic-sounding functions (adjust_pte &c) without
   describing what they do.

I'll have more detailed comments later in the week, I hope. 

Cheers,

Tim.

> Page offline can be used in several usage models; below are some examples:
> a) If too many correctable errors occur on one page, management tools may
> try to offline the page to avoid more severe errors in the future;
> b) When a page has an ECC error that can't be recovered by hardware, Xen's
> MCA handler may try to offline the page, so that it will not be accessed
> any more;
> c) Offlining some DIMMs for power management etc. (of course, this is far
> more than simple page offline).
> 
> The basic idea of offlining a page is:
> 1) If the page is free, it is removed from the page allocator.
> 2) If the page is in use, the owner is checked:
>   2.1) If it is owned by Xen/dom0, the offline request fails.
>   2.2) If it is owned by a PV guest with no device assigned, user-space
> tools will try to replace the page with a new one.
>   2.3) If it is owned by an HVM guest with no device assigned, user-space
> tools will try to live migrate it.
>   2.4) If it is owned by a guest with a device assigned, user-space tools
> can do live migration if needed.
> 
> This patchset includes support for cases 2.1/2.2 (see the sketch below).
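> 
> A rough sketch of that decision logic (the helper names here are made up
> for illustration, not the actual patch code):
> 
>     static int offline_page(unsigned long mfn)
>     {
>         struct page_info *pg = mfn_to_page(mfn);
>         struct domain *owner = page_get_owner(pg);
> 
>         if ( page_is_free(pg) )                 /* case 1 */
>             return remove_from_allocator(pg);   /* hypothetical helper */
> 
>         if ( owner == NULL || is_xen_or_dom0(owner) )
>             return -EPERM;                      /* case 2.1 */
> 
>         /*
>          * Cases 2.2-2.4 are driven from user space: mark the page as
>          * offlining and let the tools replace it (PV) or live migrate
>          * the guest (HVM / device assigned).
>          */
>         mark_page_offlining(pg);                /* hypothetical helper */
>         return 0;
>     }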
> 
> page_offline_xen.patch gives basic support. The new hypercall 
> (XEN_SYSCTL_page_offline) marks a page as offlining if the page is in use; 
> otherwise, it removes the page from the page allocator. It also changes 
> free_heap_pages(), so that when an offlining page is freed, that page is 
> marked as offlined and will never be allocated again. One tricky thing is 
> that the offlined page may not be buddy-aligned (i.e., it may sit in the 
> middle of a run of 2^order pages), so we have to re-arrange the buddy 
> system (i.e. &heap[][][]) carefully, as illustrated below.
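> 
> To illustrate the re-arrangement: when a freed 2^order chunk contains an
> offlined page, the chunk cannot go back on the free list whole, so the
> surviving pages must be re-split into naturally aligned power-of-two runs
> around the bad page. A sketch (free_heap_chunk() is a hypothetical helper,
> not the patch's actual code):
> 
>     /* Free [start, start + 2^order) except the offlined page "bad". */
>     static void free_around_offlined(unsigned long start,
>                                      unsigned int order,
>                                      unsigned long bad)
>     {
>         unsigned long end = start + (1UL << order);
>         unsigned long cur = start;
> 
>         while ( cur < end )
>         {
>             unsigned int o = 0;
> 
>             if ( cur == bad )       /* skip the offlined page */
>             {
>                 cur++;
>                 continue;
>             }
>             /*
>              * Grow to the largest run that stays naturally aligned,
>              * stays below end, and does not cover the bad page.
>              */
>             while ( !(cur & (1UL << o)) &&
>                     (cur + (2UL << o) <= end) &&
>                     !(bad >= cur && bad < cur + (2UL << o)) )
>                 o++;
>             free_heap_chunk(cur, o);    /* hypothetical: onto heap[][][o] */
>             cur += 1UL << o;
>         }
>     }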
> 
> page_offline_xen_memory.patch adds support for PV guests: a new hypercall 
> (XENMEM_page_offline) tries to replace the old page with a new one. This 
> happens only when the guest has been suspended, to avoid complex 
> page-sharing situations. I'm still checking whether more situations need 
> to be considered, such as LDT pages and CR3 pages, so any suggestions 
> would be a great help.
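> 
> Roughly, the replacement for a suspended PV guest might look like the
> sketch below (adjust_ptes() and the exact allocator calls are assumptions
> for illustration, not the patch itself):
> 
>     /* "old" is the offlining MFN; the guest must be suspended. */
>     static int replace_guest_page(struct domain *d, unsigned long old)
>     {
>         struct page_info *new = alloc_domheap_page(d, 0);
>         unsigned long gpfn = get_gpfn_from_mfn(old);
> 
>         if ( new == NULL )
>             return -ENOMEM;
> 
>         /* Copy the contents, then repoint m2p/p2m to the new MFN. */
>         copy_page(mfn_to_virt(page_to_mfn(new)), mfn_to_virt(old));
>         set_gpfn_from_mfn(page_to_mfn(new), gpfn);
> 
>         /* Rewrite every PTE that referenced the old MFN. */
>         adjust_ptes(d, old, page_to_mfn(new));  /* hypothetical helper */
> 
>         free_domheap_page(mfn_to_page(old));    /* now goes offline */
>         return 0;
>     }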
> 
> page_offline_tools.patch is an example user-space tool based on 
> libxc/xc_domain_save.c. It first tries to mark a page offline and checks 
> the result; if the page is owned by a PV guest, it tries to replace the 
> page.
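> 
> The tool's per-page flow is roughly as follows (the wrapper and status
> names below are invented for the sketch):
> 
>     int offline_one_page(int xc_handle, uint32_t domid, unsigned long mfn)
>     {
>         uint32_t status;
> 
>         /* Ask Xen to offline the page (XEN_SYSCTL_page_offline). */
>         if ( xc_page_offline(xc_handle, mfn, &status) )     /* hypothetical */
>             return -1;
> 
>         if ( status == PG_OFFLINE_DONE )        /* page was free */
>             return 0;
> 
>         if ( status == PG_OFFLINE_OWNED_BY_PV )
>         {
>             /* Suspend, replace via XENMEM_page_offline, resume. */
>             return replace_pv_page(xc_handle, domid, mfn);  /* hypothetical */
>         }
> 
>         return -1;      /* Xen/dom0-owned or HVM: not handled yet */
>     }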
> 
> I did some basic testing: free pages and PV guest pages both work. Of 
> course, I need to test it more, and more robust error handling is needed.
> 
> Any suggestion is welcome.
> 
> Thanks
> Yunhong Jiang





-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
