[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [REPOST] Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version



Hello,

Just wondering.. what's the status on this feature? 
Is it ready to be merged? 

-- Pasi

On Wed, Aug 25, 2010 at 04:00:27PM +0200, Daniel Kiper wrote:
> Hi,
> 
> On Thu, Aug 12, 2010 at 05:43:33PM -0700, Jeremy Fitzhardinge wrote:
> >  On 08/11/2010 06:22 PM, Daniel Kiper wrote:
> > >>Overall, this looks much better.  The next step is to split this into at
> > >>least two patches: one for the core code, and one for the Xen bits.
> > >>Each patch should do just one logical operation, so if you have several
> > >>distinct changes to the core code, put them in separate patches.
> > >I will do that if this patch will be accepted.
> >
> > First step is to post it to lkml for discussion, cc:ing the relevant
> > maintainers. (I'm not really sure who that is at the moment.  It will
> > take some digging around in the history.)
> 
> I took all relevant addresses (sorry if I missed somebody) from MAINTAINERS
> file and they are in To in most of e-mails from me.
> 
> > >>Can you find a clean way to prevent/disable ARCH_MEMORY_PROBE at runtime
> > >>when in a Xen context?
> > >There is no simple way to do that. It requiers to do some
> > >changes in drivers/base/memory.c code. I think it should
> > >be done as kernel boot option (on by default to not break
> > >things using this interface now). If it be useful for maintainers
> > >of mm/memory_hotplug.c and drivers/base/memory.c code then
> > >I could do that. Currently original arch/x86/Kconfig version
> > >is restored.
> >
> > I think adding a global flag which the Xen balloon driver can disable
> > should be sufficient.  There's no need to make an separate user-settable
> > control.
> 
> OK.
> 
> > >>>+/* we are OK calling __meminit stuff here - we have
> > >>>CONFIG_MEMORY_HOTPLUG
> > >>>*/
> > >>>+static int __ref xen_add_memory(int nid, u64 start, u64 size)
> > >>Could this be __meminit too then?
> > >Good question. I looked throught the code and could
> > >not find any simple explanation why mm/memory_hotplug.c
> > >authors used __ref instead __meminit. Could you (mm/memory_hotplug.c
> > >authors/maintainers) tell us why ???
> >
> > Quite possibly a left-over from something else.  You could just try
> > making it __meminit, then compile with, erm, the option which shows you
> > section conflicts (it shows the number of conflicts at the end of the
> > kernel build by default, and tells you how to explicitly list them).
> 
> Small reminder: make CONFIG_DEBUG_SECTION_MISMATCH=y
> 
> I reviewed kernel source code once again. It is OK. Normaly it is
> not allowed to reference code/data tagged as .init.* because
> that sections are freed at the end of kernel boot sequence and
> they do not exists any more in memory. However it is sometimes
> required to use code/data marked .init.*. To allow that __ref
> tag is used and then referenced objects are not removed from
> memory (and no warnings are displayed during kernel compilation).
> 
> > >>What's this for?  I see all its other users are in the memory hotplug
> > >>code, but presumably they're concerned about a real S3 suspend.  Do we
> > >>care about that here?
> > >Yes, because as I know S3 state is supported by Xen guests.
> >
> > Yes, but I'm assuming the interaction between S3 and ACPI hotplug memory
> > isn't something that concerns a Xen guest; our hotplug mechanism is
> > completely different.
> 
> Suspend/Hibernation code in Linux Kernel is platform independent
> to some extent and it does not require ACPI. It means that
> lock_system_sleep/unlock_system_sleep is required in that
> place to have memory state intact during suspend/hibernation.
> 
> > >>>+                r->name = "System RAM";
> > >>How about making it clear its Xen hotplug RAM?  Or do things care about
> > >>the "System RAM" name?
> > >As I know no however as I saw anybody do not differentiate between
> > >normal and hotplugged memory. I thought about that ealier however
> > >stated that this soultion does not give us any real gain. That is why
> > >I decided to use standard name for hotplugged memory.
> >
> > Its cosmetic, but it would be useful to see what's going on.
> 
> If you wish I will do that, however then it should be changed
> as well add_registered_memory() function syntax. It should
> contain pointer to name published through /sys/firmware/memmap
> interface. I am not sure it is good solution to change
> add_registered_memory() function syntax which I think should be
> same as add_memory() function syntax.
> 
> > >+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG + unsigned long
> > >boot_max_pfn; + u64 hotplug_start_paddr; + u64 hotplug_size;
> >
> > So does this mean you only support adding a single hotplug region?  What
> > happens if your initial increase wasn't enough and you want to add
> > more?  Would you make this a list of hot-added memory or something?
> >
> > But I'm not even quite sure why you need to keep this as global data.
> 
> No. It supports multiple allocations. This variables are
> used mostly for communication between allocate_additional_memory
> and hotplug_allocated_memory functions.
> 
> > >PFN_PHYS(section_nr_to_pfn(pfn_to_section_nr(balloon_stats.boot_max_pfn)
> > >+ 1)); + r_size = (balloon_stats.target_pages -
> > >balloon_stats.current_pages) << PAGE_SHIFT;
> >
> > So this just reserves enough resource to satisfy the current outstanding
> > requirement?  That's OK if we can repeat it, but it looks like it will
> > only do this once?
> 
> For full description of current algorithm
> please look at the end of this e-mail.
> 
> > >+ + rc = allocate_resource(&iomem_resource, r, r_size, r_min, +
> > >ULONG_MAX, PAGE_SIZE, NULL, NULL);
> >
> > Does this need to be section aligned, with a section size?  Or is any old
> > place OK?
> 
> It is always PAGE_SIZE aligned and not below than
> <max_address_of_section_allocated_at_boot> + 1.
> 
> > >reservation.nr_extents = nr_pages; + + rc =
> > >HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> >
> > Allocating all the memory here seems sub-optimal.
> 
> Here is allocated min(nr_pages, ARRAY_SIZE(frame_list)) of pages.
> 
> > >0 ? rc : rc != nr_pages; +} + +static void hotplug_allocated_memory(void)
> >
> > Why is this done separately from the reservation/allocation from xen?
> 
> First memory is allocated in batches of min(nr_pages, ARRAY_SIZE(frame_list))
> of pages and then whole allocated memory is hotplugged.
> 
> > >{ - unsigned long target = balloon_stats.target_pages; + int nid, ret;
> > >+ struct memory_block *mem; + unsigned long pfn, pfn_limit; + + nid =
> > >memory_add_physaddr_to_nid(balloon_stats.hotplug_start_paddr);
> >
> > Is the entire reserved memory range guaranteed to be within one node?
> >
> > I see that this function has multiple definitions depending on a number
> > of config settings.  Do we care about what definition it has?
> 
> As I know (maybe I have missed something) currently Xen does not support
> NUMA in guests and nid is always 0. However maybe it will be good to create
> Xen specific version of memory_add_physaddr_to_nid function.
> 
> > >= online_pages(PFN_DOWN(balloon_stats.hotplug_start_paddr), +
> > >balloon_stats.hotplug_size >> PAGE_SHIFT);
> >
> > Are the pages available for allocation by the rest of the kernel from
> > this point on?
> 
> Yes.
> 
> > Which function allocates the actual page structures?
> 
> add_registered_memory()
> 
> >
> > >+ + if (ret) { + pr_err("%s: online_pages: Failed: %i\n", __func__,
> > >ret); + goto error; + } + + pfn =
> > >PFN_DOWN(balloon_stats.hotplug_start_paddr); + pfn_limit = pfn +
> > >(balloon_stats.hotplug_size >> PAGE_SHIFT); - return target; + for (;
> > >pfn < pfn_limit; pfn += PAGES_PER_SECTION) { + mem =
> > >find_memory_block(__pfn_to_section(pfn)); + BUG_ON(!mem); +
> > >BUG_ON(!present_section_nr(mem->phys_index)); +
> > >mutex_lock(&mem->state_mutex); + mem->state = MEM_ONLINE; +
> > >mutex_unlock(&mem->state_mutex); + }
> >
> > What does this do?  How is it different from what online_pages() does?
> 
> This updates /sys/devices/system/memory/memory*/state files
> which contain information about states of sections.
> 
> > >+ else if (is_memory_resource_reserved()) + hotplug_allocated_memory();
> >
> > Why can't this be done in allocate_additional_memory()?
> 
> Because memory is allocated in relatively small
> batches and then whole memory is hotplugged.
> 
> > In your earlier, patch I think you made the firmware_map_add_hotplug()
> > be specific to add_memory, but now you have it in __add_memory.  Does it
> > make a difference either way?
> 
> It was not available in Linux Kernel Ver. 2.6.32.*
> on which based first versions of this patch.
> It updates /sys/firmware/memmap.
> 
> > As before, this all looks reasonably good.  I think the next steps
> > should be:
> >
> >   1. identify how to incrementally allocate the memory from Xen, rather
> >      than doing it at hotplug time
> >   2. identify how to disable the sysfs online interface for Xen
> >      hotplugged memory
> >
> > For 1., I think the code should be something like:
> >
> > increase_address_space(unsigned long pages)
> > {
> >     - reserve resource for memory section
> >     - online section
> >     for each page in section {
> >             online page
> >             mark page structure allocated
> >             add page to ballooned_pages list
> >             balloon_stats.balloon_(low|high)++;
> >     }
> > }
> 
> Here is current algorithm:
>   - allocate_resource() with size requested by user,
>   - allocate memory in relatively small batches,
>   - add_registered_memory(),
>   - online_pages(),
>   - update /sys/devices/system/memory/memory*/state files.
> 
> > The tricky part is making sure that the memory for the page structures
> > has been populated so it can be used.  Aside from that, there should be
> > no need to have another call to
> > HYPERVISOR_memory_op(XENMEM_populate_physmap, ...) aside from the
> > existing one.
> 
> Currently it is.
> 
> > 2 requires a deeper understanding of the existing hotplug code.  It
> > needs to be refactored so that you can use the core hotplug machinery
> > without enabling the sysfs page-onlining mechanism, while still leaving
> > it available for physical hotplug.  In the short term, having a boolean
> > to disable the onlining mechanism is probably the pragmatic solution, so
> > the balloon code can simply disable it.
> 
> I think that sysfs should stay intact because it contains some
> useful information for admins. We should reconsider avaibilty
> of /sys/devices/system/memory/probe. In physical systems it
> is available however usage without real hotplug support
> lead to big crash. I am not sure we should disable probe in Xen.
> Maybe it is better to stay in sync with standard behavior.
> Second solution is to prepare an interface (kernel option
> or only some enable/disable functions) which give possibilty
> to enable/disable probe interface when it is required.
> 
> Daniel
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.