
Re: [Xen-devel] Load increase after memory upgrade (part2)



On Tue, Jan 24, 2012 at 08:58:22AM +0000, Jan Beulich wrote:
> >>> On 23.01.12 at 23:32, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> > On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
> >> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> >> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> >> > > The issue, as I understand it, is that the DVB drivers allocate their
> >> > > buffers from 0-4GB most (all?) of the time, so they never have to do
> >> > > bounce-buffering.
> >> > > 
> >> > > The pv-ops kernel, on the other hand, ends up bounce-buffering quite
> >> > > frequently, which implies that the DVB drivers end up allocating their
> >> > > buffers above 4GB. This means we end up spending some CPU time (in the
> >> > > guest) copying the memory from the >4GB region to the 0-4GB region
> >> > > (and vice versa).
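
(To spell out where that CPU time goes: every DMA map/unmap of a buffer whose
machine address is above the device's 4GB limit costs an extra copy through a
bounce buffer. A purely illustrative sketch -- not the real swiotlb-xen code,
helper names made up:)

#include <linux/string.h>
#include <linux/types.h>

/* Map: copy the driver's high buffer into a bounce buffer known to live
 * below 4GB, and hand the low copy to the device. */
static void *bounce_map(void *buf, size_t len, void *low_bounce)
{
        memcpy(low_bounce, buf, len);   /* guest CPU burns time here */
        return low_bounce;
}

/* Unmap: copy the device's data back out of the bounce buffer. */
static void bounce_unmap(void *buf, void *low_bounce, size_t len)
{
        memcpy(buf, low_bounce, len);   /* ... and again on completion */
}
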
> >> > 
> >> > This reminds me of something (not sure what XenoLinux you use for
> >> > comparison) - how are they allocating that memory? Not vmalloc_32()
> >> 
> >> I was using the 2.6.18 one, then the Gentoo one I found via Google, and
> >> now I am going to look at the 2.6.38 from OpenSuSE.
> >> 
> >> > by chance (I remember having seen numerous uses under - iirc -
> >> > drivers/media/)?
> >> > 
> >> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
> >> > what their (driver) callers might expect in a PV guest (including the
> >> > contiguity assumption for the latter, recalling that you earlier said
> >> > you were able to see the problem after several guest starts), and I
> >> > had put into our kernels an adjustment to make vmalloc_32() actually
> >> > behave as expected.
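
(The underlying reason, as a purely illustrative sketch -- not code from any
tree: ZONE_DMA32 only constrains the guest's pseudo-physical frame number,
while the device cares about the machine frame Xen actually backed the page
with, which can sit anywhere.)

#include <linux/mm.h>
#include <asm/xen/page.h>       /* pfn_to_mfn() */

/* Check whether a page's *machine* address is below 4GB.  In a PV guest
 * this is independent of which zone the page was allocated from. */
static bool machine_addr_below_4g(struct page *page)
{
        unsigned long mfn = pfn_to_mfn(page_to_pfn(page));

        return mfn < (1UL << (32 - PAGE_SHIFT));
}
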
> >> 
> >> Aaah.. The plot thickens! Let me look in the sources! Thanks for the
> >> pointer.
> > 
> > Jan's hints led me to videobuf-dma-sg.c, which does indeed call vmalloc_32
> > and then performs PCI DMA operations on the allocated vmalloc_32 area.
> > 
> > So I cobbled up the attached patch (I haven't actually tested it and sadly
> > won't be able to until next week) which removes the call to vmalloc_32 and
> > instead sets up a DMA-allocated set of pages.
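
(The attachment is not reproduced here; the general direction, as a rough
sketch with a hypothetical helper and assuming the struct device is known at
allocation time, is simply to let the DMA API pick pages the device can
reach:)

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/* Allocate a buffer whose machine address is guaranteed to satisfy the
 * device's DMA mask, so no bounce-buffering is ever needed for it. */
static void *videobuf_alloc_dma(struct device *dev, size_t size,
                                dma_addr_t *dma_handle)
{
        return dma_alloc_coherent(dev, size, dma_handle, GFP_KERNEL);
}
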
> 
> What a big patch (and one which would need re-doing for every vmalloc_32()
> caller)! Fixing vmalloc_32() itself would be much less intrusive; I am
> reproducing our 3.2 version of the affected function below, though clearly
> it is not pv-ops ready.

I just want to get to the bottom of this before attempting a proper fix.

> 
> Jan
> 
> static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>                                pgprot_t prot, int node, void *caller)
> {
>       const int order = 0;
>       struct page **pages;
>       unsigned int nr_pages, array_size, i;
>       gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> #ifdef CONFIG_XEN
>       gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> 
>       BUILD_BUG_ON((__GFP_DMA | __GFP_DMA32) != (__GFP_DMA + __GFP_DMA32));
>       if (dma_mask == (__GFP_DMA | __GFP_DMA32))
>               gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> #endif
> 
>       nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
>       array_size = (nr_pages * sizeof(struct page *));
> 
>       area->nr_pages = nr_pages;
>       /* Please note that the recursion is strictly bounded. */
>       if (array_size > PAGE_SIZE) {
>               pages = __vmalloc_node(array_size, 1, nested_gfp|__GFP_HIGHMEM,
>                               PAGE_KERNEL, node, caller);
>               area->flags |= VM_VPAGES;
>       } else {
>               pages = kmalloc_node(array_size, nested_gfp, node);
>       }
>       area->pages = pages;
>       area->caller = caller;
>       if (!area->pages) {
>               remove_vm_area(area->addr);
>               kfree(area);
>               return NULL;
>       }
> 
>       for (i = 0; i < area->nr_pages; i++) {
>               struct page *page;
>               gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;
> 
>               if (node < 0)
>                       page = alloc_page(tmp_mask);
>               else
>                       page = alloc_pages_node(node, tmp_mask, order);
> 
>               if (unlikely(!page)) {
>                       /* Successfully allocated i pages, free them in __vunmap() */
>                       area->nr_pages = i;
>                       goto fail;
>               }
>               area->pages[i] = page;
> #ifdef CONFIG_XEN
>               if (dma_mask) {
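>                       /* Ensure the page's backing machine frames sit
>                        * below the 32-bit boundary. */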
>                       if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
>                               area->nr_pages = i + 1;
>                               goto fail;
>                       }
>                       if (gfp_mask & __GFP_ZERO)
>                               clear_highpage(page);
>               }
> #endif
>       }
> 
>       if (map_vm_area(area, prot, &pages))
>               goto fail;
>       return area->addr;
> 
> fail:
>       warn_alloc_failed(gfp_mask, order,
>                         "vmalloc: allocation failure, allocated %ld of %ld bytes\n",
>                         (area->nr_pages*PAGE_SIZE), area->size);
>       vfree(area->addr);
>       return NULL;
> }
> 
> ...
> 
> #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
> #define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL
> #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
> #define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL
> #elif defined(CONFIG_XEN)
> #define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL
> #else
> #define GFP_VMALLOC32 GFP_KERNEL
> #endif
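
So vmalloc_32() just funnels GFP_VMALLOC32 down into __vmalloc_node(); with
the CONFIG_XEN definition above, the hunk in __vmalloc_area_node() sees both
__GFP_DMA and __GFP_DMA32 set and takes the xen_limit_pages_to_max_mfn()
path. For reference (roughly, from memory of the 3.x mm/vmalloc.c), the
caller is just:

void *vmalloc_32(unsigned long size)
{
        return __vmalloc_node(size, 1, GFP_VMALLOC32, PAGE_KERNEL,
                              -1, __builtin_return_address(0));
}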
