[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] arm/arm64: xen: Fix to convert percpu address to gfn correctly



On Tue, 6 Oct 2020 10:56:52 -0700 (PDT)
Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:

> On Tue, 6 Oct 2020, Masami Hiramatsu wrote:
> > On Mon, 5 Oct 2020 18:13:22 -0700 (PDT)
> > Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:
> > 
> > > On Mon, 5 Oct 2020, Julien Grall wrote:
> > > > Hi Masami,
> > > > 
> > > > On 05/10/2020 14:39, Masami Hiramatsu wrote:
> > > > > Use per_cpu_ptr_to_phys() instead of virt_to_phys() for per-cpu
> > > > > address conversion.
> > > > > 
> > > > > In xen_starting_cpu(), per-cpu xen_vcpu_info address is converted
> > > > > to gfn by virt_to_gfn() macro. However, since the virt_to_gfn(v)
> > > > > assumes the given virtual address is in contiguous kernel memory
> > > > > area, it can not convert the per-cpu memory if it is allocated on
> > > > > vmalloc area (depends on CONFIG_SMP).
> > > > 
> > > > Are you sure about this? I have a .config with CONFIG_SMP=y where the 
> > > > per-cpu
> > > > region for CPU0 is allocated outside of vmalloc area.
> > > > 
> > > > However, I was able to trigger the bug as soon as CONFIG_NUMA_BALANCING 
> > > > was
> > > > enabled.
> > > 
> > > I cannot reproduce the issue with defconfig, but I can with Masami's
> > > kconfig.
> > > 
> > > If I disable just CONFIG_NUMA_BALANCING from Masami's kconfig, the
> > > problem still appears.
> > > 
> > > If I disable CONFIG_NUMA from Masami's kconfig, it works, which is
> > > strange because CONFIG_NUMA is enabled in defconfig, and defconfig
> > > works.
> > 
> > Hmm, strange, because when I disabled CONFIG_NUMA_BALANCING, the issue
> > disappeared.
> > 
> > --- config-5.9.0-rc4+   2020-10-06 11:36:20.620107129 +0900
> > +++ config-5.9.0-rc4+.buggy     2020-10-05 21:04:40.369936461 +0900
> > @@ -131,7 +131,8 @@
> >  CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
> >  CONFIG_CC_HAS_INT128=y
> >  CONFIG_ARCH_SUPPORTS_INT128=y
> > -# CONFIG_NUMA_BALANCING is not set
> > +CONFIG_NUMA_BALANCING=y
> > +CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
> >  CONFIG_CGROUPS=y
> >  CONFIG_PAGE_COUNTER=y
> >  CONFIG_MEMCG=y
> > 
> > So buggy config just enabled NUMA_BALANCING (and default enabled)
> 
> Yeah but both NUMA and NUMA_BALANCING are enabled in defconfig which
> works fine...
> 
> [...]
> 
> > > The fix is fine for me. I tested it and it works. We need to remove the
> > > "Fixes:" line from the commit message. Ideally, replacing it with a
> > > reference to what is the source of the problem.
> > 
> > OK, as I said, it seems commit 9a9ab3cc00dc ("xen/arm: SMP support") has
> > introduced the per-cpu code. So note it instead of Fixes tag.
> 
> ...and commit 9a9ab3cc00dc was already present in 5.8 which also works
> fine with your kconfig. Something else changed in 5.9 causing this
> breakage as a side effect. Commit 9a9ab3cc00dc is there since 2013, I
> think it is OK -- this patch is fixing something else.

Hmm, then it might be someone runs out the first chunk of percpu and
xen uses the 2nd chunk which is in vmalloc area. It is possible if the
other initcall functions uses alloc_percpu().

Maybe we can trace percpu chunk allocation function for both case.

Thank you,

-- 
Masami Hiramatsu <mhiramat@xxxxxxxxxx>



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.