
Re: [Xen-devel] dom0less + sched=null => broken in staging





On 8/13/19 6:34 PM, Dario Faggioli wrote:
On Tue, 2019-08-13 at 17:52 +0100, Julien Grall wrote:
Hi Dario,

Hello!

On 8/13/19 4:27 PM, Dario Faggioli wrote:
On Fri, 2019-08-09 at 11:30 -0700, Stefano Stabellini wrote:

In my (x86 and "dom0full") testbox, this seems to come from
domain_unpause_by_systemcontroller(dom0) called by
xen/arch/x86/setup.c:init_done(), at the very end of __start_xen().

I don't know if domain construction in an ARM dom0less system works
similarly, though. What we want is someone calling either vcpu_wake()
or vcpu_unpause(), after having cleared _VPF_down from pause_flags.

Looking at create_domUs(), there is a call to
domain_unpause_by_systemcontroller() for each domU.

Yes, I saw that. And I've seen the one done on dom0, at the end of
xen/arch/arm/setup.c:start_xen(), as well.

Also, both construct_dom0() (still from start_xen()) and
construct_domU() (called from create_domUs()) call construct_domain(),
which does clear_bit(_VPF_down), setting the domain to online.

So, unless the flag gets cleared again, or something else happens that
makes the vCPU(s) fail the vcpu_runnable() check in
domain_unpause()->vcpu_wake(), I don't see why the wakeup that lets the
null scheduler start scheduling the vCPU doesn't happen... as it
instead does on x86 or !dom0less ARM (because, as far as I've
understood, it's only dom0less that doesn't work; is this correct?)
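
To make the condition discussed above concrete, here is a minimal toy model of it. This is only a sketch: the struct, the `toy_` names and the bit position are invented for illustration and are not Xen's definitions. It assumes that a wake-up only reaches the scheduler when no pause flag is set and no pause reference is held, which is the behaviour the paragraph above relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit only; not the value Xen uses for _VPF_down. */
#define TOY_VPF_DOWN (1UL << 1)

struct toy_vcpu {
    unsigned long pause_flags;  /* any bit set: vCPU must not run */
    int pause_count;            /* pause references on the vCPU */
    int domain_pause_count;     /* pause references on its domain */
    bool handed_to_scheduler;   /* set once a wake-up gets through */
};

/* Runnable only if nothing at all is holding the vCPU back. */
static bool toy_vcpu_runnable(const struct toy_vcpu *v)
{
    assert(v != NULL);
    return v->pause_flags == 0 &&
           v->pause_count == 0 &&
           v->domain_pause_count == 0;
}

/* A wake-up is silently dropped when the vCPU is not runnable. */
static void toy_vcpu_wake(struct toy_vcpu *v)
{
    if (toy_vcpu_runnable(v))
        v->handed_to_scheduler = true;
}

/*
 * Models "construct the domain, then unpause it".
 * Returns true if the scheduler ends up seeing the vCPU.
 */
static bool toy_boot_sequence(bool clear_down_flag)
{
    struct toy_vcpu v = {
        .pause_flags = TOY_VPF_DOWN,   /* vCPU starts offline */
        .pause_count = 0,
        .domain_pause_count = 1,       /* domain starts paused */
        .handed_to_scheduler = false,
    };

    if (clear_down_flag)
        v.pause_flags &= ~TOY_VPF_DOWN;  /* domain construction step */

    v.domain_pause_count--;              /* domain unpause step */
    toy_vcpu_wake(&v);

    return v.handed_to_scheduler;
}
```

In this model, `toy_boot_sequence(true)` lets the wake-up through, while `toy_boot_sequence(false)` drops it because the down flag is still set when the unpause happens.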

Yes, I quickly tried the null scheduler with just dom0, and it boots.

Interestingly, I can't see the log:

(XEN) Freed 328kB init memory.

This message is printed as part of init_done(), before CPU0 goes into the idle loop.

After adding more debug output, I can see it getting stuck when calling domain_unpause_by_systemcontroller() for dom0, specifically in vcpu_wake() on dom0v0.

The loop that assigns a pCPU in null_vcpu_wake() turns into an infinite loop: it keeps trying to pick CPU0 for dom0v0, but CPU0 is already used by dom1v0. So the problem is in pick_cpu() or in the data it uses.

This feels like an affinity problem to me. Note that I didn't request pinning of the dom0 vCPUs.
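
As an illustration of how a retry loop like the one described can stall, here is a toy model. It is not the code from the null scheduler: all `toy_` names are hypothetical, and the picker is assumed (for the sake of the example) to look only at a candidate mask and not at who currently owns each pCPU. The retry count is bounded so that the stall shows up as a return value instead of a hang.

```c
#include <assert.h>

#define TOY_NR_CPUS  4
#define TOY_NO_OWNER (-1)

/*
 * Hypothetical picker: returns the lowest-numbered CPU in the
 * candidate mask, without checking whether it is already taken.
 */
static int toy_pick_cpu(unsigned int candidate_mask)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (candidate_mask & (1u << cpu))
            return cpu;
    return -1;
}

/*
 * Retry until the picked CPU can be claimed for vcpu_id.
 * max_tries stands in for "forever". Returns the claimed CPU,
 * or -1 if no attempt succeeded.
 */
static int toy_assign_on_wake(int owner[TOY_NR_CPUS], int vcpu_id,
                              unsigned int candidate_mask, int max_tries)
{
    assert(max_tries > 0);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        int cpu = toy_pick_cpu(candidate_mask);

        if (cpu < 0)
            return -1;

        if (owner[cpu] == TOY_NO_OWNER) {
            owner[cpu] = vcpu_id;
            return cpu;
        }
        /*
         * Nothing changes between attempts, so the next pick
         * returns the same occupied CPU again.
         */
    }
    return -1;
}
```

With CPU0 already owned by another vCPU and a candidate mask containing only CPU0, every attempt fails in the same way; widening the mask to include a free CPU lets the first attempt succeed. That is why the data fed to the picker (for example the affinity mask) is as much a suspect as the picker itself.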

Cheers,

--
Julien Grall
