Re: Null scheduler and vwfi native problem

On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
> Hi,

> I see a problem with destroy and restart of a domain. Interrupts are
> not 
> available when trying to restart a domain.
> The situation seems very similar to the thread "null scheduler bug" 
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
> .
Right. Back then, PCI passthrough was involved, if I remember
correctly. Is it the case for you as well?

> The target system is a iMX8-based ARM board and Xen is a 4.13.0
> version 
> built from https://source.codeaurora.org/external/imx/imx-xen.git.
Mmm, perhaps it's me, but neither going to that URL with a browser nor
trying to clone it shows me anything. What am I doing wrong?

> Xen is booted with sched=null vwfi=native.
> One physical CPU core is pinned to the domu.
> Some interrupts are passed through to the domu.
Ok, I guess it is involved, since you say "some interrupts are passed
through to the domu".

> When destroying the domain with xl destroy etc it does not complain
> but 
> then when trying to restart the domain
> again with a "xl create <domain cfg>" I get:
> (XEN) IRQ 210 is already used by domain 1
> "xl list" does not contain the domain.
> Repeating the "xl create" command 5-10 times eventually starts the 
> domain without complaining about the IRQ.
> Inspired from the discussion in the thread above I have put printks
> in 
> the xen/common/domain.c file.
> In the function domain_destroy I have a printk("End of domain_destroy
> function\n") in the end.
> In the function complete_domain_destroy have a printk("Begin of 
> complete_domain_destroy function\n") in the beginning.
> With these printouts I get at "xl destroy":
> (XEN) End of domain_destroy function
> So it seems like the function complete_domain_destroy is not called.
Ok, thanks for making these tests. It's helpful to have this
information right away.

> "xl create" results in:
> (XEN) IRQ 210 is already used by domain 1
> (XEN) End of domain_destroy function
> Then repeated "xl create" looks the same until after a few tries I
> also get:
> (XEN) Begin of complete_domain_destroy function
> After that the next "xl create" creates the domain.
> I have also applied the patch from 
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html
> . 
> This does seem to change the results.
Ah... Really? That's a bit unexpected, TBH.

Well, I'll think about it.

> Starting the system without "sched=null vwfi=native" does not result
> in 
> the problem.
Ok, how about, if you're up for some more testing:

 - booting with "sched=null" but not with "vwfi=native"
 - booting with "sched=null vwfi=native" but not doing the IRQ 
   passthrough that you mentioned above
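
To be concrete, the two experiments would look roughly like the sketch
below. It is only a sketch: I have not seen your actual bootargs or domain
config, so the `irqs` line is an assumption based on the "IRQ 210" message
above, and your config may hand over the interrupt some other way (e.g.
through device passthrough).

```
# Test 1: Xen command line, null scheduler only
# (vwfi then falls back to its default, "trap")
sched=null

# Test 2: Xen command line unchanged from the failing setup
sched=null vwfi=native

# ... and in the domU config file, the interrupt passthrough disabled
# (hypothetical line, adjust to whatever your config really contains):
# irqs = [ 210 ]
```

In both cases the interesting thing is whether "xl destroy" followed by
"xl create" still reports the IRQ as being used by the old domain.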


Dario Faggioli, Ph.D
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)




