Re: [Xen-devel] null scheduler bug

To: Dario Faggioli <dfaggioli@xxxxxxxx>, Milan Boberic <milanboberic94@xxxxxxxxx>
From: Julien Grall <julien.grall@xxxxxxx>
Date: Mon, 24 Sep 2018 22:46:18 +0100
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, stefano@xxxxxxxxxxxxxx
Delivery-date: Mon, 24 Sep 2018 21:46:44 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi,

On 09/21/2018 05:20 PM, Dario Faggioli wrote:

[Adding Julien as well.

Julien, this seems related to the RCU issue we fought on ARM when using
Credit2, although this is null, but it's being even more weird...]

On Fri, 2018-09-21 at 16:14 +0200, Milan Boberic wrote:

Hey,
yes, I can see printk's output on console and in xl dmesg. Also added
timestamps, here are the results (created and destroyed domU a few
times, just to get more values), this is from xl dmesg:

NULL SCHEDULER - Not stressed PetaLinux host domain.

(XEN) t=218000327743:End of a domain_destroy function
(XEN) t=218000420874:End of a complete_domain_destroy function
(XEN) <G><3>memory_map:add: dom2 gfn=ff0a0 mfn=ff0a0 nr=1
...

Stressed PetaLinux host with command: yes > /dev/null &

(XEN) t=3247747255872:End of a domain_destroy function
(XEN) t=3247747349863:End of a complete_domain_destroy function
...

CREDIT SCHEDULER - not stressed PetaLinux host

(XEN) t=86245669606:End of a domain_destroy function
(XEN) t=86245761127:End of a complete_domain_destroy function
...

Stressed PetaLinux host with yes > /dev/null &

(XEN) t=331229997499:End of a domain_destroy function
(XEN) t=331230091770:End of a complete_domain_destroy function
...

Which, if I'm doing the math properly, tells us that
complete_domain_destroy() is called within ~90us, for both schedulers,
and in all stress/load conditions. That wouldn't be too bad, I think.
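The math above can be checked with a short script. This is only a sketch: it assumes the t= values are nanoseconds (which is what makes the ~90us figure come out), and the labels and the deltas_ns() helper are made up for illustration. The timestamps themselves are copied from the xl dmesg output quoted above.

```python
# Timestamps copied from the xl dmesg output quoted above.
# Assumption: the values are in nanoseconds.
SAMPLES = {
    "null, not stressed":   (218000327743, 218000420874),
    "null, stressed":       (3247747255872, 3247747349863),
    "credit, not stressed": (86245669606, 86245761127),
    "credit, stressed":     (331229997499, 331230091770),
}


def deltas_ns(samples):
    """Return the time from the end of domain_destroy() to the end of
    complete_domain_destroy() for each labelled sample, in nanoseconds."""
    return {label: end - start for label, (start, end) in samples.items()}


for label, delta in deltas_ns(SAMPLES).items():
    print(f"{label}: {delta} ns = {delta / 1000:.1f} us")
```

All four deltas land between 91us and 95us, so neither the scheduler nor the load changes the picture in these runs.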

And in fact, if I remember correctly, you're saying that adding the
printk()s fixes the issue in null. I wonder why that is... Can you like
kill the printks, store 5 or so of the timestamps (or just the delta)
in a static array or something, and print it from somewhere else (like
a debug-key handler in the same file)?

I think you may end up receiving an interrupt on CPU0 when using
printk. What is the mapping between vCPU and pCPU on your system?


What I'm after, is how long, after domain_destroy(),
complete_domain_destroy() is called, and whether/how it relates to the
grace period idle timer we've added in the RCU code.

NULL scheduler and vwfi=native will inevitably introduce a latency when
destroying a domain. vwfi=native means the guest will not trap when it
has nothing to do, and so will not switch to the idle vCPU. So, in such
a configuration, it is extremely unlikely to execute the idle_loop or
even enter the hypervisor unless there is an interrupt on that pCPU.

Per my understanding of call_rcu, the calls will be queued until the RCU
reaches a threshold. We don't have many places where call_rcu is called,
so reaching the threshold may just never happen. But nothing will tell
that vCPU to go into Xen and say "I am done with RCU". Did I miss anything?
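
To make the concern above concrete, here is a toy model of it. It is not Xen's RCU implementation: it assumes a grace period ends once every pCPU has entered the hypervisor at least once, and that with vwfi=native a pCPU only enters the hypervisor when an interrupt arrives. All the names (make_pcpu, step, grace_period_done, run_idle) are hypothetical.

```python
def make_pcpu(vwfi):
    """Model one pCPU; vwfi is "trap" or "native", as in Xen's vwfi= option."""
    return {"vwfi": vwfi, "quiescent": False}


def step(pcpu, interrupt=False):
    """Model one idle period of the guest on this pCPU.

    With vwfi=trap the guest's WFI traps, so Xen runs and the pCPU can
    report a quiescent state. With vwfi=native Xen only runs if an
    interrupt arrives. Returns True if Xen ran on this pCPU.
    """
    entered_xen = pcpu["vwfi"] == "trap" or interrupt
    if entered_xen:
        pcpu["quiescent"] = True
    return entered_xen


def grace_period_done(pcpus):
    """Toy rule: the grace period ends once every pCPU has been quiescent."""
    return all(p["quiescent"] for p in pcpus)


def run_idle(pcpus, steps):
    """Let every pCPU idle for a number of steps with no interrupts."""
    for _ in range(steps):
        for p in pcpus:
            step(p)
    return grace_period_done(pcpus)


pcpus = [make_pcpu("trap"), make_pcpu("native")]
print("after 1000 idle steps:", run_idle(pcpus, 1000))
step(pcpus[1], interrupt=True)  # e.g. an interrupt lands on that pCPU
print("after one interrupt:  ", grace_period_done(pcpus))
```

In this model the queued callback (complete_domain_destroy() in the case discussed here) would wait for as long as the vwfi=native pCPU receives no interrupt, however many idle steps go by.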

I have the feeling the problem might just be exacerbated (similar to the
idle bug with credit2) by vwfi=native. Milan, would it be possible to
run the test without that option?


Cheers,

--
Julien Grall
