
Re: [Xen-devel] null scheduler bug

On 09/27/2018 06:06 PM, Dario Faggioli wrote:
On Thu, 2018-09-27 at 16:09 +0100, Julien Grall wrote:

Hi Dario,

On 09/27/2018 03:32 PM, Dario Faggioli wrote:
On Thu, 2018-09-27 at 15:15 +0200, Milan Boberic wrote:

In one of your e-mail, you wrote:

"Well, our implementation of RCU requires that, from time to time,
various physical CPUs of your box become idle, or get an interrupt,
go executing inside Xen (for hypercalls, vmexits, etc). In fact, a
going through Xen is what allow us to tell that it reached a so-
'quiescent state', which in turns is necessary for declaring a so-
called 'RCU grace period' over."

I don't quite agree with you on the definition of "quiescent state".

Hehe... I was trying to be both quick and accurate. It's more than
possible that I failed. :-)

To take the domain example, we want to wait until all the CPUs have
stopped using the pointer (a hypercall could race with put_domain).

I'm not sure what you mean by "a hypercall could race put_domain".

I meant that another CPU could get a pointer to the domain until the domain is effectively removed from the list by _domain_destroy().

What we want is to wait until all the CPUs that are involved in the
grace period have gone through rcupdate.c:cpu_quiet(), or have become
idle.

Which is what I meant but in a more convoluted way.

Receiving an interrupt, or experiencing a context switch, or even going
idle, it's "just" how it happens that these CPUs have their chance to
go through cpu_quiet(). It is in this sense that I meant that those
events are used as markers of a quiescent state.
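As a rough sketch of the mechanism described above (a toy model, not Xen's actual rcupdate.c; the function names mirror the discussion, but the array-of-flags in place of a cpumask and everything else here are my simplifications):

```c
#include <stdbool.h>

#define NR_CPUS 4

/* CPUs the current grace period is still waiting on. */
static bool cpu_needs_quiet[NR_CPUS];

/* Start a grace period: every CPU must report a quiescent state. */
static void start_grace_period(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cpu_needs_quiet[cpu] = true;
}

/* A CPU reports a quiescent state. In Xen this happens when it goes
 * through the hypervisor (interrupt, hypercall, vmexit) or goes idle,
 * i.e. at a point where it cannot be holding RCU-protected pointers. */
static void cpu_quiet(int cpu)
{
    cpu_needs_quiet[cpu] = false;
}

/* The grace period is over once no CPU is still pending; only then is
 * it safe to run queued callbacks (e.g. actually free the domain). */
static bool grace_period_done(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_needs_quiet[cpu])
            return false;
    return true;
}
```

The problem in this thread is then visible in the model: a CPU that never enters Xen and never goes idle (a vCPU spinning or parked on WFI with wfi=native) never calls cpu_quiet(), so grace_period_done() never becomes true.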

And "wfi=native" (in particular in combination with the null scheduler,
but I guess also with other ones, at least to a certain extent) makes
figuring out the "or have become idle" part tricky. That is the problem
here, isn't it?

That's correct.

The pointer will not be in use if the CPU is in kernel-mode/user-mode or
the idle loop. Am I correct?


So, we want all the CPUs that were in Xen to have either left Xen at
least once or, if they're still there and have never left, to have
become idle.

And currently we treat all the CPUs that have not told the RCU
subsystem that they're idle (via rcu_idle_enter()) as busy, without
distinguishing the ones that are busy in Xen from the ones that
are busy in guest (kernel or user) mode.

So I am wondering whether we could:
        - Mark any CPU in kernel-mode/user-mode quiet

Right. We'd need something like a rcu_guest_enter()/rcu_guest_exit()
(or a rcu_xen_exit()/rcu_xen_enter()), which works for all combinations
of arches and guest types.

It looks to me too that this would help in this case, as the vCPU that
stays in guest mode because of wfi=native would be counted as quiet, and
we won't have to wait for it.
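A toy model of what such a rcu_guest_enter()/rcu_guest_exit() pair could look like (hypothetical API, not existing Xen code; the per-CPU boolean arrays stand in for whatever per-CPU state a real implementation would use):

```c
#include <stdbool.h>

#define NR_CPUS 4

static bool cpu_in_guest[NR_CPUS];       /* extended quiescent state */
static bool cpu_reported_quiet[NR_CPUS]; /* went through cpu_quiet() */

/* Hypothetical hooks, called on every Xen<->guest transition. */
static void rcu_guest_enter(int cpu) { cpu_in_guest[cpu] = true;  }
static void rcu_guest_exit(int cpu)  { cpu_in_guest[cpu] = false; }

static void cpu_quiet(int cpu) { cpu_reported_quiet[cpu] = true; }

/* A CPU no longer blocks the grace period if it has reported in, or if
 * it is sitting in guest mode (e.g. parked on WFI with wfi=native): a
 * CPU in guest mode cannot be using Xen-internal pointers, so it can
 * be counted as quiet for as long as it stays there, exactly as
 * rcu_idle_enter() does for idle CPUs today. */
static bool cpu_is_quiescent(int cpu)
{
    return cpu_reported_quiet[cpu] || cpu_in_guest[cpu];
}

static bool grace_period_done(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (!cpu_is_quiescent(cpu))
            return false;
    return true;
}
```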

        - Raise a RCU_SOFTIRQ in call_rcu?

Mmm... what would be the point of this?

To check whether RCU has work to do. You may call call_rcu() while all the other CPUs are already quiet. Or do you expect this to be done in rcu_guest_{enter,exit}()?

With that solution, it may even be possible to avoid the timer in
idle loop.

Not sure. The timer is there to deal with the case when a CPU which has
a callback queued wants to go idle. It may have quiesced already, but
if there are others which have not, either:
1) we let it go idle, but then the callback will run only when it
    wakes up from idle which, without the timer, could be far ahead in
    time;
2) we don't let it go idle, but we waste resources;
3) we let it go idle and keep the timer. :-)
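Option 3 could be sketched like this on idle entry (illustrative only; the struct and function names are invented, not Xen's actual idle-loop code):

```c
#include <stdbool.h>

struct cpu_state {
    bool has_pending_callback; /* a call_rcu() callback is queued here */
    bool grace_period_done;    /* all CPUs have quiesced already       */
    bool timer_armed;          /* periodic wakeup to re-check RCU      */
};

/* The CPU is always allowed to go idle (option 3), but if it has a
 * queued callback whose grace period is still running, we arm a timer
 * so the callback is not stranded until some unrelated wakeup. */
static bool idle_enter(struct cpu_state *c)
{
    if (c->has_pending_callback && !c->grace_period_done)
        c->timer_armed = true;
    return true;
}
```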

Oh right.

But anyway, even if it would not let us get rid of the timer, it seems
like it could be nicer than any other approach. I accept
help/suggestions about the "let's intercept guest-to-Xen and Xen-to-guest
transitions, and track that inside RCU code" approach.

The best place for Arm to have those calls would be in:
- enter_hypervisor_head(): this is called after exiting the guest and before doing any other work
- leave_hypervisor_head(): this is called just before returning to the guest
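For illustration, a stubbed-out model of that placement (the two functions below mirror Arm's enter_hypervisor_head()/leave_hypervisor_head() only in name and ordering, not in their real contents; rcu_guest_exit()/rcu_guest_enter() are the hypothetical new hooks, not existing Xen APIs):

```c
#include <stdbool.h>

/* This pCPU's RCU view: is it running guest code right now? */
static bool in_guest = true;

/* Hypothetical hooks. */
static void rcu_guest_exit(void)  { in_guest = false; }
static void rcu_guest_enter(void) { in_guest = true;  }

/* First thing after exiting the guest: from here on, Xen code may
 * take and use RCU-protected pointers. */
static void enter_hypervisor_head(void)
{
    rcu_guest_exit();
    /* ... handle the trap/hypercall/interrupt ... */
}

/* Last thing before returning to the guest: from here on, the CPU
 * counts as quiescent until it traps into Xen again. */
static void leave_hypervisor_head(void)
{
    /* ... process pending softirqs, etc. ... */
    rcu_guest_enter();
}
```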

I am CCing Stefano's work e-mail. When I spoke with him, he was interested in putting some effort towards fixing the bug.


Julien Grall
