
Re: [Xen-devel] null scheduler bug

Reply for Julien:
Yes, my platform has 4 CPUs; it's an UltraZed-EG board with a carrier card.
I use only 2 CPUs: one for dom0, which is PetaLinux, and one for domU,
which is a bare-metal application that blinks an LED on the board (I use it
to measure jitter with an oscilloscope). The other two CPUs are unused (in
the idle loop).

About the command: it comes from the xen-overlay.dtsi file, which is included
in the system-user.dtsi file in my project. The whole file is included in
the attachment to my earlier reply.

About these options:
I was just testing them to see whether I would get any performance
improvement. I will remove them right away.

Best regards, Milan Boberic!
On Tue, Sep 25, 2018 at 1:15 PM Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi Dario,
> On 09/25/2018 10:02 AM, Dario Faggioli wrote:
> > On Mon, 2018-09-24 at 22:46 +0100, Julien Grall wrote:
> >> On 09/21/2018 05:20 PM, Dario Faggioli wrote:
> >>>
> >>> What I'm after is how long, after domain_destroy(),
> >>> complete_domain_destroy() is called, and whether/how it relates to
> >>> the grace period idle timer we've added in the RCU code.
> >>
> >> NULL scheduler and vwfi=native will inevitably introduce a latency
> >> when destroying a domain. vwfi=native means the guest will not trap
> >> when it has nothing to do and switch to the idle vCPU. So, in such a
> >> configuration, it is extremely unlikely to execute the idle_loop or
> >> even enter the hypervisor unless there is an interrupt on that pCPU.
> >>
> > Ah! I'm not familiar with vwfi=native --and in fact I was completely
> > ignoring it-- but this analysis makes sense to me.
> >
> >> Per my understanding of call_rcu, the calls will be queued until the
> >> RCU reaches a threshold. We don't have many places where call_rcu is
> >> called, so reaching the threshold may just never happen. But nothing
> >> will tell that vCPU to go into Xen and say "I am done with RCU". Did
> >> I miss anything?
> >>
> > Yeah, and in fact we added the timer _but_, in this case, it does not
> > look like the timer is firing. It looks much more like "some random
> > interrupt happens", as you're suggesting. OTOH, in the case where there
> > are no printk()s, it might be that the timer does fire, but the vcpu
> > has not gone through Xen, so the grace period is, as far as we know,
> > not expired yet (which is also in accordance with Julien's analysis, as
> > far as I understood it).
> The timer is only activated when sched_tick_suspend() is called. With
> vwfi=native, you will never reach the idle_loop() and therefore never
> set up a timer.
> Milan confirmed that the guest can be destroyed with vwfi=native removed.
> So this confirms my thinking. Trapping wfi will end up switching to the
> idle vCPU and triggering the grace period.
> I am not entirely sure you will be able to reproduce it on x86, but I
> don't think it is Xen Arm specific.
> When I looked at the code, I didn't see any grace period in any context
> other than idle_loop. Rather than adding another grace period, I would
> just force quiescence for every call_rcu.
> This should not have a big performance impact, as we don't use call_rcu
> much, and it would allow the domain to be fully destroyed in a timely manner.
> Cheers,
> --
> Julien Grall
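
To make the analysis quoted above easier to follow, here is a toy model of the behaviour it describes. This is only a sketch and is not Xen's RCU code: the class ToyRcu, its methods, and the CPU numbering are hypothetical names made up for illustration. The only assumption taken from the thread is that a callback queued with call_rcu can run only after every pCPU has passed through the hypervisor (a quiescent state), and that a pCPU whose guest never traps (vwfi=native) does not do so until some interrupt arrives.

```python
class ToyRcu:
    """Toy grace-period model. Hypothetical, NOT Xen's implementation."""

    def __init__(self, n_cpus):
        self.n_cpus = n_cpus
        self.pending = []      # callbacks queued via call_rcu()
        self.waiting = set()   # CPUs that have not yet quiesced

    def call_rcu(self, callback):
        # Queue the callback and (re)start waiting for every CPU.
        self.pending.append(callback)
        self.waiting = set(range(self.n_cpus))

    def quiescent_state(self, cpu):
        # Called when `cpu` passes through the hypervisor, e.g. via the
        # idle loop or because an interrupt forced it in.
        self.waiting.discard(cpu)
        if not self.waiting and self.pending:
            callbacks, self.pending = self.pending, []
            for cb in callbacks:
                cb()


def demo():
    """Return the list of destroyed domains after each step."""
    destroyed = []
    history = []
    rcu = ToyRcu(n_cpus=2)

    # Domain destruction defers its final step through call_rcu.
    rcu.call_rcu(lambda: destroyed.append("domU"))
    history.append(list(destroyed))

    # CPU 0 enters the hypervisor and reports a quiescent state.
    rcu.quiescent_state(0)
    history.append(list(destroyed))

    # CPU 1 models a pCPU whose guest never traps: nothing happens until
    # an interrupt finally brings it into the hypervisor.
    rcu.quiescent_state(1)
    history.append(list(destroyed))
    return history


if __name__ == "__main__":
    for step, state in enumerate(demo()):
        print(step, state)
```

In this model the deferred callback stays queued for as long as CPU 1 never reports a quiescent state, which mirrors the delayed complete_domain_destroy() discussed above. The proposal to force quiescence on every call_rcu is not modelled here.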

Xen-devel mailing list


