
Re: [Xen-devel] [xen-unstable-smoke test] 112957: regressions - trouble: broken/fail/pass



On Mon, 2017-09-04 at 09:46 +0100, George Dunlap wrote:
> On 09/02/2017 04:39 PM, Julien Grall wrote:
> > 
> > If I am not mistaken the hypervisor stack is per-vCPU. So when you
> > move the vCPU to another pCPU, the stack will be moved.
> > This means the smp_processor_id() will return a different value.
> > Isn't it the same on x86?
> 
> No, the hypervisor stack on x86 has always been per-pcpu.  Apparently
> the powerpc port was per-vcpu, which is why the smp_processor_id()
> was there.  I (and apparently Dario) assumed the ARM implementation
> was the same as x86, which is why I checked in this change.
> 
So, AFAIUI, the reason why the re-sampling at every iteration was
introduced (in ae9bfcdc, "[XEN] Various softirq cleanups") was that, on
IA64 (not powerpc :-D), context_switch() actually returns.

Basically, we are in do_softirq(), with SCHEDULE_SOFTIRQ set, so we
call the handler, which is schedule() (__enter_scheduler(), back at the
time), which calls context_switch(), which switches the stack.

On x86, context_switch() does not 'return': it jumps (via
schedule_tail()) to the code that resumes guest context (of the vCPU
being scheduled, which may be a different one). On that path, we do
check softirqs again, and we may go back to do_softirq(), but if we do,
we execute the function from its entry point, and hence we
re-initialize cpu, outside of the loop.

OTOH, on IA64, context_switch(), and hence schedule()
(__enter_scheduler()), does a regular 'return'. So, we go back to the
for(;;) loop in do_softirq(), with (I think, but I don't speak any IA64
:-/) the stack changed. And that's why we need to refresh the content
of the 'cpu' local variable.

So, I now think that what I did not understand, when looking at ARM
code, was that context_switch() does indeed return, and hence we do at
least another step inside the loop, and hit the ASSERT(), which I guess
may trigger if what's stored in the local variable 'cpu', in the new
stack, is different from smp_processor_id().

Re-checking things now, I actually do see that context_switch() on ARM
is not 'terminal'. It calls schedule_tail(), which on x86 does not
return, while on ARM, it does. I must have confused these two... Sorry.
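
FWIW, here is a toy model of the situation I have in mind. It is only a
sketch, not Xen code: the per-vCPU stacks are reduced to a struct holding
the saved 'cpu' local, and every name in it (mock_smp_processor_id(),
mock_context_switch(), cpu_is_stale_after_switch(), the pCPU numbers) is
made up for illustration. It assumes that context_switch() returns on the
incoming vCPU's stack, as described above for IA64 and ARM.

```c
#include <assert.h>

/* Toy model only: none of this is Xen code, all names are made up. */
#define NR_VCPUS 2

/* Stands in for the do_softirq() frame living on each per-vCPU stack. */
struct softirq_frame {
    int cpu;                /* the cached 'cpu' local variable */
};

static struct softirq_frame stacks[NR_VCPUS];
static int running_vcpu;    /* whose stack we are currently on */
static int this_pcpu;       /* what smp_processor_id() would report */

static int mock_smp_processor_id(void)
{
    return this_pcpu;
}

/* A context switch that returns: we simply continue on next's stack. */
static void mock_context_switch(int next)
{
    assert(next >= 0 && next < NR_VCPUS);
    running_vcpu = next;
}

/*
 * One pass through the loop on pCPU 3. Returns 1 if the 'cpu' local seen
 * after the switch disagrees with mock_smp_processor_id(), 0 otherwise.
 */
static int cpu_is_stale_after_switch(int resample)
{
    struct softirq_frame *frame;

    /* vCPU 1 was descheduled earlier, inside the loop, on pCPU 0. */
    stacks[1].cpu = 0;

    /* vCPU 0 enters the function on pCPU 3 and samples 'cpu' once. */
    this_pcpu = 3;
    running_vcpu = 0;
    stacks[0].cpu = mock_smp_processor_id();

    /* The scheduler picks vCPU 1; the switch returns on its stack. */
    mock_context_switch(1);
    frame = &stacks[running_vcpu];

    if (resample)
        frame->cpu = mock_smp_processor_id();   /* refresh per iteration */

    return frame->cpu != mock_smp_processor_id();
}
```

In this model, cpu_is_stale_after_switch(0) corresponds to sampling 'cpu'
only once at function entry, and cpu_is_stale_after_switch(1) to
re-sampling it inside the loop. On x86 the question does not arise, since
the switch never returns into the loop.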

Is this analysis correct?

Also, mostly out of curiosity, still looking at ARM code, I'm not
getting at all how continue_new_vcpu() works (e.g., when/how is it
invoked?).

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
