Re: [Xen-devel] [PATCH v3 5/7] vpci: fix execution of long running operations



Hi Roger,

On 11/8/18 12:20 PM, Roger Pau Monné wrote:
On Thu, Nov 08, 2018 at 11:52:57AM +0000, Julien Grall wrote:
Hi Roger,

On 11/8/18 11:44 AM, Roger Pau Monné wrote:
On Thu, Nov 08, 2018 at 11:42:35AM +0000, Julien Grall wrote:
Hi,

Sorry to jump in the conversation late.

On 11/8/18 11:29 AM, Roger Pau Monné wrote:
Why would that be? The do_softirq() invocation sits on the exit-
to-guest path, explicitly avoiding any such nesting unless there
was a do_softirq() invocation somewhere in a softirq handler.

It sits on an exit-to-guest path, but the following chunk:

raise_softirq(SCHEDULE_SOFTIRQ);
do_softirq();

would prevent the path from ever reaching exit-to-guest, and it would
instead nest on itself, unless the vCPU is marked as blocked, which
prevents it from being scheduled and thus avoids this recursion.

I can't see how the recursion could happen on Arm. So is it an x86 issue?

This is not an issue with the current code, I was just discussing with
Jan how to properly implement vPCI long running operations that need
to be preempted.

To give more context on my question, we are looking at handling preemption
on Arm in some long-running operations (e.g. a cache flush) without having to
worry about returning to the guest.

I am thinking of something along the following lines on Arm, in a loop:

for ( .... )
{
    do_action
    if ( try_reschedule )
    {
        raise_softirq(SCHEDULE_SOFTIRQ);
        do_softirq();
    }
}

This would require that no lock is held, but I think it would work on Arm
for any long-running operation. So I am quite interested in the outcome of
the discussion here.

As said to Jan, I don't think this is viable because you could end up
recursing in do_softirq if there are no other guests to run and enough
reschedules.

Let's imagine that there's only 1 vCPU to run, and that it has a long
running operation pending. I assume you will somehow hook the code to
perform such operation in the guest resume path:

do_softirq()
     do_action()
-> preempt
         raise_softirq(SCHEDULE);
         do_softirq();
             do_action();
-> preempt
                 raise_softirq(SCHEDULE);
                 do_softirq();
                     do_action();
-> preempt
...

As you can see, this could overflow the stack if there are enough
preemptions.
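
To put a number on the trace above, here is a toy model in plain C. It is
not Xen code: toy_do_softirq(), toy_do_action() and struct toy_state are
hypothetical names, and it assumes the worst case described above, where
the scheduler always picks the same vCPU again and the pending operation
is resumed from inside do_softirq(). Under those assumptions the nesting
depth grows by one for every preemption.

```c
#include <assert.h>

/* Toy model, NOT Xen code: every name below is hypothetical. */
struct toy_state {
    int remaining;  /* chunks of the long-running operation still to do */
    int depth;      /* current nesting depth of toy_do_softirq() */
    int max_depth;  /* deepest nesting observed so far */
};

static void toy_do_softirq(struct toy_state *s);

/* Performs one chunk of work, then "preempts" by re-entering the softirq. */
static void toy_do_action(struct toy_state *s)
{
    if (s->remaining == 0)
        return;             /* operation finished, nothing left to preempt */
    s->remaining--;
    /* Models raise_softirq(SCHEDULE); do_softirq(); with no other vCPU
     * runnable, so the same operation is picked up again one level deeper. */
    toy_do_softirq(s);
}

/* Models the resume path: entering the softirq runs the pending action. */
static void toy_do_softirq(struct toy_state *s)
{
    s->depth++;
    if (s->depth > s->max_depth)
        s->max_depth = s->depth;
    toy_do_action(s);
    s->depth--;
}

/* Returns the deepest nesting reached for an operation split into chunks. */
int toy_nested_depth(int chunks)
{
    struct toy_state s = { .remaining = chunks, .depth = 0, .max_depth = 0 };

    assert(chunks >= 0);
    toy_do_softirq(&s);
    return s.max_depth;
}
```

Calling toy_nested_depth(n) gives n + 1 in this model, i.e. stack usage
that is linear in the number of preemptions.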

This sounds like an x86-specific issue. In the case of Arm, the context_switch() function will return, so we will come back into the loop shown earlier.

We can do this because the hypervisor stack is per-vCPU. So there is no stack overflow involved here.
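
As a rough illustration only, here is a toy model in plain C of the loop
from earlier in the thread. It is not Xen code: toy_schedule() and
toy_loop_depth() are hypothetical names, and the model simply assumes
what is stated above, namely that the scheduling call returns to its
caller once the vCPU is resumed. Under that assumption the nesting depth
stays constant however many times the loop reschedules.

```c
#include <assert.h>

/* Toy model, NOT Xen code: every name below is hypothetical. */
static int toy_depth;      /* current nesting depth of toy_schedule() */
static int toy_max_depth;  /* deepest nesting observed so far */

/* Stands in for raise_softirq(SCHEDULE_SOFTIRQ); do_softirq(); and is
 * assumed to return to the caller when the vCPU gets to run again. */
static void toy_schedule(void)
{
    toy_depth++;
    if (toy_depth > toy_max_depth)
        toy_max_depth = toy_depth;
    /* Other vCPUs, if any, would run here before we return. */
    toy_depth--;
}

/* Runs an operation split into chunks, rescheduling after each one, and
 * returns the deepest nesting reached. */
int toy_loop_depth(int chunks)
{
    assert(chunks >= 0);
    toy_depth = 0;
    toy_max_depth = 0;

    for (int i = 0; i < chunks; i++) {
        /* do_action: one chunk of the long-running operation goes here. */
        toy_schedule();
    }
    return toy_max_depth;
}
```

In this model toy_loop_depth(n) is 1 for any n >= 1, because each
scheduling call unwinds before the next chunk starts.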

Cheers,

--
Julien Grall
