[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths



>>> On 09.11.17 at 15:16, <raistlin@xxxxxxxx> wrote:
> Ah, yes, my bad! What if I take vcpu_migrate() out of the above exec-
> trace (which is what I wanted to do in my email already)?
> 
> pCPU1
> =====
> current == vCPU1
> context_switch(next == idle)
> !! __context_switch() is skipped
> anything_that_uses_or_touches_context()
> 
> Point being, is the underlying and general "issue" here, really bound
> to migrations, or is it something intrinsic of lazy context switch? I'm
> saying it's the latter.

The general issue doesn't require vcpu_migrate(), I agree. The
specific VMX issue here does, though.

Thing is - I'm not convinced there's a general issue here in the first
place: Asynchronous code isn't supposed to modify state behind
the back of synchronous code. It just so happens that VMX code is
structured to violate that assumption when PML is in use.

> That being said, sure it makes sense to assume that, if we migrated the
> vCPU from pCPU1 to pCPU2, it's highly unlikely that it will resume the
> execution on pCPU1, and hence there is no point in leaving its context
> there, and we could very well sync. It's a reasonable best-effort
> measure, but can we rely on it for correctness? I don't think we can.

We can't right now, but code (from an abstract pov at least)
could be structured so we could rely on it.

> And generalizing the idea enough that we could then rely on it for
> correctness, tends to be close enough to not doing lazy context switch
> at all, I think.

I don't think so, no - we could still leave state in hardware in
anticipation that no other non-idle vCPU would be scheduled
on the local CPU. That's something that ought to help in
particular pinned vCPU-s.

>> The problem is with tasklet / softirq
>> (and hence also RCU) work. 
>>
> Yes.
> 
>> Tasklets already take care of this by
>> calling sync_local_execstate() before calling the handler. But
>> for softirqs this isn't really an option; I'm surprised to see that
>> tasklet code does this independently of what kind of tasklet that
>> is. 
>>
> Good point. Weird indeed.

I've added an item to my todo list to see whether I can figure
whether this is an okay thing to do.

>> Softirq tasklets aren't used very often, so I wonder if we have
>> a latent bug here. Otoh, if this was actually fine, adding a similar
>> call to rcu_do_batch() (or its caller) would be an option, I think.
>> 
> We can have a look.
> 
> What about the effect on performance, though?
> 
> I mean, assuming that lazy context switch does a good job, with respect
> to that, by avoiding synching in enough case where it is actually not
> necessary, how do things change if we start to sync at any softirq,
> even when the handler would have not required that?

I wasn't suggesting this for softirqs, but only (if at all) for RCU. But
yes, there would a performance hit from this; not sure how small
or large. But as you can see, for tasklets the hit is taken
unconditionally.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.