Re: [Xen-devel] [PATCH v15 01/11] multicall: add no preemption ability between two calls

On 10/09/14 11:25, Jan Beulich wrote:
>>>> On 10.09.14 at 12:15, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 10/09/14 11:07, Jan Beulich wrote:
>>>>>> On 10.09.14 at 11:43, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> Actually, on further thought, using multicalls like this cannot possibly
>>>> be correct from a functional point of view.
>>>> Even with the no-preempt flag between a wrmsr/rdmsr hypercall pair,
>>>> there is no guarantee that accesses to a remote CPU's MSRs won't interleave
>>>> with another, legitimate access, clobbering the results of the wrmsr.
>>>> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
>>>> synchronous thread of execution on the appropriate CPU.  You can trust
>>>> that interrupts won't touch these MSRs, but you absolutely can't
>>>> guarantee that an IPI/wrmsr/IPI/rdmsr sequence will work.
>>> Not sure I follow, particularly in the context of the whitelisting of
>>> MSRs permitted here (which ought not to include anything the
>>> hypervisor needs control over).
>> Consider two dom0 vcpus both using this new multicall mechanism to read
>> QoS information for different domains, with both targeting the same
>> remote CPU.  Each will issue IPI/wrmsr/IPI/rdmsr, and the two sequences
>> may interleave, with the second wrmsr clobbering the first.
> But that situation doesn't result from the multicall use here - it would
> equally be the case for an inherently batchable hypercall.

Indeed; I called out multicalls because of the current implementation,
but I should have been clearer.
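To make the interleaving concrete, here is a minimal sketch in plain C that models a single remote CPU. The select/counter register pair, `struct fake_cpu`, and all helper names are hypothetical; each helper call stands for one IPI carrying a single wrmsr or rdmsr, and nothing serialises the two vcpus' sequences.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one remote CPU's select-then-read MSR pair:
 * writing "evtsel" chooses which value a later read of the counter returns. */
struct fake_cpu {
    uint32_t evtsel;        /* stands in for the select MSR */
    uint64_t counters[4];   /* per-ID values behind the counter MSR */
};

/* Each helper models one IPI carrying a single wrmsr or rdmsr. */
static void ipi_wrmsr_evtsel(struct fake_cpu *c, uint32_t id)
{
    c->evtsel = id;
}

static uint64_t ipi_rdmsr_ctr(const struct fake_cpu *c)
{
    assert(c->evtsel < 4);
    return c->counters[c->evtsel];
}

/* Two dom0 vcpus, A asking for id 1 and B for id 2, target the same CPU.
 * Their IPI/wrmsr/IPI/rdmsr sequences are not serialised, so this order
 * is possible and A's selection is overwritten before A reads. */
static void interleaved_reads(struct fake_cpu *cpu, uint64_t *a, uint64_t *b)
{
    ipi_wrmsr_evtsel(cpu, 1);   /* A: select id 1 */
    ipi_wrmsr_evtsel(cpu, 2);   /* B: select id 2, overwriting A's choice */
    *a = ipi_rdmsr_ctr(cpu);    /* A: actually reads id 2's value */
    *b = ipi_rdmsr_ctr(cpu);    /* B: reads id 2's value, as intended */
}
```

With counters `{ 0, 100, 200, 300 }`, both A and B end up with 200, so A silently receives another domain's data. No-preempt flags on the calling vcpu do nothing to prevent this ordering.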

> To deal with
> that we'd need a wrmsr-then-rdmsr operation, or move the entire
> execution of the batch onto the target CPU. Since the former would
> quickly become unwieldy for more complex operations, I think this
> gets us back to aiming at using continue_hypercall_on_cpu() here.

Which brings us back to the problem that you cannot use
copy_{to,from}_guest() after continue_hypercall_on_cpu(), as you are
then running in the wrong context.

I think this requires a step back and a rethink.  I can't offhand think of
any combination of existing infrastructure which would allow this
to work correctly, which means something new needs designing.
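For contrast, a minimal sketch of the wrmsr-then-rdmsr operation suggested in the quoted text, using a hypothetical model of one remote CPU. `struct fake_cpu` and `ipi_wrmsr_then_rdmsr()` are illustrative names, not existing Xen interfaces; the sketch assumes each IPI handler runs to completion on the target CPU, so no other request can land between the write and the read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one remote CPU's select-then-read MSR pair. */
struct fake_cpu {
    uint32_t evtsel;        /* stands in for the select MSR */
    uint64_t counters[4];   /* per-ID values behind the counter MSR */
};

/* One IPI whose handler performs the wrmsr and the rdmsr back to back on
 * the target CPU.  The handler is modelled as running to completion, so
 * the selection cannot be overwritten before it is read. */
static uint64_t ipi_wrmsr_then_rdmsr(struct fake_cpu *c, uint32_t id)
{
    assert(id < 4);
    c->evtsel = id;                  /* wrmsr: select id */
    return c->counters[c->evtsel];   /* rdmsr: read the selected counter */
}

/* Vcpu A asks for id 1 and vcpu B for id 2, each as a single combined
 * request, so each gets the value it selected. */
static void combined_reads(struct fake_cpu *cpu, uint64_t *a, uint64_t *b)
{
    *a = ipi_wrmsr_then_rdmsr(cpu, 1);   /* A: select and read id 1 */
    *b = ipi_wrmsr_then_rdmsr(cpu, 2);   /* B: select and read id 2 */
}
```

This only covers a single select/read pair; as noted above, extending such combined operations to more complex sequences quickly becomes unwieldy, and the result still has to reach the guest from a context where copy_to_guest() is valid.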


