Re: [Xen-devel] Supporting consistency of vcpu_runstate_info across cpus

On 19/05/16 10:09, Andrew Cooper wrote:
> On 19/05/2016 08:53, Juergen Gross wrote:
>> A guest kernel can use the vcpu_op hypercall sub-op
>> VCPUOP_register_runstate_memory_area to get a copy of the
>> vcpu_runstate_info of a vcpu mapped into its memory. As this structure
>> has no update indicator, it is only safe to be read by the vcpu whose
>> runstate information it contains.
>> Being able to read the runstate info of another cpu is required e.g.
>> by the Linux kernel to be able to calculate vruntime: see
>> http://lists.xen.org/archives/html/xen-devel/2016-05/msg01790.html
>> I'd suggest adding an "update in progress" indicator in the highest
>> bit of vcpu_runstate_info->state_entry_time as this structure element is
>> already used to detect vcpu scheduling when vcpu_runstate_info is read
>> by the owning vcpu.
>> The question is how to enable setting this indicator, as the guest must
>> be able to cope with it (I believe the Linux kernel would just run fine,
>> but we can't be sure this is true for all guests).
>> I see the following possible solutions:
>> a) Introduce a new vcpu_op hypercall sub-op for mapping the
>>    vcpu_runstate_info with update indicator support (a guest supporting
>>    this would try the new sub-op first and could fall back to
>>    VCPUOP_register_runstate_memory_area in case of ENOSYS).
>> b) Add a virtual MSR to switch on the feature (not being able to set the
>>    appropriate bit would indicate the feature not being available). This
>>    is the variant KVM is using. Does ARM have something like MSRs?
>> c) Add another hypercall to switch on the feature (similar to
>>    XENVER_get_features we could have a XENVER_set_features).
>> Any preferences?
> However, irrespective of how you signal the request for new behaviour,
> you should see about using a lockless clock rather than a single bit, as
> a single bit can't indicate the case where a complete update has
> occurred between two samplings.  This will probably require an extension
> to the current implementation, at which point you might be able to add a
> capability field as well.

That's the reason I've chosen state_entry_time as the home for the new
bit. state_entry_time is guaranteed to change between two updates. So
the logic would look like the following:

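do {
  /* Sample the entry time first; bit 63 is the proposed update flag. */
  old_entry_time = READ_ONCE(r->state_entry_time);
  /* Copy the whole structure. */
  new_state = READ_ONCE(*r);
  /* Retry if the entry time changed or an update was in progress. */
} while (new_state.state_entry_time != old_entry_time ||
         (old_entry_time >> 63));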
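
For illustration, here is a slightly fuller sketch of that reader loop in
plain C11. The struct layout, the RUNSTATE_UPDATE_FLAG name and the
runstate_try_snapshot()/runstate_snapshot() helpers are made up for this
example and are not existing Xen or Linux interfaces. The fences merely
stand in for whatever read barriers a guest would really use. As a
variation on the loop above, it re-reads state_entry_time from the shared
area after the copy instead of comparing against the copied field:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdatomic.h>

/* Assumed layout for this sketch; modelled on vcpu_runstate_info. */
struct runstate_info {
    int      state;
    uint64_t state_entry_time;   /* bit 63: hypothetical "update in progress" */
    uint64_t time[4];
};

/* Hypothetical name for the proposed indicator bit. */
#define RUNSTATE_UPDATE_FLAG (UINT64_C(1) << 63)

/*
 * One attempt at taking a consistent copy.  Returns false if an update
 * was in progress or completed while we were copying.
 */
static bool runstate_try_snapshot(const volatile struct runstate_info *r,
                                  struct runstate_info *snap)
{
    uint64_t before = r->state_entry_time;

    /* Keep the copy from being reordered before the first sample. */
    atomic_thread_fence(memory_order_acquire);

    snap->state = r->state;
    snap->state_entry_time = r->state_entry_time;
    for (int i = 0; i < 4; i++)
        snap->time[i] = r->time[i];

    /* Keep the re-read from being reordered before the copy. */
    atomic_thread_fence(memory_order_acquire);

    return r->state_entry_time == before &&
           !(before & RUNSTATE_UPDATE_FLAG);
}

/* Retry until a consistent copy has been taken. */
static struct runstate_info runstate_snapshot(const volatile struct runstate_info *r)
{
    struct runstate_info snap;

    while (!runstate_try_snapshot(r, &snap))
        ;
    return snap;
}
```

Splitting out the single attempt means a caller that must not spin could
give up after a few tries rather than looping unconditionally.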

> Alternatively, the easiest way will probably be to add a new VMASSIST,
> which allows the guest to opt into the new behaviour.

Aah, nice. Yes, this seems to be a sensible option.

