Re: [Xen-devel] Supporting consistency of vcpu_runstate_info across cpus
On 19/05/16 10:09, Andrew Cooper wrote:
> On 19/05/2016 08:53, Juergen Gross wrote:
>> A guest kernel can use the vcpu_op hypercall sub-op
>> VCPUOP_register_runstate_memory_area to get a copy of the
>> vcpu_runstate_info of a vcpu mapped into its memory. As this structure
>> has no update indicator it is only safe to be read by the vcpu whose
>> runstate information it contains.
>>
>> Being able to read the runstate info of another cpu is required e.g.
>> by the Linux kernel to be able to calculate vruntime: see
>>
>> http://lists.xen.org/archives/html/xen-devel/2016-05/msg01790.html
>>
>> I'd suggest adding an "update in progress" indicator in the highest
>> bit of vcpu_runstate_info->state_entry_time, as this structure element is
>> already used to detect vcpu scheduling when vcpu_runstate_info is read
>> by the owning vcpu.
>>
>> The question is how to enable setting this indicator, as the guest must
>> be able to cope with it (I believe the Linux kernel would just run fine,
>> but we can't be sure this is true for all guests).
>>
>> I see the following possible solutions:
>>
>> a) Introduce a new vcpu_op hypercall sub-op for mapping the
>>    vcpu_runstate_info with update indicator support (a guest supporting
>>    this would try the new sub-op first and could fall back to
>>    VCPUOP_register_runstate_memory_area in case of ENOSYS).
>>
>> b) Add a virtual MSR to switch on the feature (not being able to set the
>>    appropriate bit would indicate the feature not being available). This
>>    is the variant KVM is using. Does ARM have something like MSRs?
>>
>> c) Add another hypercall to switch on the feature (similar to
>>    XENVER_get_features we could have a XENVER_set_features).
>>
>> Any preferences?
>
> However, irrespective of how you signal the request for new behaviour,
> you should see about using a lockless clock rather than a single bit, as
> a single bit can't indicate the case where a complete update has
> occurred between two samplings. This will probably require an extension
> to the current implementation, at which point you might be able to add a
> capability field as well.

That's the reason I've chosen state_entry_time as the home for the new
bit. state_entry_time is guaranteed to change between two updates, so a
complete update between two samplings is detected as well. The logic
would look like the following:

    do {
        /* Sample the entry time first; bit 63 set means update in progress. */
        old_entry_time = READ_ONCE(r->state_entry_time);
        rmb();
        /* Copy the whole structure. */
        new_state = READ_ONCE(*r);
        rmb();
        /* Retry if an update was running or completed during the copy. */
    } while (new_state.state_entry_time != old_entry_time ||
             (old_entry_time >> 63));

> Alternatively, the easiest way will probably be to add a new VMASSIST,
> which allows the guest to opt into the new behaviour.

Aah, nice. Yes, this seems to be a sensible option.

Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
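For illustration, the retry loop from the reply above can be written out as a self-contained C11 sketch together with a matching writer side. This is not Xen or Linux code: struct demo_runstate, DEMO_UPDATE_BIT, demo_publish(), demo_try_snapshot() and demo_read_runstate() are hypothetical names, the structure layout is only modelled on vcpu_runstate_info, and the C11 fences merely stand in for rmb()/wmb(). The plain payload fields are copied non-atomically, so a real implementation would need READ_ONCE-style accesses or equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the proposed "update in progress" bit (bit 63). */
#define DEMO_UPDATE_BIT (UINT64_C(1) << 63)

/* Shared area, modelled on vcpu_runstate_info (assumed layout). */
struct demo_runstate {
    int state;
    _Atomic uint64_t state_entry_time;
    uint64_t time[4];
};

/* Plain private copy handed back to the reader. */
struct demo_snapshot {
    int state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/* Writer side (the hypervisor's role in this sketch). */
static void demo_publish(struct demo_runstate *r, int state, uint64_t now,
                         const uint64_t time[4])
{
    /* The new entry time must differ from the old one and have bit 63 clear. */
    assert(!(now & DEMO_UPDATE_BIT));

    uint64_t old = atomic_load_explicit(&r->state_entry_time,
                                        memory_order_relaxed);
    /* Flag the update before touching the payload. */
    atomic_store_explicit(&r->state_entry_time, old | DEMO_UPDATE_BIT,
                          memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);

    r->state = state;
    for (int i = 0; i < 4; i++)
        r->time[i] = time[i];

    /* Publishing the new entry time also clears the flag. */
    atomic_store_explicit(&r->state_entry_time, now, memory_order_release);
}

/* One reader attempt; returns false if the copy may be inconsistent. */
static bool demo_try_snapshot(struct demo_runstate *r,
                              struct demo_snapshot *out)
{
    uint64_t before = atomic_load_explicit(&r->state_entry_time,
                                           memory_order_acquire);
    if (before & DEMO_UPDATE_BIT)
        return false;               /* update in progress */

    out->state = r->state;
    for (int i = 0; i < 4; i++)
        out->time[i] = r->time[i];

    atomic_thread_fence(memory_order_acquire);
    uint64_t after = atomic_load_explicit(&r->state_entry_time,
                                          memory_order_relaxed);
    if (after != before)
        return false;               /* an update started or completed */

    out->state_entry_time = before;
    return true;
}

/* Reader loop: retry until a consistent snapshot has been taken. */
static void demo_read_runstate(struct demo_runstate *r,
                               struct demo_snapshot *out)
{
    while (!demo_try_snapshot(r, out))
        ;
}
```

Because the entry time changes with every update, comparing it before and after the copy also catches a complete update between the two samplings, which a single bit alone would miss.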