Re: [Xen-devel] Supporting consistency of vcpu_runstate_info across cpus

On Thu, May 19, 2016 at 12:21:57PM +0200, Dario Faggioli wrote:
> Since, AFAIUI, you're interested in non-Linux guests' perspective, I'm
> adding Roger (and avoiding trimming, for his benefit), who can tell us
> what he thinks of this all, from the FreeBSD point of view.

Thanks, AFAIK vcpu_runstate_info is only used by Linux ATM? (maybe Windows?) 
FreeBSD doesn't do stolen time accounting at all, and (although I would 
really like to see this implemented) I don't foresee myself adding this in 
the near future. That's mainly because FreeBSD doesn't have the necessary 
scheduler hooks, so it's not only implementing the Xen side of it, it needs 
to be plumbed through the scheduler and that doesn't look like an easy task.

NetBSD also doesn't seem to do it, and OpenBSD just gained basic Xen
support, so no stolen time accounting there either.
> On Thu, May 19, 2016 at 10:49 AM, Juergen Gross <jgross@xxxxxxxx> wrote:
> > On 19/05/16 10:09, Andrew Cooper wrote:
> >> On 19/05/2016 08:53, Juergen Gross wrote:
> >>> A guest kernel can use the vcpu_op hypercall sub-op
> >>> VCPUOP_register_runstate_memory_area to get a copy of the
> >>> vcpu_runstate_info of a vcpu mapped into its memory. As this structure
> >>> has no update indicator it is only safe to be read by the vcpu whose
> >>> runstate information it contains.
> >>>
> >>> Being able to read the runstate info of another cpu is required e.g.
> >>> by the Linux kernel to be able to calculate vruntime: see
> >>>
> >>> http://lists.xen.org/archives/html/xen-devel/2016-05/msg01790.html
> >>>
> >>> I'd suggest adding an "update in progress" indicator in the highest
> >>> bit of vcpu_runstate_info->state_entry_time as this structure element is
> >>> already used to detect vcpu scheduling when vcpu_runstate_info is read
> >>> by the owning vcpu.
> >>>
> >>> The question is how to enable setting this indicator, as the guest must
> >>> be able to cope with it (I believe the Linux kernel would just run fine,
> >>> but we can't be sure this is true for all guests).
> >>>
> >>> I see the following possible solutions:
> >>>
> >>> a) Introduce a new vcpu_op hypercall sub-op for mapping the
> >>>    vcpu_runstate_info with update indicator support (a guest supporting
> >>>    this would try the new sub-op first and could fall back to
> >>>    VCPUOP_register_runstate_memory_area in case of ENOSYS).
> >>>
> >>> b) Add a virtual MSR to switch on the feature (not being able to set the
> >>>    appropriate bit would indicate the feature not being available). This
> >>>    is the variant KVM is using. Does ARM have something like MSRs?

So I assume the vcpu_runstate_info structure is shared between Xen and KVM, 
just like the PV time info structure?

> >>> c) Add another hypercall to switch on the feature (similar to
> >>>    XENVER_get_features we could have a XENVER_set_features).
> >>>
> >>> Any preferences?
> >>
> >> However, irrespective of how you signal the request for new behaviour,
> >> you should see about using a lockless clock rather than a single bit, as
> >> a single bit can't indicate the case where a complete update has
> >> occurred between two samplings.  This will probably require an extension
> >> to the current implementation, at which point you might be able to add a
> >> capability field as well.
> >
> > That's the reason I've chosen state_entry_time as the home for the new
> > bit. state_entry_time is guaranteed to change between two updates. So
> > the logic would look like the following:
> >
> > do {
> >   old_entry_time = READ_ONCE(r->state_entry_time);
> >   rmb();
> >   new_state = READ_ONCE(*r);
> >   rmb();
> > } while (new_state.state_entry_time != old_entry_time ||
> >          (old_entry_time >> 63));
> >
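
For anyone wanting to experiment with the loop above outside a kernel, here is a minimal, self-contained C sketch of the same idea. Everything in it is an assumption made for illustration: `struct runstate_sketch` is a simplified stand-in for vcpu_runstate_info, `UPDATE_IN_PROGRESS` is a hypothetical name for the proposed top bit of state_entry_time, plain volatile reads stand in for READ_ONCE(), and C11 fences stand in for rmb()/wmb(). It also re-reads state_entry_time after the copy instead of comparing the copied field, which is a choice of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the proposed "update in progress" flag (bit 63). */
#define UPDATE_IN_PROGRESS (UINT64_C(1) << 63)

/* Simplified stand-in for vcpu_runstate_info (an assumption of this sketch). */
struct runstate_sketch {
    int      state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/* Writer side: set the flag before touching any other field. */
void begin_update(volatile struct runstate_sketch *r)
{
    assert(!(r->state_entry_time & UPDATE_IN_PROGRESS));
    r->state_entry_time |= UPDATE_IN_PROGRESS;
    atomic_thread_fence(memory_order_release); /* flag visible before data */
}

/* Writer side: publish a new entry time, which also clears the flag. */
void end_update(volatile struct runstate_sketch *r, uint64_t new_entry_time)
{
    /* The scheme relies on the entry time changing on every update. */
    assert(new_entry_time != (r->state_entry_time & ~UPDATE_IN_PROGRESS));
    atomic_thread_fence(memory_order_release); /* data visible before clear */
    r->state_entry_time = new_entry_time & ~UPDATE_IN_PROGRESS;
}

/* Reader side: one attempt; returns false if the copy may be torn. */
bool try_snapshot(const volatile struct runstate_sketch *r,
                  struct runstate_sketch *out)
{
    uint64_t before = r->state_entry_time;
    atomic_thread_fence(memory_order_acquire);

    out->state = r->state;
    out->state_entry_time = r->state_entry_time;
    for (int i = 0; i < 4; i++)
        out->time[i] = r->time[i];

    atomic_thread_fence(memory_order_acquire);
    uint64_t after = r->state_entry_time;

    /* Reject if an update was in flight or completed during the copy. */
    return after == before && !(before & UPDATE_IN_PROGRESS);
}

/* Reader side: retry until a consistent copy is obtained. */
struct runstate_sketch snapshot(const volatile struct runstate_sketch *r)
{
    struct runstate_sketch copy;
    while (!try_snapshot(r, &copy))
        ;
    return copy;
}
```

A reader on another cpu would call snapshot(); the non-looping try_snapshot() is split out only so the rejection path can be exercised without a second thread.
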
> >> Alternatively, the easiest way will probably be to add a new VMASSIST,
> >> which allows the guest to opt into the new behaviour.
> >
> > Aah, nice. Yes, this seems to be a sensible option.
> >
> FWIW, this looks like a good approach to me as well.

I don't have a problem with this, I would just like to use whatever KVM uses 
in order to be able to reduce code duplication if I ever implement this on 

