[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Regression with vcpu runstate info and XEN_RUNSTATE_UPDATE



On 23.09.19 11:51, Jan Beulich wrote:
On 23.09.2019 11:42, Jürgen Groß wrote:
On 16.09.19 17:44, Jan Beulich wrote:
On 16.09.2019 16:50, Andrew Cooper wrote:
After a complicated investigation, it turns out that c/s 2529c850ea48
broke xc_vcpu_getinfo().

The bug looks as if it is in vcpu_runstate_get(), which doesn't account
for XEN_RUNSTATE_UPDATE and calculating a wildly inappropriate delta.
Ultimately, the result of XEN_DOMCTL_getvcpuinfo ends up very
occasionally with op->u.getvcpuinfo.cpu_time being wrong by 1 << 63.

Given some of the callers of vcpu_runstate_get(), I don't think it is
reasonable to pause the VCPU while reading the runstate info.  However,
it is also unclear whether waiting for XEN_RUNSTATE_UPDATE to drop in
vcpu_runstate_get() is safe either.

First and foremost I'm wondering whether simply masking off
XEN_RUNSTATE_UPDATE in vcpu_runstate_get() wouldn't be an
option. The assumption of the feature as a whole is for the
high bit to never be set in an actual time value, after all.

The other option I'd see is for vcpu_runstate_get() to gain
a boolean return type by which it would indicate to
interested callers whether the latching of the values
happened while an update was in progress elsewhere. Callers
needing to consume the potentially incorrect result could
then choose to wait or schedule a hypercall continuation.

The 3rd option (less desirable imo not the least because it
would require touching all callers) would be for the function
to gain a parameter telling it whether to spin until
XEN_RUNSTATE_UPDATE is observed clear.

And the 4th option would be to let vcpu_runstate_get() operate on a
local runstate copy in order to avoid setting XEN_RUNSTATE_UPDATE in
the "official" runstate info of the vcpu.

But it already does - first thing it does is a memcpy() from the
"official" instance to a caller provided buffer. (What is
confusing, at least to me, is that the lock gets dropped last,
when everything after the memcpy() already acts on the copy only.)

Oh, that was my fault here: I meant to let update_runstate_area()
operate on a local copy, of course.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.