
Re: [Xen-devel] rdtscP and xen (and maybe the app-tsc answer I've been looking for)

On 09/19/09 08:34, Dan Magenheimer wrote:
> You're right, I don't need to differentiate between
> the two emulated cases.  I was trying to overload
> an extra piece of information that I really don't
> need to overload.
> However, I do need one special case to indicate
> emulation vs non-emulation, so wraparound is
> still a problem.

I was assuming you'd just repurpose the existing version number scheme
which is always even, and therefore can never equal -1.
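
To make that concrete, here is a minimal C sketch. It assumes a hypothetical encoding in which Xen writes all ones to TSC_AUX to flag the emulated case and otherwise writes the (even) pvclock version. The names `TSC_AUX_EMULATED`, `tsc_aux_is_emulated` and `next_stable_version` are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical encoding: all ones in TSC_AUX means "rdtscp is being
 * emulated"; any other value is the current pvclock version number. */
#define TSC_AUX_EMULATED 0xFFFFFFFFu

/* Nonzero if the reader should take the emulated path. */
static int tsc_aux_is_emulated(uint32_t aux)
{
    return aux == TSC_AUX_EMULATED;
}

/* A completed pvclock update bumps the version by 2 (it is odd only while
 * the update is in progress), so a stable version keeps bit 0 clear, even
 * across 32-bit wraparound.  The sentinel has bit 0 set, so no stable
 * version can ever be mistaken for it. */
static uint32_t next_stable_version(uint32_t v)
{
    return v + 2u;
}
```

For example, `next_stable_version(0xFFFFFFFE)` wraps to 0, which is still even and still distinct from the sentinel.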

>> > If the hardware doesn't support rdtscp, how should an app know whether
>> > or not to use it?  Should it just try running rdtscp being prepared to
>> > handle a SIGILL?
> Yes, that's the plan.  I think this scheme always
> works, but only works fast if the hardware supports
> rdtscp and constant_tsc.
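
A minimal sketch of the "just try it and catch SIGILL" probe, for Linux/x86 with GCC or Clang inline asm; `rdtscp_works` is a hypothetical helper name. Note that it only tells you the instruction executes, not that TSC_AUX holds anything meaningful, so it covers only part of the detection problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
static sigjmp_buf probe_env;

/* SIGILL handler: jump back out of the faulting rdtscp. */
static void probe_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}
#endif

/* Returns 1 if rdtscp executes, 0 if it raises SIGILL (or not on x86). */
static int rdtscp_works(void)
{
#if defined(__x86_64__) || defined(__i386__)
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return 0;

    if (sigsetjmp(probe_env, 1) == 0) {
        uint32_t lo, hi, aux;
        /* rdtscp: EDX:EAX = tsc, ECX = TSC_AUX */
        __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
        (void)lo; (void)hi; (void)aux;
        ok = 1;
    }
    sigaction(SIGILL, &old, NULL);   /* restore the previous handler */
    return ok;
#else
    return 0;
#endif
}
```

The handler is installed only for the duration of the probe, so an application that has its own SIGILL handling gets it back afterwards.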

What's the full algorithm for detecting this feature?  Usermode has to establish that:

   1. It is running under Xen (or not, if you expect this to be
      implemented on multiple hypervisors)
   2. rdtscp is available
   3. the ABI is actually being implemented, ie:
         1. the tsc_aux value actually has the correct meaning
         2. it has a working mechanism for getting the tsc scaling
         3. (accommodate ways to evolve the ABI in a back-compatible way)

before it can do anything else.
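
One way the first two steps might look from userspace, as a sketch for x86 with GCC/Clang `<cpuid.h>`. It assumes Xen's leaves start at 0x40000000; they need not (Linux scans upward in steps of 0x100 to find the "XenVMMXenVMM" signature), so a real probe would scan. Step 3 is a placeholder because no such ABI query exists yet. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical result of the detection steps listed above. */
struct pvrdtscp_probe {
    int hypervisor;   /* running under some hypervisor */
    int xen;          /* ...and it identifies itself as Xen */
    int rdtscp;       /* CPUID advertises rdtscp */
    int abi;          /* step 3: ABI actually implemented */
};

static struct pvrdtscp_probe probe_pvrdtscp(void)
{
    struct pvrdtscp_probe p = {0, 0, 0, 0};
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a, b, c, d;
    char sig[13];

    /* Step 1: CPUID.1:ECX bit 31 is set when running under a hypervisor. */
    if (__get_cpuid(1, &a, &b, &c, &d))
        p.hypervisor = (int)((c >> 31) & 1u);

    if (p.hypervisor) {
        /* Signature leaf; __get_cpuid would reject this range, so use
         * the raw __cpuid macro. */
        __cpuid(0x40000000, a, b, c, d);
        memcpy(sig + 0, &b, 4);
        memcpy(sig + 4, &c, 4);
        memcpy(sig + 8, &d, 4);
        sig[12] = '\0';
        p.xen = strcmp(sig, "XenVMMXenVMM") == 0;
    }

    /* Step 2: CPUID.80000001h:EDX bit 27 advertises rdtscp. */
    if (__get_cpuid(0x80000001, &a, &b, &c, &d))
        p.rdtscp = (int)((d >> 27) & 1u);
#endif
    /* Step 3: there is no defined interface yet for checking the meaning
     * of TSC_AUX, the scaling info or the ABI version, so this sketch
     * always reports "absent". */
    p.abi = 0;
    return p;
}
```

If rdtscp were hidden from the guest CPUID, step 2 would have to fall back on the SIGILL probe instead.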

If nothing else, it's probably worth removing the rdtscp feature from the
logical guest cpuid, so that nothing else tries to use it for its own
purposes; in other words, you're exclusively claiming rdtscp for this
ABI.  Or you could disable this ABI if a guest kernel tries to set TSC_AUX.
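
A sketch of both options on the hypervisor side. The names (`struct dom_tsc_state`, `filter_cpuid_80000001_edx`, `on_guest_wrmsr`) are hypothetical and this is not Xen's actual intercept code; only the MSR number (IA32_TSC_AUX, 0xC0000103) and the CPUID bit (leaf 0x80000001, EDX bit 27) are architectural.

```c
#include <stdint.h>

#define MSR_TSC_AUX               0xC0000103u   /* IA32_TSC_AUX */
#define CPUID_80000001_EDX_RDTSCP (1u << 27)

/* Hypothetical per-domain state for the proposed ABI. */
struct dom_tsc_state {
    int pvrdtscp_enabled;   /* nonzero while Xen owns TSC_AUX */
};

/* Option 1: hide rdtscp in the guest-visible CPUID leaf 0x80000001. */
static uint32_t filter_cpuid_80000001_edx(const struct dom_tsc_state *d,
                                          uint32_t edx)
{
    return d->pvrdtscp_enabled ? (edx & ~CPUID_80000001_EDX_RDTSCP) : edx;
}

/* Option 2: leave rdtscp visible, but if the guest kernel writes TSC_AUX
 * itself, give the MSR back to the guest and turn the ABI off. */
static void on_guest_wrmsr(struct dom_tsc_state *d, uint32_t msr, uint64_t val)
{
    (void)val;   /* the value would be passed through to the guest */
    if (msr == MSR_TSC_AUX)
        d->pvrdtscp_enabled = 0;
}
```

Option 2 is friendlier to guests that already use TSC_AUX (e.g. for a cpu number), at the cost of the ABI silently going away under them.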

> I've restricted the scheme to constant_tsc as I think
> it breaks down due to nasty races if running on a
> machine where the pvclock parameters differ across
> different pcpus.  I think the races can only be
> avoided if Xen sets the TSC_AUX for all of the
> pcpus running a pvrdtscp domain while all are idle.
> Is there a scheme that avoids the races?

rdtscp makes it quite easy to avoid races because you get the tsc and
metadata about the tsc atomically.  You just need to encode enough info
in the metadata to do the conversion.

The obvious thing to do is to pack a version number and pcpu number into
TSC_AUX.  Usermode would maintain an array of pv_clock parameters, one
for each pcpu.  If the version number matches, then it uses the
parameters it has; if not it fetches new parameters and repeats the
rdtscp.  There's no need to worry about either thread or vcpu context
switches because you get the (tsc,params) tuple atomically, which is the
tricky bit without rdtscp.
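
A minimal C sketch of that reader, under these assumptions: TSC_AUX holds `(version << 8) | pcpu`; the parameters mirror the fields of Xen's `vcpu_time_info` (tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift); and the `rdtscp`/`fetch` hooks are hypothetical stand-ins for the raw instruction and for whatever syscall or hypercall would return a pcpu's parameters. The cache is not thread-safe; a real library would keep it per thread or protect each entry.

```c
#include <stdint.h>

/* Hypothetical TSC_AUX layout: low 8 bits pcpu, upper 24 bits version. */
#define AUX_CPU_BITS 8u
#define AUX_CPU_MASK ((1u << AUX_CPU_BITS) - 1u)
#define MAX_PCPUS    (1u << AUX_CPU_BITS)

/* Per-pcpu scaling parameters, modelled on Xen's vcpu_time_info. */
struct pv_params {
    uint32_t version;           /* truncated version as packed in TSC_AUX */
    uint64_t tsc_timestamp;     /* tsc at the last update */
    uint64_t system_time;       /* ns at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point tsc->ns multiplier */
    int8_t   tsc_shift;
};

static struct pv_params cache[MAX_PCPUS];
static int cache_valid[MAX_PCPUS];

static uint64_t scale_tsc(const struct pv_params *p, uint64_t tsc)
{
    uint64_t delta = tsc - p->tsc_timestamp;
    if (p->tsc_shift >= 0) delta <<= p->tsc_shift;
    else                   delta >>= -p->tsc_shift;
    /* (delta * mul) >> 32, split so no intermediate exceeds 64 bits */
    return p->system_time
         + (delta >> 32) * p->tsc_to_system_mul
         + (((delta & 0xFFFFFFFFu) * p->tsc_to_system_mul) >> 32);
}

/* One attempt on an atomically read (tsc, aux) pair.  Returns 1 and sets
 * *ns on a cache hit; returns 0 if the parameters for *cpu_out are stale. */
static int pv_try_convert(uint64_t tsc, uint32_t aux, uint64_t *ns,
                          unsigned *cpu_out)
{
    unsigned cpu = aux & AUX_CPU_MASK;
    uint32_t ver = aux >> AUX_CPU_BITS;
    *cpu_out = cpu;
    if (!cache_valid[cpu] || cache[cpu].version != ver)
        return 0;
    *ns = scale_tsc(&cache[cpu], tsc);
    return 1;
}

static void pv_cache_store(unsigned cpu, const struct pv_params *p)
{
    cache[cpu] = *p;
    cache_valid[cpu] = 1;
}

/* Hypothetical hooks: a raw rdtscp, and a call returning a pcpu's params. */
typedef uint64_t (*rdtscp_fn)(uint32_t *aux);
typedef void (*fetch_fn)(unsigned cpu, struct pv_params *out);

static uint64_t pv_time_ns(rdtscp_fn rdtscp, fetch_fn fetch)
{
    for (;;) {
        uint32_t aux;
        unsigned cpu;
        uint64_t ns, tsc = rdtscp(&aux);   /* (tsc, aux) read atomically */
        if (pv_try_convert(tsc, aux, &ns, &cpu))
            return ns;
        struct pv_params fresh;
        fetch(cpu, &fresh);
        pv_cache_store(cpu, &fresh);      /* then repeat the rdtscp */
    }
}
```

The conversion is the usual pvclock scaling: shift the tsc delta by tsc_shift, multiply by the 32.32 fixed-point tsc_to_system_mul, and add system_time. A context switch between the rdtscp and the conversion does no harm, because the tsc and the pcpu/version it belongs to travel together.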

(The version number would be truncated wrt the normal pvclock version
number, but it just needs to be large enough to avoid aliasing from
wrapping; I'm assuming something like 24 bits of version and 8 bits of cpu number.)
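
The arithmetic behind a 24/8 split, as a small sketch with hypothetical helper names. With 8 bits of cpu number the scheme covers at most 256 pcpus. A stale cached version aliases the current one only if the full version has moved on by an exact multiple of 2^24; since each completed update adds 2, that is 2^23 updates. Purely as an illustrative figure, at 100 updates per second a reader would have to leave its cached entry for that pcpu untouched for about 83,886 seconds (roughly 23 hours) before aliasing could bite.

```c
#include <stdint.h>

#define AUX_VER_BITS 24u
#define AUX_CPU_BITS 8u

/* Pack the low 24 bits of the full pvclock version and an 8-bit pcpu
 * number into a 32-bit TSC_AUX value (hypothetical layout). */
static uint32_t aux_pack(uint64_t full_version, unsigned cpu)
{
    uint32_t ver = (uint32_t)(full_version & ((1u << AUX_VER_BITS) - 1u));
    return (ver << AUX_CPU_BITS) | (cpu & ((1u << AUX_CPU_BITS) - 1u));
}

/* A stale cached version is mistaken for the current one only if the
 * full version advanced by an exact nonzero multiple of 2^24. */
static int aux_versions_alias(uint64_t cached_full, uint64_t current_full)
{
    uint64_t diff = current_full - cached_full;
    return diff != 0 && (diff & ((1u << AUX_VER_BITS) - 1u)) == 0;
}

/* Seconds until the truncated version wraps, assuming it advances by 2
 * per update at update_hz updates per second. */
static double aux_wrap_seconds(double update_hz)
{
    return (double)(1u << (AUX_VER_BITS - 1u)) / update_hz;
}
```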

> Fortunately, this also has the effect of greatly
> reducing the version increase frequency.

I don't think that's going to be a huge issue; fetching time parameters
with a syscall/hypercall would be on the same order as doing an emulated
rdtsc, and would only need to happen, say, once per timeslice (100Hz?)
at the outside.

> The rate is synced but the values may not be.  Since
> software (BIOS or Xen) sets tsc on each processor
> it is essentially impossible to ensure they are
> identical.  The rendezvous algorithm should be able
> to set them so that they are "unobservably" different,
> but I keep hearing "within 2usec".  (It would be
> interesting to measure this across a broad set
> of machines.)  So it's probably prudent to recommend
> that apps be prepared for the possibility even if
> it never happens.
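
If apps are told to be prepared for small cross-pcpu differences, one common approach (not something proposed in this thread) is a per-thread clamp, so that a thread migrated to a pcpu whose TSC is slightly behind never sees time go backwards. A sketch using C11 `_Thread_local`; `monotonic_clamp` is a hypothetical name.

```c
#include <stdint.h>

/* Last timestamp this thread has handed out. */
static _Thread_local uint64_t last_seen;

/* Clamp a raw reading so this thread never observes time going backwards.
 * This hides a small backward step rather than correcting it, so readings
 * may stall briefly right after a migration. */
static uint64_t monotonic_clamp(uint64_t raw)
{
    if (raw < last_seen)
        return last_seen;
    last_seen = raw;
    return raw;
}
```

Because the state is per thread, this only gives the single-thread guarantee; it says nothing about ordering between threads.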

You don't need to guarantee anything stronger than they'd see on bare
hardware.  You also need to be more precise about exactly what you're guaranteeing.

Are you saying that a single thread will never see regressing tscs? 
That just requires making sure that Xen gets the tscs synced closer than
the context switch time of a thread between cpus, which should be possible.

Or are you making the stronger guarantee that two threads running
concurrently on different cpus doing rdtsc will see monotonically
increasing tscs with respect to the ordering of all their operations? 
That would require arbitrarily close syncing (well, within the time it
takes a cacheline to bounce I guess).
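
To make the distinction concrete, a sketch of two checks over a log of samples (hypothetical names throughout). It assumes the log is already in true real-time order, which is exactly the hard part for the stronger guarantee: establishing that order between threads needs something like a handoff through a shared cacheline.

```c
#include <stddef.h>
#include <stdint.h>

/* One rdtsc sample, tagged with the thread that took it.  The array is
 * assumed to be in real-time order of the reads. */
struct tsc_sample {
    int      thread;
    uint64_t tsc;
};

/* Weaker guarantee: each thread on its own never sees the tsc regress. */
static int per_thread_monotonic(const struct tsc_sample *s, size_t n,
                                int nthreads)
{
    for (int t = 0; t < nthreads; t++) {
        int seen = 0;
        uint64_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            if (s[i].thread != t)
                continue;
            if (seen && s[i].tsc < prev)
                return 0;
            prev = s[i].tsc;
            seen = 1;
        }
    }
    return 1;
}

/* Stronger guarantee: across all threads, the tsc never decreases in
 * real-time order. */
static int globally_monotonic(const struct tsc_sample *s, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (s[i].tsc < s[i - 1].tsc)
            return 0;
    return 1;
}
```

With the log {thread 0: 100}, {thread 1: 95}, {thread 0: 110}, {thread 1: 120}, each thread is monotonic on its own, but the stronger guarantee fails at the second sample.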


Xen-devel mailing list


