RE: [Xen-devel] rdtscP and xen (and maybe the app-tsc answer I've been looking for)
> > However, I do need one special case to indicate
> > emulation vs non-emulation, so wraparound is
> > still a problem.
>
> I was assuming you'd just repurpose the existing version number scheme
> which is always even, and therefore can never equal -1.

That wasn't my plan, but if it can be made to work (see below),
it probably saves code in Xen.

> What's the full algorithm for detecting this feature?  Usermode has to
> establish:
>
>    1. It is running under Xen (or not, if you expect this to be
>       implemented on multiple hypervisors)
>    2. rdtscp is available
>    3. the ABI is actually being implemented, ie:
>          1. the tsc_aux value actually has the correct meaning
>          2. it has a working mechanism for getting the tsc scaling
>             parameters
>          3. (accommodate ways to evolve the ABI in a
>             back-compatible way)
>
> before it can do anything else.

Yes, that's what I was thinking.  I was planning on prototyping
these checks with "userland-rdmsr", but userland-hypercall
or userland-shared-page could work also.

> If nothing else, it's probably worth removing the rdtscp feature from the
> logical guest cpuid, so that nothing else tries to use it for its own
> purposes; in other words, you're exclusively claiming rdtscp for this
> ABI.  Or you could disable this ABI if a guest kernel tries
> to set TSC_AUX.

I was thinking that setting pvrdtscp=1 would override any kernel
use of rdtscp/TSC_AUX, but disabling the cpuid has_rdtscp flag
and using a different userland detection mechanism (than checking
cpuid for has_rdtscp) would be a better way to avoid possible
conflict.

> > I've restricted the scheme to constant_tsc as I think
> > it breaks down due to nasty races if running on a
> > machine where the pvclock parameters differ across
> > different pcpus.  I think the races can only be
> > avoided if Xen sets the TSC_AUX for all of the
> > pcpus running a pvrdtscp domain while all are idle.
> >
> > Is there a scheme that avoids the races?
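The three detection steps listed earlier in this message can be sketched as pure checks on values the caller has already read. This is only an illustration: the Xen signature in the hypervisor CPUID leaf and the RDTSCP bit (CPUID 0x80000001, EDX bit 27) are standard, but `struct pvrdtscp_probe`, `abi_version`, and the rule that 0 means "ABI not offered" are assumptions made up for this sketch, since the thread does not define step 3. If the rdtscp flag ends up hidden from guest cpuid as discussed, step 2 would need a different source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CPUID leaf 0x80000001, EDX bit 27 reports RDTSCP support. */
#define CPUID_EXT_EDX_RDTSCP (1u << 27)

/* Registers returned by one CPUID query (filled in by the caller). */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical bundle of everything the detection needs to look at. */
struct pvrdtscp_probe {
    struct cpuid_regs hv_leaf;   /* hypervisor leaf, normally 0x40000000 */
    struct cpuid_regs ext_leaf;  /* extended feature leaf 0x80000001 */
    uint32_t abi_version;        /* ASSUMPTION: 0 means the ABI is absent */
};

/* Step 1: Xen reports "XenVMMXenVMM" in EBX, ECX, EDX of its base leaf. */
bool is_xen_signature(const struct cpuid_regs *r)
{
    char sig[12];

    memcpy(sig + 0, &r->ebx, 4);
    memcpy(sig + 4, &r->ecx, 4);
    memcpy(sig + 8, &r->edx, 4);
    return memcmp(sig, "XenVMMXenVMM", 12) == 0;
}

/* Steps 1-3 in order; any failed step means "do not use the ABI". */
bool pvrdtscp_usable(const struct pvrdtscp_probe *p, uint32_t max_known_version)
{
    assert(p != NULL);
    if (!is_xen_signature(&p->hv_leaf))
        return false;                               /* step 1 */
    if (!(p->ext_leaf.edx & CPUID_EXT_EDX_RDTSCP))
        return false;                               /* step 2 */
    /* Step 3 (hypothetical): ABI present and not newer than we understand. */
    return p->abi_version != 0 && p->abi_version <= max_known_version;
}
```

Keeping the checks separate from the code that executes CPUID (or rdmsr, a hypercall, or a shared-page read) means the same logic applies whichever of those mechanisms is chosen for the prototype.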
>
> rdtscp makes it quite easy to avoid races because you get the tsc and
> metadata about the tsc atomically.  You just need to encode enough info
> in the metadata to do the conversion.

Yes, but I don't think there are enough bits to encode it all
(32 bits in TSC_AUX, right?).

> The obvious thing to do is to pack a version number and pcpu number into
> TSC_AUX.  Usermode would maintain an array of pv_clock parameters, one
> for each pcpu.  If the version number matches, then it uses the
> parameters it has; if not it fetches new parameters and repeats the
> rdtscp.  There's no need to worry about either thread or vcpu context
> switches because you get the (tsc,params) tuple atomically, which is the
> tricky bit without rdtscp.
>
> (The version number would be truncated wrt the normal pvclock version
> number, but it just needs to be large enough to avoid aliasing from
> wrapping; I'm assuming something like 24 bits version and 8 bits cpu
> number.)

I think a race occurs if the vcpu switches pcpu TWICE, from
pcpu-A to pcpu-B and back to pcpu-A, and does rdtscp each time
on pcpu-A but reads one or more pvclock parameters (that are
too big to be encoded in TSC_AUX) on pcpu-B.  If Xen can
atomically bump/change TSC_AUX on *all* pcpus running a guest
vcpu, the race can be avoided.  But I suspect that is too
expensive (some kind of rendezvous required for each bump
on any processor).

> > Fortunately, this also has the effect of greatly
> > reducing the version increase frequency.
>
> I don't think that's going to be a huge issue; fetching time parameters
> with a syscall/hypercall would be on the same order as doing an emulated
> rdtsc, and would only need to happen, say, once per timeslice (100Hz?)
> at the outside.

Even if my assumption of the race (above) is incorrect,
32 bits is not very much time at 100Hz.  But the version bump
needs to occur synchronously with every P/C-state transition
for pvclock to work on non_constant_tsc machines, doesn't it?
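The quoted proposal (24-bit version plus 8-bit pcpu number in TSC_AUX, a per-pcpu parameter cache, and "fetch and repeat the rdtscp" on a version mismatch) might look roughly like the sketch below. Everything named here is illustrative: `struct pcpu_params`, its fields, the single-multiplier conversion, and the `fake_*` stand-ins are assumptions so the loop can be exercised off Xen, and they are not the real pvclock layout. The conversion ignores overflow for large tsc deltas.

```c
#include <assert.h>
#include <stdint.h>

#define AUX_CPU_BITS 8u
#define AUX_CPU_MASK ((1u << AUX_CPU_BITS) - 1u)
#define MAX_PCPUS    (1u << AUX_CPU_BITS)

/* Pack a truncated 24-bit version and an 8-bit pcpu number into TSC_AUX. */
uint32_t aux_pack(uint32_t version, uint32_t pcpu)
{
    assert(pcpu < MAX_PCPUS);
    return (version << AUX_CPU_BITS) | pcpu;  /* high version bits drop off */
}
uint32_t aux_version(uint32_t aux) { return aux >> AUX_CPU_BITS; }
uint32_t aux_pcpu(uint32_t aux)    { return aux & AUX_CPU_MASK; }

/* Cached scaling parameters for one pcpu (illustrative field names). */
struct pcpu_params {
    int      valid;
    uint32_t version;   /* 24-bit version these parameters belong to */
    uint64_t tsc_base;  /* tsc value at the last parameter update */
    uint64_t ns_base;   /* system time in ns at tsc_base */
    uint32_t mul_frac;  /* ns per tsc tick as a 0.32 fixed-point fraction */
};

/* rdtscp returns the tsc and stores TSC_AUX; fetch refreshes one pcpu slot. */
typedef uint64_t (*rdtscp_fn)(uint32_t *aux);
typedef void (*fetch_fn)(uint32_t pcpu, struct pcpu_params *out);

uint64_t pv_read_ns(struct pcpu_params cache[MAX_PCPUS],
                    rdtscp_fn rdtscp, fetch_fn fetch)
{
    for (;;) {
        uint32_t aux;
        uint64_t tsc = rdtscp(&aux);  /* tsc and metadata read atomically */
        struct pcpu_params *p = &cache[aux_pcpu(aux)];

        if (p->valid && p->version == aux_version(aux))
            return p->ns_base + (((tsc - p->tsc_base) * p->mul_frac) >> 32);
        fetch(aux_pcpu(aux), p);      /* stale or missing: refresh, retry */
    }
}

/* Stand-ins so the loop can be run without Xen or a real rdtscp. */
static uint64_t fake_tsc = 1000;
static uint32_t fake_version = 7, fake_pcpu = 3;
static unsigned fetch_calls;

uint64_t fake_rdtscp(uint32_t *aux)
{
    *aux = aux_pack(fake_version, fake_pcpu);
    return fake_tsc;
}

void fake_fetch(uint32_t pcpu, struct pcpu_params *out)
{
    (void)pcpu;
    out->valid = 1;
    out->version = fake_version & 0xFFFFFFu;
    out->tsc_base = 0;
    out->ns_base = 0;
    out->mul_frac = 0x80000000u;  /* 0.5 ns per tick, i.e. a 2 GHz tsc */
    fetch_calls++;
}
```

The sketch does not settle the double-migration concern raised above; that depends on how the fetch step is made consistent for the pcpu named in TSC_AUX. On wraparound: at 100 bumps per second a 24-bit version field wraps after 2^24/100, about 167,772 seconds (just under two days), while a full 32-bit counter would last about 497 days.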
How frequently can those transitions occur?

> > The rate is synced but the values may not be.  Since
> > software (BIOS or Xen) sets tsc on each processor
> > it is essentially impossible to ensure they are
> > identical.  The rendezvous algorithm should be able
> > to set them so that they are "unobservably" different,
> > but I keep hearing "within 2usec".  (It would be
> > interesting to measure this across a broad set
> > of machines.)  So it's probably prudent to recommend
> > that apps be prepared for the possibility even if
> > it never happens.
>
> You don't need to guarantee anything stronger than they'd see on bare
> hardware.  You also need to be more precise about exactly what you're
> guaranteeing.
>
> Are you saying that a single thread will never see regressing tscs?
> That just requires making sure that Xen gets the tscs synced closer than
> the context switch time of a thread between cpus, which should be
> possible.
>
> Or are you making the stronger guarantee that two threads running
> concurrently on different cpus doing rdtsc will see monotonically
> increasing tscs with respect to the ordering of all their operations?
> That would require arbitrarily close syncing (well, within the time it
> takes a cacheline to bounce I guess).

I guess this all depends on what Xen is capable of guaranteeing.
If Xen can provide a "cacheline bounce guarantee", the app
shouldn't have to care.  Linux now seems to provide a cacheline
bounce guarantee for itself, but afaik has no way to communicate
that to an app using raw rdtsc{,p}, and all the relevant syscalls
have a monotonicity option and/or have insufficient resolution
to matter.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel