Re: [Xen-devel] write_tsc in a PV domain?
On 08/28/09 10:49, Dan Magenheimer wrote:
>> Apps are free to try and use the tsc in any way they
>> feel like, but it has never had any
>> GUARANTEED [djm's emphasis] properties.
>>
> I think this is the key difference of opinion which
> must be resolved. If what you say is true, your
> other positions make sense. If it is false,
> they make much less sense. (And unfortunately
> it is not a black and white issue.)
>
> There ARE guaranteed properties specified by
> the Intel SDM for any _single_ processor,
> namely that rdtsc is "guaranteed to return
> a monotonically increasing unique value whenever
> executed, except for 64-bit counter wraparound.
> Intel guarantees that the time-stamp counter
> will not wrap-around within 10 years after being
> reset." Both uses of the word "guarantee"
> are quoted from the Intel SDM.
>

Yes, but those are fairly weak guarantees. It does not guarantee that
the tsc won't change rate arbitrarily, or stop outright between reads.

> What is NOT guaranteed, but is widely and
> incorrectly assumed to be implied and has
> gotten us into this mess, is that
> the same properties applies across multiple
> processors.

Yes, Linux offers even weaker guarantees than Intel. Aside from the
processor migration issue, the tsc can jump arbitrarily as a result of
suspend/resume (ie, it can be non-monotonic).

> And there are notable examples
> of systems where the properties do NOT apply.
> So it is true that an app that
> does not know conclusively that certain threads
> are running on certain processors cannot
> always safely use rdtsc to obtain the
> single-processor-guaranteed results.
>
> BUT some software systems (including VMware) do
> provide this guarantee across multiple processors.
> And recent families of both Intel and AMD
> multi-core have advanced to the point where
> the properties apply across all cores, so
> on the vast majority (but admittedly not all)
> of future physical systems, apps can and will
> use rdtsc and expect the properties to apply
> (whether guaranteed or not).
>

Even very recent processors with "constant" tscs (ie, they don't change
rate with the core frequency) stop in certain power states. Any
motherboard design which runs packages in different clock-domains will
lose tsc-sync between those packages, regardless of what's in the
packages.

The "sane tsc" properties are primarily for the benefit of kernels, to
allow them to make better use of the tsc. They will have enough
knowledge of the overall system architecture to know how and when the
tsc can be trusted.

Usermode apps can try to piggyback onto this if they like, but they're
in much more treacherous territory. They can never know what the
underlying system design is, or whether it's really safe to trust the
tsc's sanity. And without some explicit guarantees on Linux's part, the
tsc will still be non-monotonic over suspend/resume (in all its many
forms).

> So in your opinion, some systems are broken
> so Xen should assume all future systems are
> broken. In my opinion, the problem is being
> fixed in hardware and has always been fixed
> in VMware, so Xen should look to the future
> not the past.
>
> Does that sound like a good summary of this
> disagreement?
>

Not quite.

You are talking about three different cases:

   1. the reliability of the tsc in a PV guest in kernel mode
   2. the reliability of the tsc in a PV guest in user mode
   3. the reliability of the tsc in an HVM guest

I don't think 1. needs any attention. The current scheme works fine.

The only option for 3 is to try to make a best effort at tsc quality,
which ranges from trapping every rdtsc to make them all give globally
monotonic results, to using the other VT/SVM features to apply an
offset from the raw tsc to a guest tsc, etc. Either way the situation
isn't much different from running native (ie, apps will see basically
the same tsc behaviour as in the native case, to some degree of
approximation).

So, there's case 2: pv usermode. There are four classes of apps worth
considering here:

   1. Old apps which make unwarranted assumptions about the behaviour
      of the tsc. They assume they're basically running on some
      equivalent of a P54, and so will get junk on any modernish system
      with SMP and/or power management. If people are still using such
      apps, it probably means their performance isn't critically
      dependent on the tsc.
   2. More sophisticated apps which know the tsc has some limitations
      and try to mitigate them by filtering discontinuities, using
      rdtscp, etc. They're best-effort, but they inherently lack enough
      information to do a complete job (they have to guess at where
      power transitions occurred, etc).
   3. New apps which know about modern processor capabilities, and
      attempt to rely on constant_tsc, forgoing all the best-effort
      filtering, etc.
   4. Apps which use gettimeofday() and/or clock_gettime() for all time
      measurement. They're guaranteed to get consistent time results,
      perhaps at the cost of a syscall. On systems which support it,
      they'll get vsyscall implementations which avoid the syscall
      while still using the best-possible clocksource. Even if they
      don't, a syscall will outperform an emulated rdtsc.

Class 1 apps are just broken. We can try to emulate a UP, no-PM
processor for them, and that's probably best done in an HVM domain.
There's no need to go to extraordinary efforts for them because the
native hardware certainly won't.

Class 2 apps will work as well as ever in a Xen PV domain as-is.
If they use rdtscp then they will be able to correlate the tsc to the
underlying pcpu and manage consistency that way. If they pin threads to
VCPUs, then they may also require VCPUs to be pinned to PCPUs. But
there's no need to make deep changes to Xen's tsc handling to
accommodate them.

Class 3 apps will get a bit of a rude surprise in a PV Xen domain. But
they're also new enough to use another mechanism to get time. They're
new enough to "know" that gettimeofday can be very efficient, and
should not be going down the rathole of using rdtsc directly. And
unless they're going to be restricted to a very narrow class of
machines (for example, not my relatively new Core2 laptop which stops
the "constant" tsc in deep sleep modes), they need to fall back to
being a class 2 or 4 app anyway.

Class 4 apps are not well-served under Xen. I think the vsyscall
mechanism will be disabled and they'll always end up doing a real
syscall. However, I think it would be relatively easy to add a new
vgettimeofday implementation which directly uses the pvclock mechanism
from usermode (the same code would work equally well for Xen and KVM).
There's no need to add a new usermode ABI to get quick, high-quality
time in usermode. Performance-wise it would be more or less
indistinguishable from using a raw rdtsc, but it has the benefit of
getting full cooperation from the kernel and Xen, and can take into
account all tsc variations (if any).

So if you want to address these problems, it seems to me you'll get
most bang for the buck by fixing (v)gettimeofday to use pvclock, and
convincing app writers to trust in gettimeofday.

    J
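
For illustration only, here is roughly what a "class 4" app's timing
path might look like. This is a minimal sketch, not anything from the
Xen or Linux trees: the helper names monotonic_ns() and elapsed_ns()
are made up for this example, and the choice of CLOCK_MONOTONIC (rather
than gettimeofday() or another clock id) is an assumption on my part.
The only real API used is POSIX clock_gettime(). Whether the call ends
up as a real syscall or a vsyscall-style fast path depends on the
kernel and platform, as discussed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: current CLOCK_MONOTONIC reading in nanoseconds.
 * Returns 0 if clock_gettime() fails.
 */
uint64_t monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;

    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: nanoseconds elapsed since an earlier
 * monotonic_ns() reading. The app never touches rdtsc itself; it
 * leaves the choice of clocksource to the kernel.
 */
uint64_t elapsed_ns(uint64_t start_ns)
{
    return monotonic_ns() - start_ns;
}
```

An app would take a reading with monotonic_ns() before the work it
wants to time and call elapsed_ns() with that reading afterwards. The
point of the design is that the app carries no knowledge of tsc rate
changes, cpu migration or power states; any such handling stays behind
the clock_gettime() interface.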