[Xen-devel] Re: rdtsc: correctness vs performance on Xen (and KVM?)
On 08/31/09 16:52, Dan Magenheimer wrote:
> work both on Xen and bare metal, and works properly
> across: vcpu-to-pcpu rescheduling even on NUMA
> machines; system sleep/hibernation; and
> save/restore/migration between machines with
> dissimilar clock rates.

But it will only do this when running under Xen. If running on bare metal, there will be nothing providing the correction info to the app, and it will be no better than using raw rdtsc with all its limitations. In practice this means that the app will have to have some other code path anyway.

> Implementation requires
> changes in Xen and "the app" but no OS changes
> thus making it still viable on legacy OS's
> and possibly(?) HVM domains. Note that
> only apps that need to sample time on the
> order of >5-100K/core/second would use this;
> for other apps, rdtsc emulation overhead
> is probably negligible (<0.2%).
>
> 0) Xen implements rdtsc emulation by default
> 1) Guest OS is launched with pvtsc=1 in vm.cfg
> 2) App running on guest OS sets up a SIGILL handler
> 3) App executes a special rdmsr instruction or
> hypercall.
>

No way to do direct hypercalls from usermode, so it would need to be an illegal instruction (like cpuid). But really it should be a system-wide kernel setting, set via sysctl or something.

> 4a) If SIGILL results, not running on Xen at all,
> or on old Xen; app uses rdtsc at own risk. Done.
> 4b) Else, rdmsr/hypercall returns virtual address of
> special pvclock page ("pvclock_va").
>

This can't be done without changing the kernel; Xen can't just start sticking stuff into usermode mappings (how does Xen even know where a given OS's usermode is?). And again, usermode can't do hypercalls and I don't think we should start making fake rdmsrs start working in usermode.

> 5) App executes another special rdmsr instruction/
> hypercall to disable rdtsc emulation. This
> affects ALL execution for all processes in this VM.
>

Once enabled, it should just stay enabled.
System-wide is very coarse anyway (since there's no guarantee that all apps will use the mechanism).

> 6) Xen maintains mapping of pvclock_va to a
> different physical page for each processor
> and transparently handles TLB misses for
> pvclock_va
>

If you mean that a given VA has a per-cpu mapping, it requires percpu pagetables. That's not possible in Linux with PV pagetables (since two tasks/threads on different cpus sharing the same mm will use the same pagetable).

> 7) App uses (unemulated) rdtsc and applies
> pvclock algorithm (using values in memory
> at pvclock_va) resulting in pvtsc, which
> is nanoseconds since VM start. App can
> further apply local algorithms to enforce
> monotonicity or frequency scaling as desired.
>
> Comments appreciated. I realize that this is hacky
> and ugly... better alternatives gladly solicited.
>

In general even Linux's specialised APIs are entirely unused (sendfile, vmsplice, etc). Something as esoteric as this will be pretty much unused. This can be entirely done within the vsyscall mechanism without any app changes. There's no reason not to.

> P.S. While it would be nice if we could just tell
> apps to use a fast vgettimeofday equivalent, this
> does not exist today and, even if it did, would not
> be widely available for years in the kernel running under
> most enterprise app deployments (and, even then,
> only on 64-bit Linux.)
>

These rationales are very unconvincing:

Making vsyscall work on 32bit is just a matter of doing it; apparently nobody has put the effort into it, but there's no fundamental reason why it wouldn't work.

Besides, who runs enterprise apps on 32-bit these days? Anything requiring even moderate amounts of memory is better run on 64-bit.

Your mechanism will require kernel changes anyway, so there's no getting around that.

Once vsyscall does Xen/KVM properly, then every app will automatically do the right thing without modification.
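
For reference, the pvclock calculation in question is small enough that it makes no difference to the app whether it lives in a vsyscall or in the app itself. Below is a rough sketch only, not a real implementation: the struct is a hypothetical mirror modeled on the per-vcpu time record (version, TSC timestamp, system time, multiplier, shift), the names pvclock_info, pvclock_compute and pvclock_read are made up for illustration, and the read_tsc callback stands in for an rdtsc wrapper. __sync_synchronize is a GCC/Clang builtin, so that part assumes one of those compilers.

```c
#include <stdint.h>

/* Hypothetical mirror of the per-vcpu time record (layout is an assumption). */
struct pvclock_info {
    uint32_t version;           /* odd while the hypervisor is updating */
    uint32_t pad0;
    uint64_t tsc_timestamp;     /* TSC value at the last update */
    uint64_t system_time;       /* nanoseconds at the last update */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point TSC-to-ns multiplier */
    int8_t   tsc_shift;         /* shift applied to the TSC delta first */
    uint8_t  pad1[3];
};

/* Pure scaling step: ns = system_time + ((delta shifted) * mul) >> 32. */
static inline uint64_t pvclock_compute(const struct pvclock_info *s, uint64_t tsc)
{
    uint64_t delta = tsc - s->tsc_timestamp;

    if (s->tsc_shift < 0)
        delta >>= -s->tsc_shift;
    else
        delta <<= s->tsc_shift;

    /* 64x32 multiply followed by >> 32, split to avoid 128-bit math. */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    uint64_t scaled = hi * s->tsc_to_system_mul
                    + ((lo * s->tsc_to_system_mul) >> 32);

    return s->system_time + scaled;
}

/* Seqlock-style reader: retry if the record changed while we sampled it. */
static inline uint64_t pvclock_read(const volatile struct pvclock_info *info,
                                    uint64_t (*read_tsc)(void))
{
    struct pvclock_info snap = {0};
    uint64_t tsc;
    uint32_t ver;

    do {
        ver = info->version;
        __sync_synchronize();
        snap.tsc_timestamp     = info->tsc_timestamp;
        snap.system_time       = info->system_time;
        snap.tsc_to_system_mul = info->tsc_to_system_mul;
        snap.tsc_shift         = info->tsc_shift;
        tsc = read_tsc();       /* sample the TSC inside the retry loop */
        __sync_synchronize();
    } while ((ver & 1) || ver != info->version);

    return pvclock_compute(&snap, tsc);
}
```

Note that the record has to be the one for the cpu the rdtsc executed on, which is exactly where the per-cpu mapping problem above comes from.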
There's no need for specialized APIs that nobody will end up using anyway.

It only makes sense to go to this kind of effort if it ends up making a plain "rdtsc" have the properties you want it to have.

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel