
RE: [Xen-devel] [PATCH] replace rdtsc emulation-vs-native xen boot option with per-domain (hypervisor part)



Let me attempt to summarize our disagreement and then
I'd like to stop arguing.

1) You think rdtsc is never safe for an app to use.  I
   think it is safe in many hardware/software environments
   and that the number of safe environments will continue
   to increase.
2) You think the performance hit from rdtsc-emulation
   is horrible.  I think it is significant but relatively
   small and acceptable and, if there are cases where it
   is not, administrators or virtual appliance providers
   can make an informed choice to turn it off.
3) You think app developers can and should be told to
   not use rdtsc because it is inherently unsafe and
   so Xen doesn't need to ever be concerned with making it
   safe.  I think app developers will do what they
   please, ignorant of the subtleties of rdtsc, and if
   their app works on their hardware and on VMware but
   not on Xen, they will blame Xen or Linux or their
   OS provider or their cloud provider, and probably never
   know that their app doesn't work because of rdtsc.

Do you agree that those are the key points of disagreement?

Thanks,
Dan

> -----Original Message-----
> From: Jeremy Fitzhardinge [mailto:jeremy@xxxxxxxx]
> Sent: Tuesday, September 29, 2009 1:02 PM
> To: Dan Magenheimer
> Cc: Xen-Devel (E-mail); Keir Fraser
> Subject: Re: [Xen-devel] [PATCH] replace rdtsc emulation-vs-native xen
> boot option with per-domain (hypervisor part)
> 
> 
> On 09/29/09 10:34, Dan Magenheimer wrote:
> >> The TSC is not, and has never been reliable.
> >>     
> > Your data is stale.  Please discuss this with processor
> > and system vendors (I have)
> 
> I'm sure they would say that, as they frequently have in the past.  And
> then it breaks again.
> 
> Even then their guarantee only applies while the processor is powered up
> and hasn't been reset.  But resets can occur while the system is
> "running" in the form of S3 suspend events, or even completely powered
> off when suspending to disk.
> 
> Besides, the SDM makes no claims about tsc synchronization between CPUs,
> only that on a given CPU/core it runs at a constant rate (at least from
> now on, promise!).  At that point you're relying on motherboard/system
> design, which has a lot more scope for brokenness than just core CPUs.
> Large systems simply don't keep all their CPUs in the same clock domain,
> and certainly won't guarantee that for all future system designs.
> 
> >  and look at the latest upstream Linux.
> >   
> The kernel does what it needs to do to make the tsc usable for itself.
> It does not make (and has never made) any guarantees about how the tsc
> appears in usermode (except for the purposes of implementing
> vgettimeofday).  You won't find many Linux kernel developers who are
> sympathetic with the idea of making any hard guarantees for bare
> usermode tsc use.
> 
> >> Except that it comes with a terrible cost...
> >> This is a massive regression...
> >>     
> > It is certainly significant but "terrible" and "massive"
> > are a bit strong.  Based on my measurements, the examples
> > you cite will degrade performance by a fraction of a percent.
> >   
> How have you measured this?  On what systems?  Your patch introduces
> this regression on all systems for everyone; it isn't enough to measure
> it on a new Nehalem machine.
> 
> >> The fact that you haven't named a single real app...
> >> Are you really arguing on the basis that "some apps
> >> might use tsc in a fragile way" or do you actually have a specific list
> >>     
> > I have a (small) specific list.  For various reasons,
> > I cannot go into further detail.
> >   
> 
> Well, that goes back to my point about spending a lot of effort on
> something that can only possibly benefit a (small) set of niche
> apps.  Spending the effort on a vsyscall approach would be fast, correct
> and widely beneficial.
> 
> You can default it on within Oracle, or even in Oracle's Xen distro. 
> It's unreasonable to make this a global default when you're trying to
> solve a local problem.  You haven't established this is something that
> anyone else need be concerned about.
> 
> Besides, if they want a global sequence number, why not just keep a
> global counter?  That's going to be much cheaper and more reliable than
> anything time-based.
> 
>     J
>
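
The closing suggestion in the quoted message, keeping a global counter
instead of deriving sequence numbers from the TSC, can be sketched in
C11 as below.  This is only an illustration under assumptions: the
names global_seq and next_seq are hypothetical and do not come from the
thread, and the counter is process-wide only (ordering across processes
or hosts would need shared memory or some other mechanism not shown).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical process-wide sequence counter (name is illustrative). */
static _Atomic uint64_t global_seq = 0;

/*
 * Return a unique, strictly increasing sequence number.
 * Safe to call concurrently from any thread in the process.
 */
uint64_t next_seq(void)
{
    /*
     * atomic_fetch_add_explicit returns the value held before the
     * increment, so add 1 to start numbering at 1.  Relaxed ordering
     * is enough for uniqueness of the returned values; it does not
     * order any other memory accesses around the call.
     */
    return atomic_fetch_add_explicit(&global_seq, 1,
                                     memory_order_relaxed) + 1;
}
```

Unlike a raw rdtsc read, the result does not depend on TSC
synchronization between CPUs or on the counter surviving a suspend,
though every caller contends on one shared cache line.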

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

