
RE: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description



Hi Dan,

These aren't the cycles of a single switch. They are cycle counts accumulated 
over a period. I dumped the numbers at a random point while a guest was running.

Thanks,
-Wei

-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Dan Magenheimer
Sent: Friday, April 15, 2011 3:16 PM
To: Huang2, Wei; Keir Fraser
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description

Wait... a context switch takes over 4 billion cycles?
Not likely!

And please check your division.  I get the same
answer from "dc" only when I use lowercase hex
numbers and dc complains about unimplemented chars,
else I get 0.033%... also unlikely.
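
For reference, the division in question can be re-checked with a few lines of
Python (the hex values are the ones Wei dumped further down in the thread; the
variable names here are just for illustration):

```python
# Counters quoted later in this thread (from Wei's dump):
tsc_unlazy = 0x008AE174   # cycles in __fpu_unlazy_save() + __fpu_unlazy_restore()
tsc_total  = 0x1028B4907  # total cycles in context_switch()

# Fraction of context_switch() time spent on the unlazy save/restore.
ratio = tsc_unlazy / tsc_total
print(f"overhead = {ratio:.4%}")  # roughly 0.21%, consistent with Wei's ~0.2%
```

Note that dc expects uppercase hex digits once `16 i` sets the input radix,
which would explain the lowercase-digit complaints.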

> -----Original Message-----
> From: Wei Huang [mailto:wei.huang2@xxxxxxx]
> Sent: Thursday, April 14, 2011 4:57 PM
> To: Keir Fraser
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
> 
> Hi Keir,
> 
> I ran a quick test to calculate the overhead of __fpu_unlazy_save() and
> __fpu_unlazy_restore(), which are used to save/restore LWP state. Here
> are the results:
> 
> (1) tsc_total: total time used for context_switch() in x86/domain.c
> (2) tsc_unlazy: total time used for __fpu_unlazy_save() +
> __fpu_unlazy_restore()
> 
> One example:
> (XEN) tsc_unlazy=0x00000000008ae174
> (XEN) tsc_total=0x00000001028b4907
> 
> So the overhead is about 0.2% of total time used by context_switch().
> Of
> course, this is just one example. I would say the overhead ratio would
> be <1% for most cases.
> 
> Thanks,
> -Wei
> 
> 
> 
> On 04/14/2011 04:09 PM, Keir Fraser wrote:
> > On 14/04/2011 21:37, "Wei Huang"<wei.huang2@xxxxxxx>  wrote:
> >
> >> The following patches support AMD lightweight profiling.
> >>
> >> Because LWP isn't tracked by CR0.TS bit, we clean up the FPU code to
> >> handle lazy and unlazy FPU states differently. Lazy FPU state (such
> as
> >> SSE, YMM) is handled when #NM is triggered. Unlazy state, such as
> LWP,
> >> is saved and restored on each vcpu context switch. To simplify the
> code,
> >> we also add a mask option to xsave/xrstor function.
> > How much cost is added to context switch paths in the (overwhelmingly
> > likely) case that LWP is not being used by the guest? Is this adding
> a whole
> > lot of unconditional overhead for a feature that noone uses?
> >
> >   -- Keir
> >
> >> Thanks,
> >> -Wei
> >>
> >>
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> http://lists.xensource.com/xen-devel
> >
> >
> 
> 
> 




