Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

Thanks Andrew and Ian for taking the time to look at this change.
In turn it took me some time to get back to this topic.

On Mon, 1 Oct 2018 13:39:51 +0100
Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:

> On 07/06/18 14:08, Olaf Hering wrote:
> > Add an option to control when vTSC emulation will be activated for a
> > domU with tsc_mode=default. Without such option each TSC access from
> > domU will be emulated, which causes a significant performance drop for
> > workloads that make use of rdtsc.
> >
> > One option to avoid the TSC emulation is to run domUs with tsc_mode=native.
> > This has the drawback that migrating a domU from a "2.3GHz" class host
> > to a "2.4GHz" class host may change the rate at which the TSC counter
> > increases, and the domU may not be prepared for that.
> >
> > With the new option the host admin can decide how a domU should behave
> > when it is migrated across systems of the same class. Since there is
> > always some jitter when Xen calibrates the cpu_khz value, all hosts of
> > the same class will most likely have slightly different values. As a
> > result vTSC emulation is unavoidable. Data collected during the incident
> > which triggered this change showed a jitter of up to 200 KHz across
> > systems of the same class.  
> Do you have any further details of the systems involved?  If they are
> identical systems, they should all have the same real TSC frequency, and
> it's a known issue that Xen isn't very good at working out the
> frequency.  TBH, fixing that would be far better overall.

My test hosts have an E5504 CPU. The hosts where the issue was reported
use "E7-8880 v3" today; the hardware used two years ago was likely older.

From what I understand, the TSC frequency stored in "cpu_khz" is just an
estimated value, not the real hardware frequency. Still, it is used to
decide whether two hosts tick at the same speed. The domU kernel may use the
estimated value for its timekeeping; I think it does the same estimation
as Xen itself, but I still have to dig into that.
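
To illustrate the comparison being discussed, here is a minimal sketch in
Python. It is not the Xen implementation: the function name and its
parameters are made up for this example, and it ignores everything else Xen
takes into account (for instance whether the host TSC is considered
reliable). It only shows the idea that a domU keeps its native TSC as long
as the two estimated frequencies differ by no more than the tolerance.

```python
def needs_vtsc_emulation(guest_khz: int, host_khz: int, tolerance_khz: int = 0) -> bool:
    """Return True if TSC accesses would have to be emulated on this host.

    Hypothetical helper, not Xen code: the guest keeps a native TSC only
    if the estimated frequencies differ by at most tolerance_khz.
    """
    return abs(guest_khz - host_khz) > tolerance_khz


# Two hosts of the same class whose calibrated cpu_khz values differ by 11 kHz.
# With a tolerance of 0 the difference forces emulation.
print(needs_vtsc_emulation(2_600_000, 2_600_011))
# With a tolerance of 200 kHz the same difference is accepted.
print(needs_vtsc_emulation(2_600_000, 2_600_011, tolerance_khz=200))
```

With a tolerance of zero, any calibration jitter at all leads to emulation,
which is the behaviour the patch tries to avoid.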

To me it looks like domUs should run ntpd themselves if there is a plan
to migrate them at some point in the future. At least if they use TSC
for timekeeping. With ntpd the domU would detect time skew, even if it
was not yet migrated to another host. I do not know much about timekeeping,
so this is just a guess on my side.

> > Existing padding fields are reused to store vtsc_khz_tolerance as u16.
> > The padding is sent as zero in write_tsc_info to the receiving host.
> > The padding is undefined if the changed code runs as receiver.  
> I'm not sure what you mean by this final sentence.

I have removed that part, since incoming padding is in practice always zero.

> > handle_tsc_info has no code to verify that padding is indeed zero. Due
> > to the lack of a version field it is impossible to know if the sender
> > already has the newly introduced vtsc_tolerance field. In the worst
> > case the receiving domU will get an unemulated TSC.  
> The lack of padding verification is deliberate, for forwards
> compatibility.  Why does the sending code matter?  One way or another,
> if the field is 0, the option wasn't present or wasn't configured. 
> Neither of these situations affect the decision-making that the
> receiving side needs to perform.
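
The point about a zero field can be sketched as follows. This is only an
illustration in Python: the record layout below (mode, khz, nsec,
incarnation, then 32 bits of former padding) and the position of the
tolerance inside that padding are assumptions made for the example, and the
helper names are hypothetical. It shows why a receiver cannot distinguish,
and does not need to distinguish, an old sender from a new sender that left
the option unset.

```python
import struct

# Assumed little-endian layout, for illustration only:
# mode (u32), khz (u32), nsec (u64), incarnation (u32),
# tolerance (u16, carved out of the former padding), remaining padding (u16).
NEW_TSC_INFO = struct.Struct("<IIQIHH")
# Old layout with the same size: the last 32 bits are padding, sent as zero.
OLD_TSC_INFO = struct.Struct("<IIQII")


def pack_tsc_info(mode, khz, nsec, incarnation, tolerance_khz=0):
    """Build a record the way a new sender might (hypothetical helper)."""
    return NEW_TSC_INFO.pack(mode, khz, nsec, incarnation, tolerance_khz, 0)


def unpack_tolerance(record: bytes) -> int:
    """Read the tolerance; 0 means the option was absent or not configured."""
    *_, tolerance_khz, _pad = NEW_TSC_INFO.unpack(record)
    return tolerance_khz


old_sender = OLD_TSC_INFO.pack(0, 2_600_000, 0, 1, 0)
new_sender_unset = pack_tsc_info(0, 2_600_000, 0, 1)
new_sender_set = pack_tsc_info(0, 2_600_000, 0, 1, tolerance_khz=200)

# The first two records are byte-for-byte identical.
print(unpack_tolerance(old_sender), unpack_tolerance(new_sender_unset),
      unpack_tolerance(new_sender_set))
```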

> > Signed-off-by: Olaf Hering <olaf@xxxxxxxxx>
> > Reviewed-by: Wei Liu <wei.liu2@xxxxxxxxxx> (v07/v08)
> > Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> (v08)  
> I'm still -0.5 for this patch.  I can appreciate why you want it, but it
> is a gross hack which only works when you don't skew time more than NTP
> in the guest can cope with.  My gut feeling is that there will be other
> more subtle fallout.

I will do some research regarding how much skew a domU can handle.
As mentioned in another reply, the expected time drift with a difference of
just 11 kHz in the cpu_khz variable on a 2.6GHz system is about 0.3 seconds
per day.
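
The arithmetic behind that estimate, as a small Python sketch. The function
name is made up for this example, and the model is deliberately simple: it
assumes the guest keeps counting at the old estimated frequency while the
TSC really ticks at a rate that is off by the given difference, with no NTP
correction.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_seconds_per_day(cpu_khz: int, delta_khz: int) -> float:
    """Clock drift accumulated in one day for a relative frequency error.

    Hypothetical helper: drift = (delta / frequency) * seconds per day.
    """
    return delta_khz / cpu_khz * SECONDS_PER_DAY


# 11 kHz difference on a 2.6 GHz (2,600,000 kHz) system.
print(f"{drift_seconds_per_day(2_600_000, 11):.2f} s/day")
# The 200 kHz jitter mentioned in the commit message, on the same system.
print(f"{drift_seconds_per_day(2_600_000, 200):.2f} s/day")
```

For 11 kHz this gives roughly 0.37 seconds per day, which matches the
ballpark figure above. For the 200 kHz jitter from the commit message the
same model gives several seconds per day.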

IanJ requested clarification on how much time skew a system can handle.
Perhaps this should have been part of the initial submission for
tsc_mode=native already. I will look into it. Also, some of the concerns
about missing documentation are already covered in paragraph #3 of the
commit message.

I will do some more testing with staging and send v10 next week.
The accumulated changes for a v10, so far:
 - rebase to 402411ec40
 - update write_libxc_tsc_info to handle the new parameter vtsc_tolerance_khz;
   without this change, migration from xen-4.5 will fail (Andrew)
 - add newline to tsc_set_info (Andrew)
 - add measurement unit to libxc-migration-stream.pandoc (Andrew)
 - add pointer to xen-tscmode(7) in xl.cfg(5)/vtsc_tolerance_khz (Andrew)
 - reword the newly added paragraph in xen-tscmode(7) (Andrew),
   and also mention that it is about the measured/estimated TSC value
   rather than the real value. The latter is simply unknown.
 - simplify wording regarding the value of the padding field in old Xen
   versions; the previous wording turned out to be confusing and not helpful

