
Re: [Xen-devel] Massive Instant Clock Jump & Freeze domU Issue (NOT Related to Drift, Live Migration or Saving/Restoring)



On Mon, 2013-04-22 at 23:50 +0100, Tom W wrote:
> Hello Xen Developers! After fully researching ourselves and talking to
> many Xen consultants, we have been advised to inquire here about a
> rare Xen bug we are possibly experiencing. Any help or advice would be
> much appreciated, thanks in advance! We're also open to offering some
> financial support to solve this problem.

Does your hypervisor tree have this commit in it:

        commit 84628ee52a427b0f0fe50502eb8ffd0eedad0f03
        Author: Jan Beulich <jbeulich@xxxxxxxx>
        Date:   Mon Nov 26 17:20:39 2012 +0100
        
            x86/time: fix scale_delta() inline assembly

The bug fixed by that commit was responsible for a rash of strange time
jumps, although IIRC it affected the whole system and not individual VMs.

It might be worth looking at the scale_delta function in your kernel,
which I think you will find in arch/i386/kernel/time-xen.c. There was a
fix made to this code in the upstream kernel which may be missing there:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=de2d1a524e94a79078d9fe22c57c0c6009237547

I have no idea if this fix is relevant to the kernel (or compiler etc)
you are using, but it looks interesting...
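
For orientation, here is a rough portable sketch of the arithmetic that a
scale_delta-style function performs: shift the TSC delta, then multiply by
a 32-bit fixed-point fraction and keep the upper part of the product. The
name scale_delta_ref is hypothetical, and this is plain C written for
illustration, not the inline assembly found in either tree. It is only
meant as something to compare results against when checking a suspect
build.

```c
#include <stdint.h>

/*
 * Illustrative reference only (hypothetical name, not kernel code).
 * Computes ((delta << shift) * mul_frac) >> 32, where mul_frac is a
 * 0.32 fixed-point fraction and a negative shift means shift right.
 */
static uint64_t scale_delta_ref(uint64_t delta, uint32_t mul_frac, int shift)
{
    uint64_t hi, lo;

    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* Split the 64x32 multiply so no 128-bit type is needed. */
    hi = delta >> 32;
    lo = delta & 0xffffffffULL;

    /* (hi * 2^32 + lo) * mul_frac / 2^32 */
    return hi * mul_frac + ((lo * mul_frac) >> 32);
}
```

With mul_frac = 0x80000000 (one half) and shift = 0, a delta of 1000
scales to 500.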

> Here is a summary of the problem:
> -very infrequently the domU clock is instantly jumping ahead a massive
> amount of time and then appearing to lock on the new time (i.e. time
> stops)

Some kernels (I expect including yours) contain a "latch" so that time
always appears monotonic, which means that if time glitches forwards and
then back again it will appear to lock at the later time. Look for
monotonic in arch/i386/kernel/time-xen.c for the code.
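
To illustrate why such a latch makes time appear to stop, here is a
minimal user-space sketch. The names last_reported_ns and monotonic_clamp
are made up for this example and are not the identifiers used in
time-xen.c; the real code will differ in detail.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's latched timestamp, in ns. */
static uint64_t last_reported_ns;

/*
 * Return a never-decreasing view of the raw clock reading.
 * If raw_ns is behind the latched value, the latched value is
 * returned again, so callers see time standing still.
 */
static uint64_t monotonic_clamp(uint64_t raw_ns)
{
    if (raw_ns > last_reported_ns)
        last_reported_ns = raw_ns;
    return last_reported_ns;
}
```

If one bogus reading far in the future passes through monotonic_clamp(),
every later sane reading is below the latch, so the returned time stays
fixed until the real clock catches up with the bogus value.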

If you are able to add some debugging to the kernel you should be able
to observe this latching. In fact, a single-shot debug print that fires
when the latched time is way ahead of the current time would be a useful
diagnostic tool IMHO.
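
Something along these lines is what I have in mind, sketched here as
user-space C. The function name, the flag and the 1 second threshold are
all assumptions made for the example; in the kernel you would use printk
rather than fprintf and hook this in next to the existing monotonic
check.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed threshold: report once the latch is at least 1s ahead. */
#define LATCH_WARN_THRESHOLD_NS 1000000000ULL

/* Set after the first report so the message is printed only once. */
static int latch_warned;

/*
 * Print one diagnostic line the first time the latched time is at
 * least LATCH_WARN_THRESHOLD_NS ahead of the current raw time.
 * Returns 1 if the message was printed by this call, 0 otherwise.
 */
static int check_latch_once(uint64_t latched_ns, uint64_t current_ns)
{
    if (latch_warned || latched_ns <= current_ns)
        return 0;
    if (latched_ns - current_ns < LATCH_WARN_THRESHOLD_NS)
        return 0;

    latch_warned = 1;
    fprintf(stderr,
            "time latch: latched=%" PRIu64 "ns current=%" PRIu64
            "ns ahead=%" PRIu64 "ns\n",
            latched_ns, current_ns, latched_ns - current_ns);
    return 1;
}
```

Keeping it single shot avoids flooding the log, since once the latch is
ahead the condition will be true on every subsequent clock read.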

> -for all dom0s & domUs: independent_wallclock=0, ntpd is running,
> clocksource=jiffies, Xen version 3.1.2

I know there are a lot of suggestions to set clocksource=jiffies
floating around on the Internet, but I am far from convinced that it is
a good idea.

I won't rule out it being a useful workaround for kernel+hypervisor
combinations of the vintage you are using, but I think it would be
interesting to try without it.

You've already noticed that independent_wallclock=0 and ntpd are
inconsistent, so that's good.

Ian.


