
[Xen-devel] Re: Large system boot problems

Keir Fraser wrote:
> On 8/2/08 15:22, "Bill Burns" <bburns@xxxxxxxxxx> wrote:
>>> But ultimately the calibration code should be robust to long delays before
>>> it is executed. It shouldn't go haywire. So something is bad there. Do you
>>> have a dump of the decision made by the calibration code on cpu0 the very
>>> first time it actually gets invoked? We probably need to trace the hell out
>>> of that first invocation to work out why it gets things so badly wrong.
>> I don't have more than in the earlier email, where it shows the
>> large delta in tsc time, which seems to cause the bogus result.
> Okay, well looking at the inputs on that first invocation -- master_stime
> and local_stime -- they are totally out of sync. One says that 9.3s has
> elapsed since init_xen_time() was invoked, the other says that 4.6s has
> elapsed (curiously exactly half the time). The former is correct if the CPU
> really is a 3.4GHz part and is running at full speed for the duration. But
> you ought to be able to work out which is the correct ballpark by timing
> with a stopwatch the time between init_xen_time() and that first invocation
> on cpu0 of local_time_calibration() (you'll have to printk() when
> init_xen_time() is executed).
>  -- Keir

Well, I have a proposed fix that addresses the major symptom:
dom0 reporting time going backwards and failing to initialize
properly. I must note that dom0 still reports the wrong speed for
CPU0 when only one iteration of local_time_calibration occurs
before dom0 gets going. I believe that second issue is probably
due to the large delta between the master and local stime.

The first call to local_time_calibration automatically fixes
local stime being behind.

But when a significant amount of time has elapsed before the
initial call to local_time_calibration, the code that deals with
the local stime and tsc deltas is broken. When the 64-bit delta
for local stime is reduced to a 32-bit value, the tsc delta is
also adjusted, but the tsc_shift value is not maintained.

There are two loops. The first shifts both the stime and
tsc values in sync but fails to record the tsc shift:

    while ( ((u32)stime_elapsed64 != stime_elapsed64) ||
            ((s32)stime_elapsed64 < 0) )
    {
        stime_elapsed64 >>= 1;
        tsc_elapsed64   >>= 1;
++      tsc_shift--;
    }

The second does the tsc shift alone, which is fine, but note
that it does record the tsc shift.

    /* tsc_elapsed <= 2*stime_elapsed */
    while ( tsc_elapsed64 > (stime_elapsed32 * 2) )
    {
        tsc_elapsed64 >>= 1;
        tsc_shift--;
    }

Making this one-line change, as in the attached patch,
yields a properly working dom0. Tested on both a small-memory
and a large-memory system.
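
For readers unfamiliar with how tsc_shift is consumed, here is a minimal
standalone sketch of the bookkeeping done by the second loop: every time the
tsc delta alone is halved or doubled, the shift is recorded so that the same
shift can be applied later when converting a tsc delta to nanoseconds. This
is not the Xen source. The names tsc_scale, frac32, calc_scale and
scale_delta are hypothetical, and the sketch assumes the stime delta has
already been reduced to a positive value below 2^31.

```c
#include <stdint.h>

/* Hypothetical container for the calibration result; not a Xen type. */
struct tsc_scale {
    int      shift;    /* < 0: shift tsc delta right, > 0: shift it left */
    uint32_t mul_frac; /* 0.32 fixed-point multiplier, always < 1.0 */
};

/* dividend / divisor as a 0.32 fixed-point fraction (needs dividend < divisor). */
static uint32_t frac32(uint32_t dividend, uint32_t divisor)
{
    return (uint32_t)(((uint64_t)dividend << 32) / divisor);
}

/*
 * Derive a scale from one pair of deltas.
 * Assumption: 0 < stime_ns < 2^31 and tsc > 0; otherwise a zero scale
 * is returned rather than looping forever.
 */
static struct tsc_scale calc_scale(uint32_t stime_ns, uint64_t tsc)
{
    struct tsc_scale s = { 0, 0 };

    if (stime_ns == 0 || stime_ns >= 0x80000000u || tsc == 0)
        return s;

    /* Halve tsc until tsc <= 2*stime, recording each halving. */
    while (tsc > (uint64_t)stime_ns * 2) {
        tsc >>= 1;
        s.shift--;
    }

    /* Double tsc until tsc > stime, so the fraction below is < 1. */
    while (tsc <= stime_ns) {
        tsc <<= 1;
        s.shift++;
    }

    s.mul_frac = frac32(stime_ns, (uint32_t)tsc);
    return s;
}

/*
 * Convert a tsc delta to nanoseconds using a recorded scale.
 * Assumption: the shifted delta fits in 32 bits, so the 64-bit
 * product cannot overflow in this simplified version.
 */
static uint64_t scale_delta(uint64_t delta, struct tsc_scale s)
{
    if (s.shift < 0)
        delta >>= -s.shift;
    else
        delta <<= s.shift;
    return (delta * s.mul_frac) >> 32;
}
```

With one second of elapsed stime (1000000000 ns) and a tsc delta of
3400000000, as on a 3.4GHz part, calc_scale records a shift of -1, and
scale_delta maps 3400000000 ticks back to roughly one second. If a halving
of the tsc delta alone were not recorded, the result would be off by a
factor of two per missed step.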

--- arch/x86/time.c.orig        2008-02-12 07:16:48.000000000 -0500
+++ arch/x86/time.c     2008-02-12 11:19:47.000000000 -0500
@@ -857,6 +857,7 @@ static void local_time_calibration(void 
     {
         stime_elapsed64 >>= 1;
         tsc_elapsed64   >>= 1;
+        tsc_shift--;
     }
 
     /* stime_master_diff now fits in a 32-bit word. */