Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks
Hi Keir, Haitao, Eddie:

Here is the patch for timekeeping. There are three components:

1. Increase the missed ticks threshold to 100 seconds. Per the comment in
   the patch, I have seen scheduling delays of 5 seconds in a test lasting
   30 minutes. The result is an immediate error of 5 seconds in guest time
   that persists.

2. In pt_timer_fn, for the no_missed_tick_accounting option, only increment
   pt->pending_intr_nr if it is found to be zero. This deals with the case
   where the guest has interrupts disabled for longer than a tick period.

3. Call pt_process_missed_ticks() unconditionally and place the test for
   no_missed_tick_accounting inside pt_process_missed_ticks(). This returns
   the calculation for the next timer expiration to the original method.
   The original method is important as it sets up a theoretical time space
   at t=0 where expirations occur only at n*period, where n is any integer.
   This, in turn, removes rounding errors.

Accuracy tests:

A. 32-bit Linux guest (item 1 above relevant)
   Test: two 64-bit Linux guests and one 32-bit Linux guest, stacked.
   8 vcpus for each guest, 2 physical processors on the node, all vcpus
   heavily loaded with work. Test duration: 3 hours.
   Result: clock error < .02%. (Before this patch, guest time was 16
   seconds slow in a 3.5-minute test.)

B. 64-bit Linux guest (items 2 and 3 above relevant)
   Test: two 64-bit Linux guests. 8 vcpus for each guest, 2 physical
   processors on the node, all vcpus heavily loaded with work.
   Test duration: 3 hours.
   Result: clock error .02%. (Before this patch, clock error was 1.3%.
   The contributions of items 2 and 3 above to the improvement in accuracy
   are .4% and .9% respectively.)

Of course, these tests are without ntpd running in the guest. Ntpd requires
a local clock error of < .05%. With the .02% accuracy, one can use ntpd to
synchronize the clock.

thanks,
Dave

Keir Fraser wrote:

  I'm going to apply Haitao's no_missed_ticks_fix.patch. If you think fixes
  are needed apart from that, please provide a unified diff.
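[Editorial sketch, not part of the original thread.] Item 3 above can be illustrated with a small, self-contained arithmetic model (Python rather than the hypervisor's C, with an assumed 1 ms period): keeping every expiration on the n*period grid from t=0 absorbs a late wakeup, whereas re-anchoring each expiration to the wakeup time lets the lateness accumulate into the guest clock.

```python
# Sketch of the two next-expiration strategies discussed in item 3.
# Times are integer nanoseconds; the 1 ms period is an assumed example.

PERIOD = 1_000_000  # 1 ms tick period

def next_expiry_grid(scheduled: int, now: int) -> int:
    """Original method: expirations stay at n * PERIOD from t=0."""
    missed = (now - scheduled) // PERIOD + 1
    return scheduled + missed * PERIOD

def next_expiry_reanchor(now: int) -> int:
    """Re-anchoring method: schedule relative to the (late) wakeup time."""
    return now + PERIOD

# A wakeup that is 0.3 ms late:
scheduled = 5 * PERIOD
now = 5 * PERIOD + 300_000

assert next_expiry_grid(scheduled, now) == 6 * PERIOD        # stays on grid
assert next_expiry_reanchor(now) == 6 * PERIOD + 300_000     # carries the lateness
```

Under the re-anchoring method each late wakeup shifts all subsequent expirations, so the sub-period lateness is never made up; on the t=0 grid the next tick lands exactly where it would have anyway.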
  -- Keir

On 30/10/07 21:16, "Dave Winchell" <dwinchell@xxxxxxxxxxxxxxx> wrote:

  Keir,

  Here are my comments on your change. I've attached an updated vpt.c with
  the changes.

  1. For the no_missed_tick_accounting method, we still need the update to
     pt->scheduled taking into account the time that has elapsed when missed
     ticks are calculated. missed_ticks (pt_process_missed_ticks) is called
     from pt_restore_timer() and pt_timer_fn() and, thus, it is easiest to
     put the check for the no_missed_tick_accounting method in
     missed_ticks() itself.

  2. In pt_timer_fn you don't want to increment pending_intr_nr beyond 1
     for the no_missed_tick_accounting option.

  thanks,
  Dave

  Keir Fraser wrote:

    Applied as c/s 16274. Please take a look and make sure the mode works
    as you expect.

    -- Keir

    On 30/10/07 14:28, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:

      Hi, Keir,

      This patch adds a new timer mode, in which no missed ticks are
      calculated. This can be used with the latest x86_64 Linux guest,
      since it can pick up missed ticks itself.

      <<no_missed_ticks.patch>>

      Best Regards
      Haitao Shan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

*** vpt.c.new.c	2007-10-30 16:45:26.000000000 -0400
--- vpt.c	2007-10-30 15:30:57.000000000 -0400
***************
*** 57,73 ****
      return;
  
      missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
! 
!     if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
!     {
!         if ( missed_ticks > 1000 )
!         {
!             /* TODO: Adjust guest time together */
!             pt->pending_intr_nr++;
!         }
!         else
!         {
!             pt->pending_intr_nr += missed_ticks;
!         }
      }
  
      pt->scheduled += missed_ticks * pt->period;
--- 57,70 ----
      return;
  
      missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
!     if ( missed_ticks > 1000 )
!     {
!         /* TODO: Adjust guest time together */
!         pt->pending_intr_nr++;
!     }
!     else
!     {
!         pt->pending_intr_nr += missed_ticks;
      }
  
      pt->scheduled += missed_ticks * pt->period;
***************
*** 120,126 ****
  
      list_for_each_entry ( pt, head, list )
      {
!         pt_process_missed_ticks(pt);
          set_timer(&pt->timer, pt->scheduled);
      }
--- 117,124 ----
  
      list_for_each_entry ( pt, head, list )
      {
!         if ( !mode_is(v->domain, no_missed_tick_accounting) )
!             pt_process_missed_ticks(pt);
          set_timer(&pt->timer, pt->scheduled);
      }
***************
*** 135,151 ****
  
      pt_lock(pt);
  
!     if ( mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
!     {
!         if ( !pt->pending_intr_nr )
!             pt->pending_intr_nr++;
!     }
!     else
!         pt->pending_intr_nr++;
  
      if ( !pt->one_shot )
      {
          pt->scheduled += pt->period;
!         pt_process_missed_ticks(pt);
          set_timer(&pt->timer, pt->scheduled);
      }
--- 133,152 ----
  
      pt_lock(pt);
  
!     pt->pending_intr_nr++;
  
      if ( !pt->one_shot )
      {
          pt->scheduled += pt->period;
!         if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
!         {
!             pt_process_missed_ticks(pt);
!         }
!         else if ( (NOW() - pt->scheduled) >= 0 )
!         {
!             pt->pending_intr_nr++;
!             pt->scheduled = NOW() + pt->period;
!         }
          set_timer(&pt->timer, pt->scheduled);
      }

/*
 * vpt.c: Virtual Platform Timer
 *
 * Copyright (c) 2006, Xiaowei Yang, Intel Corporation.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms and conditions of the GNU General Public License,
 * version 2, as published by the Free Software Foundation.
 *
 * This program is distributed in the hope it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
 * more details.
 *
 * You should have received a copy of the GNU General Public License along with
 * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
 * Place - Suite 330, Boston, MA 02111-1307 USA.
 */

#include <xen/time.h>
#include <asm/hvm/support.h>
#include <asm/hvm/vpt.h>
#include <asm/event.h>

#define mode_is(d, name) \
    ((d)->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] == HVMPTM_##name)

static void pt_lock(struct periodic_time *pt)
{
    struct vcpu *v;

    for ( ; ; )
    {
        v = pt->vcpu;
        spin_lock(&v->arch.hvm_vcpu.tm_lock);
        if ( likely(pt->vcpu == v) )
            break;
        spin_unlock(&v->arch.hvm_vcpu.tm_lock);
    }
}

static void pt_unlock(struct periodic_time *pt)
{
    spin_unlock(&pt->vcpu->arch.hvm_vcpu.tm_lock);
}

static void pt_process_missed_ticks(struct periodic_time *pt)
{
    s_time_t missed_ticks;

    if ( pt->one_shot )
        return;

    missed_ticks = NOW() - pt->scheduled;
    if ( missed_ticks <= 0 )
        return;

    missed_ticks = missed_ticks / (s_time_t) pt->period + 1;

    if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
    {
        if ( missed_ticks > 1000 )
        {
            /* TODO: Adjust guest time together */
            pt->pending_intr_nr++;
        }
        else
        {
            pt->pending_intr_nr += missed_ticks;
        }
    }

    pt->scheduled += missed_ticks * pt->period;
}

static void pt_freeze_time(struct vcpu *v)
{
    if ( !mode_is(v->domain, delay_for_missed_ticks) )
        return;

    v->arch.hvm_vcpu.guest_time = hvm_get_guest_time(v);
}

static void pt_thaw_time(struct vcpu *v)
{
    if ( !mode_is(v->domain, delay_for_missed_ticks) )
        return;

    if ( v->arch.hvm_vcpu.guest_time == 0 )
        return;

    hvm_set_guest_time(v, v->arch.hvm_vcpu.guest_time);
    v->arch.hvm_vcpu.guest_time = 0;
}

void pt_save_timer(struct vcpu *v)
{
    struct list_head *head = &v->arch.hvm_vcpu.tm_list;
    struct periodic_time *pt;

    if ( test_bit(_VPF_blocked, &v->pause_flags) )
        return;

    spin_lock(&v->arch.hvm_vcpu.tm_lock);

    list_for_each_entry ( pt, head, list )
        stop_timer(&pt->timer);

    pt_freeze_time(v);

    spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void pt_restore_timer(struct vcpu *v)
{
    struct list_head *head = &v->arch.hvm_vcpu.tm_list;
    struct periodic_time *pt;

    spin_lock(&v->arch.hvm_vcpu.tm_lock);

    list_for_each_entry ( pt, head, list )
    {
        pt_process_missed_ticks(pt);
        set_timer(&pt->timer, pt->scheduled);
    }
    pt_thaw_time(v);

    spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

static void pt_timer_fn(void *data)
{
    struct periodic_time *pt = data;

    pt_lock(pt);

    if ( mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
    {
        if ( !pt->pending_intr_nr )
            pt->pending_intr_nr++;
    }
    else
        pt->pending_intr_nr++;

    if ( !pt->one_shot )
    {
        pt->scheduled += pt->period;
        pt_process_missed_ticks(pt);
        set_timer(&pt->timer, pt->scheduled);
    }

    vcpu_kick(pt->vcpu);

    pt_unlock(pt);
}

void pt_update_irq(struct vcpu *v)
{
    struct list_head *head = &v->arch.hvm_vcpu.tm_list;
    struct periodic_time *pt;
    uint64_t max_lag = -1ULL;
    int irq = -1;

    spin_lock(&v->arch.hvm_vcpu.tm_lock);

    list_for_each_entry ( pt, head, list )
    {
        if ( !is_isa_irq_masked(v, pt->irq) && pt->pending_intr_nr &&
             ((pt->last_plt_gtime + pt->period_cycles) < max_lag) )
        {
            max_lag = pt->last_plt_gtime + pt->period_cycles;
            irq = pt->irq;
        }
    }

    spin_unlock(&v->arch.hvm_vcpu.tm_lock);

    if ( is_lvtt(v, irq) )
    {
        vlapic_set_irq(vcpu_vlapic(v), irq, 0);
    }
    else if ( irq >= 0 )
    {
        hvm_isa_irq_deassert(v->domain, irq);
        hvm_isa_irq_assert(v->domain, irq);
    }
}

static struct periodic_time *is_pt_irq(
    struct vcpu *v, struct hvm_intack intack)
{
    struct list_head *head = &v->arch.hvm_vcpu.tm_list;
    struct periodic_time *pt;
    struct RTCState *rtc = &v->domain->arch.hvm_domain.pl_time.vrtc;
    int vector;

    list_for_each_entry ( pt, head, list )
    {
        if ( !pt->pending_intr_nr )
            continue;

        if ( is_lvtt(v, pt->irq) )
        {
            if ( pt->irq != intack.vector )
                continue;
            return pt;
        }

        vector = get_isa_irq_vector(v, pt->irq, intack.source);

        /* RTC irq need special care */
        if ( (intack.vector != vector) ||
             ((pt->irq == 8) && !is_rtc_periodic_irq(rtc)) )
            continue;

        return pt;
    }

    return NULL;
}

void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
{
    struct periodic_time *pt;
    time_cb *cb;
    void *cb_priv;

    spin_lock(&v->arch.hvm_vcpu.tm_lock);

    pt = is_pt_irq(v, intack);
    if ( pt == NULL )
    {
        spin_unlock(&v->arch.hvm_vcpu.tm_lock);
        return;
    }

    if ( pt->one_shot )
    {
        pt->enabled = 0;
        list_del(&pt->list);
    }
    else
    {
        pt->pending_intr_nr--;
        if ( mode_is(v->domain, no_missed_tick_accounting) )
            pt->last_plt_gtime = hvm_get_guest_time(v);
        else
            pt->last_plt_gtime += pt->period_cycles;
    }

    if ( mode_is(v->domain, delay_for_missed_ticks) &&
         (hvm_get_guest_time(v) < pt->last_plt_gtime) )
        hvm_set_guest_time(v, pt->last_plt_gtime);

    cb = pt->cb;
    cb_priv = pt->priv;

    spin_unlock(&v->arch.hvm_vcpu.tm_lock);

    if ( cb != NULL )
        cb(v, cb_priv);
}

void pt_reset(struct vcpu *v)
{
    struct list_head *head = &v->arch.hvm_vcpu.tm_list;
    struct periodic_time *pt;

    spin_lock(&v->arch.hvm_vcpu.tm_lock);

    list_for_each_entry ( pt, head, list )
    {
        pt->pending_intr_nr = 0;
        pt->last_plt_gtime = hvm_get_guest_time(pt->vcpu);
        pt->scheduled = NOW() + pt->period;
        set_timer(&pt->timer, pt->scheduled);
    }

    spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void pt_migrate(struct vcpu *v)
{
    struct list_head *head = &v->arch.hvm_vcpu.tm_list;
    struct periodic_time *pt;

    spin_lock(&v->arch.hvm_vcpu.tm_lock);

    list_for_each_entry ( pt, head, list )
        migrate_timer(&pt->timer, v->processor);

    spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void create_periodic_time(
    struct vcpu *v, struct periodic_time *pt, uint64_t period,
    uint8_t irq, char one_shot, time_cb *cb, void *data)
{
    destroy_periodic_time(pt);

    spin_lock(&v->arch.hvm_vcpu.tm_lock);

    pt->enabled = 1;
    pt->pending_intr_nr = 0;

    /* Periodic timer must be at least 0.9ms. */
    if ( (period < 900000) && !one_shot )
    {
        gdprintk(XENLOG_WARNING,
                 "HVM_PlatformTime: program too small period %"PRIu64"\n",
                 period);
        period = 900000;
    }

    pt->period = period;
    pt->vcpu = v;
    pt->last_plt_gtime = hvm_get_guest_time(pt->vcpu);
    pt->irq = irq;
    pt->period_cycles = (u64)period * cpu_khz / 1000000L;
    pt->one_shot = one_shot;
    pt->scheduled = NOW() + period;
    /*
     * Offset LAPIC ticks from other timer ticks. Otherwise guests which use
     * LAPIC ticks for process accounting can see long sequences of process
     * ticks incorrectly accounted to interrupt processing.
     */
    if ( is_lvtt(v, irq) )
        pt->scheduled += period >> 1;
    pt->cb = cb;
    pt->priv = data;

    list_add(&pt->list, &v->arch.hvm_vcpu.tm_list);

    init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
    set_timer(&pt->timer, pt->scheduled);

    spin_unlock(&v->arch.hvm_vcpu.tm_lock);
}

void destroy_periodic_time(struct periodic_time *pt)
{
    if ( !pt->enabled )
        return;

    pt_lock(pt);
    pt->enabled = 0;
    list_del(&pt->list);
    pt_unlock(pt);

    /*
     * pt_timer_fn() can run until this kill_timer() returns. We must do this
     * outside pt_lock() otherwise we can deadlock with pt_timer_fn().
     */
    kill_timer(&pt->timer);
}

diff -r 2717128cbdd1 xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c	Wed Oct 31 10:07:42 2007 +0000
+++ b/xen/arch/x86/hvm/vpt.c	Thu Nov 01 16:19:06 2007 -0400
@@ -57,14 +57,23 @@ static void pt_process_missed_ticks(stru
      return;
 
     missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
-    if ( missed_ticks > 1000 )
-    {
-        /* TODO: Adjust guest time together */
-        pt->pending_intr_nr++;
-    }
-    else
-    {
-        pt->pending_intr_nr += missed_ticks;
+
+    if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
+    {
+        /* When many guests are stacked on a node and the total number of
+         * vcpus running loads is 10 times the number of physical processors,
+         * scheduling delays of 5 seconds are common. So set the threshold at
+         * 100 seconds to be safe. It would be reasonable to remove this
+         * check against the threshold completely.
+         */
+        if ( missed_ticks > 100000 )
+        {
+            /* TODO: Adjust guest time together */
+            pt->pending_intr_nr++;
+        }
+        else
+        {
+            pt->pending_intr_nr += missed_ticks;
+        }
     }
 
     pt->scheduled += missed_ticks * pt->period;
@@ -117,15 +126,7 @@ void pt_restore_timer(struct vcpu *v)
 
     list_for_each_entry ( pt, head, list )
     {
-        if ( !mode_is(v->domain, no_missed_tick_accounting) )
-        {
-            pt_process_missed_ticks(pt);
-        }
-        else if ( (NOW() - pt->scheduled) >= 0 )
-        {
-            pt->pending_intr_nr++;
-            pt->scheduled = NOW() + pt->period;
-        }
+        pt_process_missed_ticks(pt);
         set_timer(&pt->timer, pt->scheduled);
     }
 
@@ -140,13 +141,14 @@ static void pt_timer_fn(void *data)
 
     pt_lock(pt);
 
-    pt->pending_intr_nr++;
+    if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ||
+         !pt->pending_intr_nr )
+    {
+        pt->pending_intr_nr++;
+    }
 
     if ( !pt->one_shot )
     {
         pt->scheduled += pt->period;
-        if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) )
-            pt_process_missed_ticks(pt);
+        pt_process_missed_ticks(pt);
 
         set_timer(&pt->timer, pt->scheduled);
     }
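[Editorial sketch, not part of the original thread.] The pending-interrupt policy from item 2, as it appears in the final hunk above, can be modeled in a few lines (Python, with hypothetical function and parameter names): in no_missed_tick_accounting mode the backlog is capped at one pending tick, because the guest recovers lost time itself; in the other modes every tick is queued.

```python
# Model of the pending_intr_nr update in pt_timer_fn (final patch above):
# increment unless we are in no_missed_tick_accounting mode AND a tick is
# already pending. Names here are illustrative, not the hypervisor's API.

def deliver_tick(pending_intr_nr: int, no_missed_tick_accounting: bool) -> int:
    if not no_missed_tick_accounting or pending_intr_nr == 0:
        pending_intr_nr += 1   # queue the tick
    return pending_intr_nr     # in the capped mode, never exceeds 1

assert deliver_tick(0, True) == 1   # first tick is queued
assert deliver_tick(1, True) == 1   # interrupts disabled > one period: no pile-up
assert deliver_tick(3, False) == 4  # accounting mode queues every tick
```

This mirrors the C condition `!mode_is(pt->vcpu->domain, no_missed_tick_accounting) || !pt->pending_intr_nr` guarding `pt->pending_intr_nr++`.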