# HG changeset patch
# User David Scott
# Date 1268839981 0
# Node ID 178a3a5bd3122aa48f78a4636c99b1b8792d74bf
# Parent 5b343db0055dda379733d518826b36465114e34b
CA-38030: improve the RRD behaviour when the system clock moves backwards (earlier)

Moving the clock backwards (earlier) is a bad thing to do: it will cause many
time-related functions to fail. Most things can be cleared with a reboot...
except RRDs, which are persistent.

When the clock moves towards the future, lots of recent RRD data is lost as it
is averaged and then shuffled into more approximate RRDs.

When the clock moves towards the past, the previous behaviour was to reject old
updates as invalid. This unfortunately means that fields driven from RRDs
(e.g. memory-actual) freeze (often at 0).

The new behaviour is to:
* accept the update even though we know something is wrong
* log a warning message
* generate a single HOST_CLOCK_WENT_BACKWARDS ALERT.

Signed-off-by: David Scott

diff -r 5b343db0055d -r 178a3a5bd312 ocaml/idl/api_messages.ml
--- a/ocaml/idl/api_messages.ml	Tue Mar 16 22:29:33 2010 +0000
+++ b/ocaml/idl/api_messages.ml	Wed Mar 17 15:33:01 2010 +0000
@@ -63,6 +63,8 @@
 let host_clock_skew_detected = addMessage "HOST_CLOCK_SKEW_DETECTED"
 let host_clock_skew_detected_priority = 10L
+let host_clock_went_backwards = addMessage "HOST_CLOCK_WENT_BACKWARDS"
+let host_clock_went_backwards_priority = 10L
 let pool_master_transition = addMessage "POOL_MASTER_TRANSITION"
diff -r 5b343db0055d -r 178a3a5bd312 ocaml/xapi/monitor_rrds.ml
--- a/ocaml/xapi/monitor_rrds.ml	Tue Mar 16 22:29:33 2010 +0000
+++ b/ocaml/xapi/monitor_rrds.ml	Wed Mar 17 15:33:01 2010 +0000
@@ -604,6 +604,7 @@
       ignore(Unix.write s xml 0 (String.length xml))
     end)
+let sent_clock_went_backwards_alert = ref false
 (* Updates all of the hosts rrds. We are passed a list of uuids that
  * is used as the primary source for which VMs are resident on us.
@@ -619,13 +620,20 @@
    correctly represents the world *)
   let to_send_back = Mutex.execute mutex (fun () ->
-    let out_of_date =
-      match !host_rrd with
-      | None -> false
-      | Some rrdi -> rrdi.rrd.Rrd.last_updated > timestamp
+    let out_of_date, by_how_much =
+      match !host_rrd with
+      | None -> false, 0.
+      | Some rrdi -> rrdi.rrd.Rrd.last_updated > timestamp, abs_float (timestamp -. rrdi.rrd.Rrd.last_updated)
     in
-    if not out_of_date then begin
+    if out_of_date then begin
+      warn "Clock just went backwards by %.0f seconds: RRD data may now be unreliable" by_how_much;
+      if not(!sent_clock_went_backwards_alert) then begin
+        Xapi_alert.add ~name:Api_messages.host_clock_went_backwards ~priority:Api_messages.host_clock_went_backwards_priority
+          ~cls:`Host ~obj_uuid:(Xapi_inventory.lookup Xapi_inventory._installation_uuid) ~body:"";
+        sent_clock_went_backwards_alert := true; (* send at most one *)
+      end;
+    end;
     let registered = Hashtbl.fold (fun k _ acc -> k::acc) vm_rrds [] in
     let my_vms = uuids in
@@ -759,10 +767,6 @@
       Condition.broadcast condition;
       to_send_back
-    end else begin
-      debug "Ignoring out-of-date update";
-      [] (* If out of date *)
-    end
   )
   in
diff -r 5b343db0055d -r 178a3a5bd312 ocaml/xapi/rrd.ml
--- a/ocaml/xapi/rrd.ml	Tue Mar 16 22:29:33 2010 +0000
+++ b/ocaml/xapi/rrd.ml	Wed Mar 17 15:33:01 2010 +0000
@@ -238,7 +238,9 @@
 let ds_update rrd timestamp values transforms =
   (* Interval is the time between this and the last update *)
   let interval = timestamp -. rrd.last_updated in
-
+  (* Work around the clock going backwards *)
+  let interval = if interval < 0. then 5. else interval in
+
   (* start time (st) and age of the last processed pdp and the currently occupied one *)
   let proc_pdp_st, proc_pdp_age = get_times rrd.last_updated rrd.timestep in
   let occu_pdp_st, occu_pdp_age = get_times timestamp rrd.timestep in
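
Illustrative sketch, not part of the changeset: the two ideas in this patch (substitute a fixed interval when the clock has gone backwards, and raise the alert at most once) can be tried in isolation as below. The names `fallback_interval`, `effective_interval` and `make_once` are hypothetical and do not exist in xapi; the value 5. is copied from the literal in the rrd.ml hunk, and the callback stands in for the `Xapi_alert.add` call.

```ocaml
(* Hypothetical stand-alone sketch of the behaviour introduced by the patch. *)

(* Interval, in seconds, substituted when the clock has moved backwards.
   Same literal as in the rrd.ml hunk. *)
let fallback_interval = 5.

(* Time between the previous update and this one. A negative difference
   means the clock went backwards, so the fallback is used instead. *)
let effective_interval ~last_updated ~timestamp =
  let interval = timestamp -. last_updated in
  if interval < 0. then fallback_interval else interval

(* Build a one-shot latch: the returned function runs [send] only the first
   time it is called. The flag is set after [send] returns, mirroring the
   ordering used in the monitor_rrds.ml hunk. *)
let make_once () =
  let sent = ref false in
  fun send ->
    if not !sent then begin
      send ();
      sent := true
    end

let () =
  let alert_once = make_once () in
  let alerts = ref 0 in
  (* Simulated updates as (last_updated, timestamp) pairs; two go backwards. *)
  List.iter
    (fun (last_updated, timestamp) ->
      let interval = effective_interval ~last_updated ~timestamp in
      if timestamp < last_updated then alert_once (fun () -> incr alerts);
      Printf.printf "interval=%.0f alerts=%d\n" interval !alerts)
    [ (100., 105.); (105., 40.); (40., 20.) ]
```

Keeping the latch state inside a closure rather than a module-level ref (as the patch does) is only a convenience for testing; the observable behaviour, one alert per process lifetime, is the same.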