RE: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
hrtimer supports two clock bases: CLOCK_MONOTONIC and CLOCK_REALTIME.
wall_to_monotonic is only added in the former case; for the latter, per my
reading, the time of day (TOD) is used directly. I did a quick search, and it
looks like futex and ntp use CLOCK_REALTIME. There is also a vsyscall gate
through which the caller can pass CLOCK_REALTIME.
Thanks,
Kevin
hrtimers add wall_to_monotonic to xtime to get a
timesource that doesn't (or shouldn't!) warp.
-- Keir
On 26/11/08 14:20, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
How about hrtimers? One mode is CLOCK_REALTIME, which uses getnstimeofday
for expiration. Once system time is changed, either on the local machine or
on a new one, that expiration can't be adjusted. But I'm not sure whether it
still makes sense to try hrtimers in a guest.
Thanks,
Kevin
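The concern about absolute expirations can be illustrated with a small model. This is only a sketch under stated assumptions: `struct abs_timer`, `arm_timer` and `remaining_ns` are hypothetical names, not hrtimer APIs, and the clock readings are passed in as plain nanosecond values.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* A timer that stores an absolute expiry on some clock base. */
struct abs_timer {
    int64_t expires_ns;
};

/* Arm the timer to fire delay_ns after the current clock reading. */
static void arm_timer(struct abs_timer *t, int64_t now_ns, int64_t delay_ns)
{
    assert(delay_ns >= 0);
    t->expires_ns = now_ns + delay_ns;
}

/* Time still to wait for a given clock reading; 0 means expired. */
static int64_t remaining_ns(const struct abs_timer *t, int64_t now_ns)
{
    return t->expires_ns > now_ns ? t->expires_ns - now_ns : 0;
}
```

With a 5 s timer armed when the clock reads 1000 s, `remaining_ns` returns 605 s if the clock reads 400 s after a restore. The stored expiry never moved, so the wait grows by the size of the backward step.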
From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
Sent: Wednesday, November 26, 2008 10:11 PM
To: Tian, Kevin; 'James Song'; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
The problem hasn't been fully explained, but I can say that PV guests
expect system time to jump across s/r and deal with that. For example,
Linux doesn't use Xen system time internally, but uses its progress to
periodically update jiffies, which does not warp across s/r.
We have had problems corrupting wc_sec/wc_nsec in xc_domain_restore.c,
but that was fixed some time ago.
-- Keir
On 26/11/08 14:00, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
This is not an s/r or lm specific issue. For example, system time can be
changed even while a PV guest is running. Your patch only hacks the restore
point once, and wc_sec can still change later when system time is changed
on the fly again.
IIRC, a PV guest can catch up with a wall clock change in the timer
interrupt, and time_resume will sync the internally processed system time
with the new system time after restore. But I'm not sure whether that's
enough. Actually, the more interesting part is the uptime difference. For
example, a timer whose expiration was calculated from the previous system
time may wait almost indefinitely if uptime differs a lot between the two
boxes. But I think such an issue should have been considered already, e.g.
with some user tool assistance. I think Keir can comment better here.
BTW, do you happen to know what exactly dom0 hangs on? Some busy loop to
catch up time, or a long delay before some critical timer expires?
Thanks,
Kevin
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of James Song
Sent: Tuesday, November 25, 2008 4:02 PM
To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
Hi,
I find that a PV domain hangs when we take these steps:
1. save the PV domain
2. change the system time of the PV domain back
3. restore the PV domain
or
1. migrate a PV domain from Machine A to Machine B
2. the system time of Machine B is behind that of Machine A.
The problem is that wc_sec changes when the system time is changed in dom0,
or when restoring on a machine with a slower system time, but when
restoring, Xen doesn't restore the wc_sec of shared_info from xenstore and
uses the native one instead. So the guest OS will hang. This patch works
around this issue.
Thanks
-- Song Wei
diff -r a5ed0dbc829f tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c  Tue Nov 18 14:34:14 2008 +0800
+++ b/tools/libxc/xc_domain_restore.c  Fri Nov 21 17:34:15 2008 +0800
@@ -328,6 +328,16 @@
     /* For info only */
     nr_pfns = 0;
+    //jsong@xxxxxxxxxx, james song
+    memset(&domctl, 0, sizeof(domctl));
+    domctl.domain = dom;
+    domctl.cmd    = XEN_DOMCTL_restoredomain;
+    frc = do_domctl(xc_handle, &domctl);
+    if ( frc != 0 )
+    {
+        ERROR("Unable to set flag of restore.");
+        goto out;
+    }
     if ( read_exact(io_fd, &p2m_size, sizeof(unsigned long)) )
     {
@@ -1120,6 +1130,8 @@
     /* restore saved vcpu_info and arch specific info */
     MEMCPY_FIELD(new_shared_info, old_shared_info, vcpu_info);
+    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_nsec);
+    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_sec);
     MEMCPY_FIELD(new_shared_info, old_shared_info, arch);
     /* clear any pending events and the selector */
diff -r a5ed0dbc829f xen/arch/x86/time.c
--- a/xen/arch/x86/time.c  Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/arch/x86/time.c  Fri Nov 21 17:34:15 2008 +0800
@@ -689,7 +689,6 @@
     wmb();
     (*version)++;
 }
-
 void update_vcpu_system_time(struct vcpu *v)
 {
     struct cpu_time *t;
@@ -703,7 +702,6 @@
     if ( u->tsc_timestamp == t->local_tsc_stamp )
         return;
-
     version_update_begin(&u->version);
     u->tsc_timestamp = t->local_tsc_stamp;
@@ -713,14 +711,19 @@
     version_update_end(&u->version);
 }
-
 void update_domain_wallclock_time(struct domain *d)
 {
     spin_lock(&wc_lock);
+    if(d->after_restore )
+    {
+        d->after_restore = 0;
+        goto out; //jsong@xxxxxxxxxx
+    }
     version_update_begin(&shared_info(d, wc_version));
     shared_info(d, wc_sec)  = wc_sec + d->time_offset_seconds;
     shared_info(d, wc_nsec) = wc_nsec;
     version_update_end(&shared_info(d, wc_version));
+out:
     spin_unlock(&wc_lock);
 }
@@ -751,7 +754,6 @@
     u64 x;
     u32 y, _wc_sec, _wc_nsec;
     struct domain *d;
-
     x = (secs * 1000000000ULL) + (u64)nsecs - system_time_base;
     y = do_div(x, 1000000000);
@@ -1050,7 +1052,6 @@
 struct tm wallclock_time(void)
 {
     uint64_t seconds;
-
     if ( !wc_sec )
         return (struct tm) { 0 };
diff -r a5ed0dbc829f xen/common/domctl.c
--- a/xen/common/domctl.c  Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/common/domctl.c  Fri Nov 21 17:34:15 2008 +0800
@@ -24,7 +24,6 @@
 #include <asm/current.h>
 #include <public/domctl.h>
 #include <xsm/xsm.h>
-
 extern long arch_do_domctl(
     struct xen_domctl *op, XEN_GUEST_HANDLE(xen_domctl_t) u_domctl);
@@ -315,6 +314,16 @@
         ret = 0;
     }
     break;
+    case XEN_DOMCTL_restoredomain:
+    {
+        struct domain *d;
+        if ( (d = rcu_lock_domain_by_id(op->domain)) == NULL )
+            break;
+
+        d->after_restore = 1;
+        rcu_unlock_domain(d);
+        break;
+    }
     case XEN_DOMCTL_createdomain:
     {
diff -r a5ed0dbc829f xen/include/public/domctl.h
--- a/xen/include/public/domctl.h  Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/public/domctl.h  Fri Nov 21 17:34:15 2008 +0800
@@ -61,6 +61,7 @@
 #define XEN_DOMCTL_destroydomain      2
 #define XEN_DOMCTL_pausedomain        3
 #define XEN_DOMCTL_unpausedomain      4
+#define XEN_DOMCTL_restoredomain      51
 #define XEN_DOMCTL_resumedomain      27
 #define XEN_DOMCTL_getdomaininfo      5
diff -r a5ed0dbc829f xen/include/xen/sched.h
--- a/xen/include/xen/sched.h  Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/xen/sched.h  Fri Nov 21 17:34:15 2008 +0800
@@ -231,6 +231,7 @@
      * cause a deadlock. Acquirers don't spin waiting; they preempt.
      */
     spinlock_t hypercall_deadlock_mutex;
+    int after_restore; //jsong@xxxxxxxxxx
 };
 struct domain_setup_info
---------------------------------------------------------------------------------------------
Thanks
-- Song Wei
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel