
Re: [Xen-devel] suspend evtchn creation failure

On Tue, Mar 11, 2014 at 10:39 AM, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
On Tue, Mar 11, 2014 at 09:39:21AM -0700, Shriram Rajagopalan wrote:
> On Tue, Mar 11, 2014 at 4:21 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxx>wrote:
> > On Mon, 2014-03-03 at 12:24 -0500, Prateek Sharma wrote:
> > > Hi all,
> > >       During xm save, xc_save throws this error:
> > > "failed to get the suspend evtchn port".
> > >       From my understanding, the port is supposed to be stored by
> > > xenstore in /local/domain/, but I can't see the port being created using
> > > xenstore-ls either.
> >
> > I think this event channel is created by the guest and written to
> > xenstore as part of support for the fast event channel based save
> > mechanism used by e.g. remus. In its absence save/suspend is triggered
> > via the traditional method of the toolstack writing commands to the
> > "control/shutdown" node.
> >
> > IIRC the fast event channel based save stuff is not in mainline kernels,
> > so the tools message is correct but harmless.
> >
> > CCing Shriram (Remus maintainer) in case I've got all the above wrong...
> >
> >
> Ian is right. Mainline kernels don't have the suspend event channel.
> Unfortunately, not having it results in a pretty big performance hit:
> each suspend call takes about 7-10ms and a resume takes 2-4ms. You are
> looking at an approx. 10% loss of execution time just to suspend/resume
> the VM (assuming a 100ms checkpoint interval).

What is involved in implementing it?

 This is as far as my understanding goes:
  1. Establish a dedicated event channel in the guest (I think in drivers/xen/manage.c).
  2. Publish its port to xenstore.
  3. Listen on the event channel for suspend/suspend-cancel/resume events from dom0.
  4. Start a dedicated kernel thread to service events on this event channel.
  5. When the kernel receives an event on this channel, queue it up to this thread.
The thread then runs the usual suspend routine - quiesce all CPUs and save all power state using the PM infrastructure.
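The steps above might look roughly like the following kernel-side sketch (a hypothetical illustration, not code from any actual tree: the function names `setup_suspend_evtchn`, `suspend_evtchn_irq`, and `suspend_evtchn_thread` are made up, error paths are not unwound, and the actual suspend work is elided):

```c
#include <linux/kthread.h>
#include <linux/wait.h>
#include <xen/events.h>
#include <xen/xenbus.h>
#include <xen/interface/event_channel.h>
#include <asm/xen/hypercall.h>

static evtchn_port_t suspend_port;
static DECLARE_WAIT_QUEUE_HEAD(suspend_wq);
static bool suspend_pending;

/* Step 5: the interrupt handler only queues work for the thread. */
static irqreturn_t suspend_evtchn_irq(int irq, void *dev_id)
{
	suspend_pending = true;
	wake_up(&suspend_wq);
	return IRQ_HANDLED;
}

/* Step 4: dedicated thread that services suspend requests. */
static int suspend_evtchn_thread(void *unused)
{
	while (!kthread_should_stop()) {
		wait_event_interruptible(suspend_wq,
				suspend_pending || kthread_should_stop());
		if (!suspend_pending)
			continue;
		suspend_pending = false;
		/* Run the usual suspend routine here: quiesce CPUs,
		 * freeze/thaw drivers via the PM infrastructure, and
		 * notify dom0 on the same channel when done. */
	}
	return 0;
}

static int __init setup_suspend_evtchn(void)
{
	struct evtchn_alloc_unbound alloc = {
		.dom = DOMID_SELF,
		.remote_dom = 0,	/* dom0 will signal us */
	};
	int irq, err;

	/* Step 1: allocate an unbound event channel to dom0. */
	err = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound, &alloc);
	if (err)
		return err;
	suspend_port = alloc.port;

	/* Step 3: bind it so suspend_evtchn_irq sees dom0's events. */
	irq = bind_evtchn_to_irqhandler(suspend_port, suspend_evtchn_irq,
					0, "suspend-evtchn", NULL);
	if (irq < 0)
		return irq;

	/* Step 2: publish the port in xenstore, where the toolstack
	 * looks for it (under the guest's device/suspend node). */
	err = xenbus_printf(XBT_NIL, "device/suspend", "event-channel",
			    "%u", suspend_port);
	if (err)
		return err;

	kthread_run(suspend_evtchn_thread, NULL, "suspend-evtchn");
	return 0;
}
```

This only sketches the plumbing; the interesting part for Remus is what the thread body does, which is exactly the suspend/resume path discussed below.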

However, looking at old pvops code, there seem to have been a couple of optimizations
that sped up the whole suspend process. For Remus, the process as such involves suspending
and resuming only blkfront and netfront. But the power management infrastructure's
freeze-every-driver, thaw-every-driver approach seems overkill, adding a few milliseconds
to the suspend/resume procedure.
