Re: [Xen-devel] [PATCH] xend: resume a guest domain after an unsuccessful live migration



On Mon, 2013-02-04 at 07:24 +0000, Elena V. Titova wrote:
> Hello.
> 
> We use debian sarge, linux-image-3.2.0-3-amd64 and xen-4.1.3 on our
> servers.

Do you really mean Sarge? Or did you mean Squeeze or Wheezy? Those
kernel and Xen versions look like Wheezy versions but perhaps you are
using backports.

> When a live migration is run, the guest domain may fail to resume on the
> destination host and yet be destroyed on the source host.
> This patch fixes that by resuming the guest domain on the source host
> when starting it on the destination fails.

xend is supposed to be in maintenance mode so I'm not too sure about
this sort of change.

In particular, I'm worried that it might break migration from Xen version
N to version N+1, which is something we try to support.

BTW the xl toolstack already has this functionality so another option
for you may be to switch to that.

> git diff tools/python/xen/xend/XendCheckpoint.py
> diff --git a/tools/python/xen/xend/XendCheckpoint.py b/tools/python/xen/xend/XendCheckpoint.py
> index fa09757..6b8765f 100644
> --- a/tools/python/xen/xend/XendCheckpoint.py
> +++ b/tools/python/xen/xend/XendCheckpoint.py
> @@ -163,12 +163,16 @@ def save(fd, dominfo, network, live, dst, checkpoint=False, node=-1,sock=None):
>              dominfo.resumeDomain()
>          else:
>              if live and sock != None:

This same class of errors isn't possible for non-live?

> +                status = os.read(fd, 64)

The written strings are 7 or 4 bytes, but the read asks for up to 64. I
think it would be better to choose one fixed length for all of the writes
and for the read. That might mean padding the fail message. Also, these
protocol strings should be defined as constants rather than open-coded.

Even with that addressed I don't really feel confident enough about xend
internals to Ack a patch like this.
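
To illustrate the fixed-length suggestion, here is a minimal standalone
sketch. It is not xend code: STATUS_LEN, STATUS_SUCCESS, STATUS_FAIL,
write_status and read_status are all hypothetical names, the length of 8
and the NUL padding are arbitrary choices, and it is written for Python 3
(bytes) rather than the Python 2 that xend uses.

```python
import os

# Hypothetical protocol constants; none of these names exist in xend.
STATUS_LEN = 8  # arbitrary fixed record size, large enough for "SUCCESS"
STATUS_SUCCESS = b"SUCCESS".ljust(STATUS_LEN, b"\0")
STATUS_FAIL = b"FAIL".ljust(STATUS_LEN, b"\0")


def write_status(fd, status):
    """Write one fixed-length status record to fd, handling short writes."""
    if len(status) != STATUS_LEN:
        raise ValueError("status record must be exactly STATUS_LEN bytes")
    written = 0
    while written < STATUS_LEN:
        written += os.write(fd, status[written:])


def read_status(fd):
    """Read exactly STATUS_LEN bytes; raise EOFError if the peer closes early."""
    buf = b""
    while len(buf) < STATUS_LEN:
        chunk = os.read(fd, STATUS_LEN - len(buf))
        if not chunk:
            raise EOFError("peer closed before sending a full status record")
        buf += chunk
    return buf
```

With something like this, the sending side would compare the result of
read_status() against STATUS_FAIL instead of a string literal, and a
short or missing reply shows up as an explicit error rather than as an
unexpected string.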

>                  try:
>                      sock.shutdown(2)
>                  except:
>                      pass
>                  sock.close()
> 
> +                if status == "FAIL":
> +                    raise XendError("Restore failed")
> +
>              dominfo.destroy()
>              dominfo.testDeviceComplete()
>          try:
> @@ -351,8 +355,14 @@ def restore(xd, fd, dominfo = None, paused = False, relocating = False):
>          if not paused:
>              dominfo.unpause()
> 
> +        if relocating:
> +            os.write(fd, "SUCCESS")
> +
>          return dominfo
>      except Exception, exn:
> +        if relocating:
> +            os.write(fd, "FAIL")
> +
>          dominfo.destroy()
>          log.exception(exn)
>          raise exn 
> 
> --
> Elena Titova
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel


