[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC Patch v4 7/9] correct xc_domain_save()'s return value



On 09/23/2014 05:25 PM, Andrew Cooper wrote:
> On 23/09/14 10:10, Wen Congyang wrote:
>> On 09/23/2014 05:00 PM, Andrew Cooper wrote:
>>> On 23/09/14 03:14, Wen Congyang wrote:
>>>> On 09/22/2014 10:06 PM, Andrew Cooper wrote:
>>>>> On 22/09/14 14:13, Ian Campbell wrote:
>>>>>> On Mon, 2014-09-22 at 14:03 +0100, Ian Jackson wrote:
>>>>>>> Ian Campbell writes ("Re: [Xen-devel] [RFC Patch v4 7/9] correct 
>>>>>>> xc_domain_save()'s return value"):
>>>>>>>> libxc doesn't know that, if it is important then it seems like the
>>>>>>>> failure + errno ought to be marshalled across the IPC link.
>>>>>>> Yes, but ...
>>>>>>>
>>>>>>>> It may be that this can be easily handled in
>>>>>>>> libxl__srm_callout_sendreply + helper_getreply. Ian J -- what do you
>>>>>>>> think?
>>>>>>> ... while that would be possbile, we have another option.
>>>>>>>
>>>>>>> We could say that the callbacks return errno values.  That would
>>>>>>> simplify the API and avoid having the IPC involve accesses to global
>>>>>>> variables (ie, things not in the functions' parameter lists).
>>>>>>>
>>>>>>> If we do that then it becomes the responsibility of xc_domain_save to
>>>>>>> either change its own API to return errno, or to save the callback's
>>>>>>> return value in errno.
>>>>>> Hrm. libxc is already a complete mess wrt error returning/handling
>>>>>> because some proportion of the code incorrectly does/assumes this sort
>>>>>> of thing is happening (because people were confused about the syscall
>>>>>> returns from the kernel vs. process context). Having a place in libxc
>>>>>> where this is now done on purpose seems a bit like setting the rope on
>>>>>> fire to me...
>>>>>>
>>>>>> Ian.
>>>>>>
>>>>> libxc is an absolute mess, but this is far from the only codepath (even
>>>>> in xc_domain_save()) which ends up like this.
>>>>>
>>>>> The *only* safe assumption is that ==0 is success and !=0 is failure for
>>>>> xc_domain_save(), and errno may or may not be relevant, whether rc is -1
>>>>> or not.
>>>> Do you mean: errno is undefined even if rc is -1?
>>>>
>>>> Thanks
>>>> Wen Congyang
>>> Correct, last time I checked.  The error handling in libxc is in dire
>>> need of fixing from scratch.
>> For v2 version migration, we can do it like this. But xc_domain_save()
>> is a public API, and is used some years. So I am not sure if I can
>> change its behavior...
>>
>> Thanks
>> Wen Congyang
> 
> xc_domain_safe() is free to be changed.  libxc is an unstable API, and
> has been changed for every Xen release I recall.

What about this patch?

From 61f9470134c988e1ff3ae50eaa2e252d20379588 Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@xxxxxxxxxxxxxx>
Date: Tue, 23 Sep 2014 17:45:54 +0800
Subject: [PATCH] correct xc_domain_save()'s return value

If suspend_and_state() fails, the errno may be 0. But we assume
that the errno is not 0. So remove assert().

If the callback checkpoint() fails, it means that remus
failed. But xc_domain_save() returns 0.

This patch changes xc_domain_save()'s behavior, and the errno is
undefined even if xc_domain_save() returns 1.

Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
---
 tools/libxc/xc_domain_save.c | 59 +++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 254fdb3..d96fd24 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -807,7 +807,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t 
dom, uint32_t max_iter
     xc_dominfo_t info;
     DECLARE_DOMCTL;
 
-    int rc, frc, i, j, last_iter = 0, iter = 0;
+    int rc = 1, frc, i, j, last_iter = 0, iter = 0;
     int live  = (flags & XCFLAGS_LIVE);
     int debug = (flags & XCFLAGS_DEBUG);
     int superpages = !!hvm;
@@ -2073,8 +2073,12 @@ int xc_domain_save(xc_interface *xch, int io_fd, 
uint32_t dom, uint32_t max_iter
     goto out_rc;
 
  out:
-    rc = errno;
-    assert(rc);
+    /*
+     * When we come here, some error happened. But the errno may be undefined.
+     * For example, the callback suspend() fails, and we cannot get correct
+     * errno, because it may be implemented in libxl.
+     */
+    rc = 1;
  out_rc:
     completed = 1;
 
@@ -2113,30 +2117,34 @@ int xc_domain_save(xc_interface *xch, int io_fd, 
uint32_t dom, uint32_t max_iter
     compressing = (flags & XCFLAGS_CHECKPOINT_COMPRESS);
 
     /* checkpoint_cb can spend arbitrarily long in between rounds */
-    if (!rc && callbacks->checkpoint &&
-        callbacks->checkpoint(callbacks->data) > 0)
+    if ( !rc && callbacks->checkpoint )
     {
-        /* reset stats timer */
-        print_stats(xch, dom, 0, &time_stats, &shadow_stats, 0);
-
-        /* last_iter = 1; */
-        if ( suspend_and_state(callbacks->suspend, callbacks->data, xch,
-                               io_fd, dom, &info) )
+        if ( callbacks->checkpoint(callbacks->data) > 0 )
         {
-            ERROR("Domain appears not to have suspended");
-            goto out;
-        }
-        DPRINTF("SUSPEND shinfo %08lx\n", info.shared_info_frame);
-        print_stats(xch, dom, 0, &time_stats, &shadow_stats, 1);
+            /* reset stats timer */
+            print_stats(xch, dom, 0, &time_stats, &shadow_stats, 0);
 
-        if ( xc_shadow_control(xch, dom,
-                               XEN_DOMCTL_SHADOW_OP_CLEAN, 
HYPERCALL_BUFFER(to_send),
-                               dinfo->p2m_size, NULL, 0, &shadow_stats) != 
dinfo->p2m_size )
-        {
-            PERROR("Error flushing shadow PT");
-        }
+            /* last_iter = 1; */
+            if ( suspend_and_state(callbacks->suspend, callbacks->data, xch,
+                                   io_fd, dom, &info) )
+            {
+                ERROR("Domain appears not to have suspended");
+                goto out;
+            }
+            DPRINTF("SUSPEND shinfo %08lx\n", info.shared_info_frame);
+            print_stats(xch, dom, 0, &time_stats, &shadow_stats, 1);
 
-        goto copypages;
+            if ( xc_shadow_control(xch, dom,
+                                   XEN_DOMCTL_SHADOW_OP_CLEAN, 
HYPERCALL_BUFFER(to_send),
+                                   dinfo->p2m_size, NULL, 0, &shadow_stats) != 
dinfo->p2m_size )
+            {
+                PERROR("Error flushing shadow PT");
+            }
+
+            goto copypages;
+        }
+        else
+            rc = 1;
     }
 
     if ( tmem_saved != 0 && live )
@@ -2174,11 +2182,10 @@ int xc_domain_save(xc_interface *xch, int io_fd, 
uint32_t dom, uint32_t max_iter
     free(hvm_buf);
     outbuf_free(&ob_pagebuf);
 
-    errno = rc;
 exit:
-    DPRINTF("Save exit of domid %u with errno=%d\n", dom, errno);
+    DPRINTF("Save exit of domid %u with rc=%d\n", dom, rc);
 
-    return !!errno;
+    return rc;
 }
 
 /*
-- 

> 
> The migrationv2 code is just as guilty of using -1 and an undefined
> errno, although it does promise that there will be a relevant error in
> the log.  It stems from the same problems around the callbacks where
> there is insufficient information passed, but also from the likes of
> {read,write}_exact() which uses -1/0 for EOF.  I considered changing it
> to -1/EBADF and still not decided which is the lessor of two evils.  (As
> far as I am aware, the legacy code has the same problem.)
> 
> ~Andrew
> 
> .
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.