[PATCH v2 1/2] xl: Keep monitoring suspended domain


  • To: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Jason Andryuk <jason.andryuk@xxxxxxx>
  • Date: Tue, 26 Nov 2024 12:19:40 -0500
  • Cc: Jason Andryuk <jason.andryuk@xxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>
  • Delivery-date: Tue, 26 Nov 2024 17:20:11 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

When a VM transitioned to LIBXL_SHUTDOWN_REASON_SUSPEND, the xl daemon
exited, because handle_domain_death() returned 0, which is
DOMAIN_RESTART_NONE ("no domain restart").  Later, when the VM actually
shut down, no xl daemon was left to clean up the domain properly.

Add a new restart type, DOMAIN_RESTART_SUSPENDED, to handle this case.
With it, the xl daemon keeps running and can react to future shutdown
events.

The domain death event needs to be re-enabled to catch subsequent
events.  Once the event has been reported, its libxl_evgen_domain_death
is moved from death_list to death_reported, so it is no longer found on
subsequent iterations through death_list.  We enable the new event
before disabling the old one to keep the xenstore watch active.  If the
watch were unregistered and re-registered, it would fire immediately for
our suspended domain, and the event would be re-triggered continuously.

Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
---
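Illustration only, not part of the patch: a minimal toy model of why the
ordering matters.  It assumes a single shared watch that is registered
when the first watcher is enabled, fires once on registration, and is
unregistered when the last watcher is disabled.  The toy_* and swap_*
names are hypothetical and are not libxl API.

```c
#include <assert.h>

/* Toy stand-in for a shared watch; NOT the libxl data structure. */
struct toy_watch {
    int watchers;   /* number of currently enabled death watchers */
    int fires;      /* how often the watch fired on registration */
};

/* Enabling the first watcher registers the watch, which fires at once. */
void toy_enable(struct toy_watch *w)
{
    if (w->watchers == 0)
        w->fires++;
    w->watchers++;
}

/* Disabling the last watcher leaves the watch unregistered. */
void toy_disable(struct toy_watch *w)
{
    assert(w->watchers > 0);
    w->watchers--;
}

/* Order used by the patch: enable the new watcher, then drop the old. */
int swap_enable_first(void)
{
    struct toy_watch w = { 0, 0 };

    toy_enable(&w);     /* original watcher (deathw) */
    w.fires = 0;        /* ignore the initial registration event */
    toy_enable(&w);     /* new watcher (deathw2); watch stays registered */
    toy_disable(&w);    /* old watcher goes away */
    return w.fires;     /* spurious fires caused by the swap */
}

/* Reverse order: the watch is dropped and registered again. */
int swap_disable_first(void)
{
    struct toy_watch w = { 0, 0 };

    toy_enable(&w);     /* original watcher (deathw) */
    w.fires = 0;        /* ignore the initial registration event */
    toy_disable(&w);    /* last watcher gone: watch unregistered */
    toy_enable(&w);     /* re-registration fires immediately */
    return w.fires;     /* spurious fires caused by the swap */
}
```

In this model swap_enable_first() reports no spurious fire, while
swap_disable_first() reports one, which corresponds to the immediate
re-trigger for the suspended domain described above.
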
 tools/xl/xl.h           |  1 +
 tools/xl/xl_vmcontrol.c | 18 +++++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 9c86bb1d98..967d034cfe 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -301,6 +301,7 @@ typedef enum {
     DOMAIN_RESTART_NORMAL,       /* Domain should be restarted */
     DOMAIN_RESTART_RENAME,       /* Domain should be renamed and restarted */
     DOMAIN_RESTART_SOFT_RESET,   /* Soft reset should be performed */
+    DOMAIN_RESTART_SUSPENDED,    /* Domain suspended - keep looping */
 } domain_restart_type;
 
 extern void printf_info_sexp(int domid, libxl_domain_config *d_config, FILE *fh);
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index fa1a4420e3..c45d497c28 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -417,7 +417,7 @@ static domain_restart_type handle_domain_death(uint32_t *r_domid,
         break;
     case LIBXL_SHUTDOWN_REASON_SUSPEND:
         LOG("Domain has suspended.");
-        return 0;
+        return DOMAIN_RESTART_SUSPENDED;
     case LIBXL_SHUTDOWN_REASON_CRASH:
         action = d_config->on_crash;
         break;
@@ -1030,6 +1030,7 @@ start:
         }
     }
     while (1) {
+        libxl_evgen_domain_death *deathw2 = NULL;
         libxl_event *event;
         ret = domain_wait_event(domid, &event);
         if (ret) goto out;
@@ -1100,9 +1101,24 @@ start:
                 ret = 0;
                 goto out;
 
+            case DOMAIN_RESTART_SUSPENDED:
+                LOG("Continue waiting for domain %u", domid);
+                /*
+                 * Enable a new event before disabling the old.  This ensures
+                 * the xenstore watch remains active.  Otherwise it'll fire
+                 * immediately on re-registration and find our suspended domain.
+                 */
+                ret = libxl_evenable_domain_death(ctx, domid, 0, &deathw2);
+                if (ret) goto out;
+                libxl_evdisable_domain_death(ctx, deathw);
+                deathw = deathw2;
+                deathw2 = NULL;
+                break;
+
             default:
                 abort();
             }
+            break;
 
         case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
             LOG("Domain %u has been destroyed.", domid);
-- 
2.34.1