Re: [PATCH RFC 3/4] xen: add new stable control hypercall


  • To: Juergen Gross <jgross@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 22 Nov 2021 16:39:50 +0100
  • Cc: Daniel De Graaf <dgdegra@xxxxxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 22 Nov 2021 15:40:13 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.09.2021 14:35, Juergen Gross wrote:
> The sysctl and domctl hypercalls are not stable, so tools using those
> need to be in sync with the hypervisor.
> 
> In order to decouple (some) tools from the hypervisor add a new stable
> hypercall __HYPERVISOR_control_op

I'm not convinced we need a new hypercall. New sub-ops of the existing ones
can be declared stable (and be made to bypass the interface version checks).
If we want/need a new one, "control" is too generic: There's a reason we
currently have separate domctl and sysctl, and I think if we want new
hypercalls rather than new sub-ops, then we'd again want a global and a
per-domain one (unless the new one had provisions to be able to serve
both purposes).
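
A minimal sketch of how a stable sub-op could skip the interface version check while the unstable ones keep it. This is a standalone illustration, not Xen code: every `demo_`/`DEMO_` name, the version value, and the choice of error code are assumptions made for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in Xen. */
#define DEMO_INTERFACE_VERSION 0x15u

enum demo_subop {
    DEMO_OP_unstable_tweak = 1,   /* layout may change between releases */
    DEMO_OP_stable_query   = 2,   /* declared stable: ABI is frozen */
};

struct demo_op {
    uint32_t cmd;
    uint32_t interface_version;
};

/* The set of sub-ops that have been declared stable. */
static bool demo_op_is_stable(uint32_t cmd)
{
    return cmd == DEMO_OP_stable_query;
}

/* Returns 0 on success or a negative errno-style value. */
static int demo_dispatch(const struct demo_op *op)
{
    /* Only unstable sub-ops insist on an exact interface version match. */
    if ( !demo_op_is_stable(op->cmd) &&
         op->interface_version != DEMO_INTERFACE_VERSION )
        return -EACCES;

    switch ( op->cmd )
    {
    case DEMO_OP_unstable_tweak:
    case DEMO_OP_stable_query:
        return 0;   /* real handlers would go here */

    default:
        return -ENOSYS;
    }
}
```

With this shape a tool built against an older header can still issue `DEMO_OP_stable_query`, while `DEMO_OP_unstable_tweak` keeps failing on a version mismatch.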

> with (for now) two sub-options:
> 
> - XEN_CONTROL_OP_get_version for retrieving the max version of the new
>   hypercall supported by the hypervisor
> - XEN_CONTROL_OP_get_state_changed_domain for retrieving some state
>   data of a domain which changed state (this is needed by Xenstore).
>   The returned state just contains the domid, the domain unique id,
>   and some flags (existing, shutdown, dying).

If we go with a new hypercall, I think you want to split its introduction
(with just the version sub-op) from the addition of get_state_changed_dom.

> --- /dev/null
> +++ b/xen/common/control.c
> @@ -0,0 +1,52 @@
> +/******************************************************************************
> + *
> + * control.c
> + *
> + * Entry point of the stable __HYPERVISOR_control_op hypercall.
> + */
> +#include <xen/err.h>
> +#include <xen/event.h>
> +#include <xen/guest_access.h>
> +#include <xen/hypercall.h>
> +#include <public/control.h>
> +
> +long do_control_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> +{
> +    int ret = 0;
> +
> +    if ( xsm_control_op(XSM_OTHER, cmd) )
> +        return -EPERM;
> +
> +    switch ( cmd )
> +    {
> +    case XEN_CONTROL_OP_get_version:
> +        if ( !guest_handle_is_null(arg) )
> +            return -EINVAL;
> +
> +        ret = XEN_CONTROL_VERSION;
> +        break;
> +
> +    case XEN_CONTROL_OP_get_state_changed_domain:
> +    {
> +        struct xen_control_changed_domain info = { };
> +
> +        if ( get_global_virq_handler(VIRQ_DOM_EXC) != current->domain )
> +            return -EPERM;

The function result is stale by the time it gets made use of here. If this
is deemed not to be a problem, then I guess it wants saying so in the
description.

> @@ -103,6 +104,43 @@ void domain_reset_states(void)
>      rcu_read_unlock(&domlist_read_lock);
>  }
>  
> +int domain_get_dom_state_changed(struct xen_control_changed_domain *info)
> +{
> +    unsigned int dom;
> +    struct domain *d;
> +
> +    while ( (dom = find_first_bit(dom_state_changed, DOMID_MASK + 1)) <
> +            DOMID_FIRST_RESERVED )

As per my comment on the earlier patch - the use of DOMID_MASK + 1 vs
DOMID_FIRST_RESERVED is quite puzzling here.

> +    {
> +        d = rcu_lock_domain_by_id(dom);
> +
> +        if ( test_and_clear_bit(dom, dom_state_changed) )
> +        {
> +            info->domid = dom;
> +            if ( d )
> +            {
> +                info->state = XEN_CONTROL_CHANGEDDOM_STATE_EXIST;
> +                if ( d->is_shut_down )
> +                    info->state |= XEN_CONTROL_CHANGEDDOM_STATE_SHUTDOWN;
> +                if ( d->is_dying == DOMDYING_dead )
> +                    info->state |= XEN_CONTROL_CHANGEDDOM_STATE_DYING;
> +                info->unique_id = d->unique_id;
> +
> +                rcu_unlock_domain(d);
> +            }
> +
> +            return 0;

With rapid creation of short lived domains, will the caller ever get to
see information on higher numbered domains (if, say, it gets "suitably"
preempted within its own environment)? IOW shouldn't there be a way for
the caller to specify a domid to start from?
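
A minimal sketch of a scan that starts from a caller-supplied id, so that low-numbered ids which keep getting re-flagged cannot hide higher-numbered ones. It is a standalone model, not the patch's code: the 64-entry id space, all `demo_` names, and the wrap-around behaviour are assumptions, and the bit operations here are not atomic as they would have to be in the hypervisor.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical small id space standing in for the domid range. */
#define DEMO_NR_IDS        64u
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_MAP_LONGS \
    ((DEMO_NR_IDS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* One "state changed" bit per id. */
static unsigned long demo_changed[DEMO_MAP_LONGS];

static void demo_mark_changed(unsigned int id)
{
    demo_changed[id / DEMO_BITS_PER_LONG] |= 1UL << (id % DEMO_BITS_PER_LONG);
}

/* Non-atomic test-and-clear; good enough for a single-threaded model. */
static int demo_test_and_clear(unsigned int id)
{
    unsigned long mask = 1UL << (id % DEMO_BITS_PER_LONG);
    unsigned long *word = &demo_changed[id / DEMO_BITS_PER_LONG];
    int was_set = (*word & mask) != 0;

    *word &= ~mask;
    return was_set;
}

/*
 * Return (and clear) the first flagged id at or after 'start', wrapping
 * around once. Returns DEMO_NR_IDS if no id is flagged.
 */
static unsigned int demo_next_changed(unsigned int start)
{
    unsigned int i;

    start %= DEMO_NR_IDS;
    for ( i = 0; i < DEMO_NR_IDS; i++ )
    {
        unsigned int id = (start + i) % DEMO_NR_IDS;

        if ( demo_test_and_clear(id) )
            return id;
    }

    return DEMO_NR_IDS;
}
```

A caller would pass the last id it received plus one; an id that gets flagged again in the meantime is then only revisited after the higher-numbered ones have had their turn.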

> +        }
> +
> +        if ( d )
> +        {
> +            rcu_unlock_domain(d);
> +        }

Nit: Unnecessary braces.

> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -894,11 +894,16 @@ static struct domain *global_virq_handlers[NR_VIRQS] __read_mostly;
>  
>  static DEFINE_SPINLOCK(global_virq_handlers_lock);
>  
> -void send_global_virq(uint32_t virq)
> +struct domain *get_global_virq_handler(uint32_t virq)
>  {
>      ASSERT(virq_is_global(virq));
>  
> -    send_guest_global_virq(global_virq_handlers[virq] ?: hardware_domain, virq);
> +    return global_virq_handlers[virq] ?: hardware_domain;
> +}
> +
> +void send_global_virq(uint32_t virq)
> +{
> +    send_guest_global_virq(get_global_virq_handler(virq), virq);
>  }

Following my comment further up, I think exposing this externally requires
finally eliminating the (pre-existing) risk of a race here. I think
get_knownalive_domain() is all it takes to at least prevent the domain
disappearing behind our backs, with the extra reference transferred to the
caller. Yet we may want to additionally be assured that the domain in
question continues to be the one handling the respective vIRQ ...
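
A minimal sketch of the "take the reference while the table cannot change" pattern, with the extra reference handed to the caller. This is a standalone model and does not use Xen's helpers: the `demo_` names, the spinlock built on `atomic_flag`, and the bare reference counter are all assumptions for illustration. It only covers the first concern (the object disappearing); it does nothing to guarantee that the returned handler is still the registered one by the time the caller uses it.

```c
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_NR_VIRQS 4

/* Hypothetical stand-in for a reference-counted domain. */
struct demo_domain {
    atomic_int refcnt;
    int id;
};

static struct demo_domain *demo_handlers[DEMO_NR_VIRQS];
static struct demo_domain *demo_fallback;   /* plays the hardware domain */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

static void demo_lock_acquire(void)
{
    while ( atomic_flag_test_and_set_explicit(&demo_lock,
                                              memory_order_acquire) )
        ; /* spin until the flag was previously clear */
}

static void demo_lock_release(void)
{
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/*
 * Look up the handler and take a reference before dropping the lock, so
 * the object cannot go away between lookup and use. The caller owns the
 * returned reference and must release it with demo_put().
 */
static struct demo_domain *demo_get_handler(unsigned int virq)
{
    struct demo_domain *d;

    demo_lock_acquire();
    d = demo_handlers[virq] ? demo_handlers[virq] : demo_fallback;
    if ( d )
        atomic_fetch_add(&d->refcnt, 1);
    demo_lock_release();

    return d;
}

static void demo_put(struct demo_domain *d)
{
    atomic_fetch_sub(&d->refcnt, 1);
}
```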

> --- /dev/null
> +++ b/xen/include/public/control.h
> @@ -0,0 +1,80 @@
> +/******************************************************************************
> + * Xen Control Hypercall
> + *
> + * Copyright (c) 2021, SUSE Software Solutions Germany GmbH
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef __XEN_PUBLIC_CONTROL_H__
> +#define __XEN_PUBLIC_CONTROL_H__
> +
> +#include "xen.h"
> +
> +/*
> + * Definitions for the __HYPERVISOR_control_op hypercall.
> + */
> +
> +/* Highest version number of the control interface currently defined. */
> +#define XEN_CONTROL_VERSION      1
> +
> +/*
> + * Hypercall operations.
> + */
> +
> +/*
> + * XEN_CONTROL_OP_get_version
> + *
> + * Read highest interface version supported by the hypervisor.
> + *
> + * arg: NULL
> + *
> + * Possible return values:
> + * >0: highest supported interface version
> + * <0: negative Xen errno value
> + */
> +#define XEN_CONTROL_OP_get_version                  0

What would a caller use the returned value for? I guess this follows
XEN_HYPFS_OP_get_version, but I'm less certain of the utility here.
Incompatible extensions can easily be made to use separate sub-ops, unlike
possible extensions there to struct xen_hypfs_dir{,list}entry.

> +/*
> + * XEN_CONTROL_OP_get_state_changed_domain
> + *
> + * Get information about a domain having changed state and reset the state
> + * change indicator for that domain. This function is usable only by a domain
> + * having registered the VIRQ_DOM_EXC event (normally Xenstore).
> + *
> + * arg: XEN_GUEST_HANDLE(struct xen_control_changed_domain)
> + *
> + * Possible return values:
> + * 0: success
> + * <0 : negative Xen errno value
> + */
> +#define XEN_CONTROL_OP_get_state_changed_domain     1
> +struct xen_control_changed_domain {
> +    domid_t domid;
> +    uint16_t state;
> +#define XEN_CONTROL_CHANGEDDOM_STATE_EXIST     0x0001  /* Domain is existing. */
> +#define XEN_CONTROL_CHANGEDDOM_STATE_SHUTDOWN  0x0002  /* Shutdown finished. */
> +#define XEN_CONTROL_CHANGEDDOM_STATE_DYING     0x0004  /* Domain dying. */
> +    uint32_t pad1;           /* Returned as 0. */
> +    uint64_t unique_id;      /* Unique domain identifier. */
> +    uint64_t pad2[6];        /* Returned as 0. */

I think the padding fields have to be zero on input, not just on return.
Unless you mean to mandate them to be OUT only now and forever. I also
wonder how the trailing padding plays together with the version sub-op: Do we
really need such double precaution?

Also - should we use uint64_aligned_t here?
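
A minimal sketch of what "padding must be zero on input" could look like, together with compile-time checks on the layout. The struct mirrors the fields quoted above, but it is a standalone model: the `demo_` names are hypothetical, `domid_t` is spelled as `uint16_t`, and the choice of `-EINVAL` for non-zero padding is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Standalone mirror of the quoted structure. */
struct demo_changed_domain {
    uint16_t domid;
    uint16_t state;
    uint32_t pad1;
    uint64_t unique_id;
    uint64_t pad2[6];
};

/*
 * The 64-bit fields start at offset 8 because exactly 8 bytes precede
 * them, so offsets and size do not depend on the natural alignment of
 * uint64_t. Only the alignment of the structure as a whole can differ
 * between ABIs.
 */
_Static_assert(offsetof(struct demo_changed_domain, unique_id) == 8,
               "unique_id expected at offset 8");
_Static_assert(sizeof(struct demo_changed_domain) == 64,
               "structure expected to be 64 bytes");

/*
 * Reject input whose padding is not zero, so the fields can later be
 * given a meaning without old callers passing garbage in them.
 */
static int demo_check_padding(const struct demo_changed_domain *in)
{
    size_t i;

    if ( in->pad1 )
        return -EINVAL;

    for ( i = 0; i < sizeof(in->pad2) / sizeof(in->pad2[0]); i++ )
        if ( in->pad2[i] )
            return -EINVAL;

    return 0;
}
```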

Jan