[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal



On 03/09/14 17:25, David Scott wrote:
> Hvmloader uses the xenstore ring and then tries to reset it back
> to its initial state before booting the guest. Occasionally xenstored
> will read the ring while it is being zeroed and conclude it has
> been corrupted. This prevents PV drivers from loading in the guest.
>
> This patch updates the xenstore ring protocol definition, enabling
> a server to advertise additional features to the guest. One such feature
> is defined: the ability to cleanly reset the ring (but not the
> higher-level protocol, for which we already have RESET_WATCHES).
>
> This patch implements the ring reconnection features in oxenstored
> and hvmloader, fixing the bug.
>
> This patch also defines an 'invalid' xenstore packet type and uses this
> to poison the ring over a reconnect. This will make diagnosing this
> bug much easier in future.
>
> Signed-off-by: David Scott <dave.scott@xxxxxxxxxx>
>
> ---
>
> Updates since v3
> * hvmloader: signal the event channel after requesting a reconnect
>   (thanks to Zheng Li for diagnosing this)
> * switch to using feature flags/bits instead of version numbers
> * rename the 'closing' field to 'connection state'
> * define an invalid packet type (0xffff, since 0x0 was already taken)
> * on ring connection/reset, fill the input and output buffers with
>   the invalid packet
> * ocaml xs_ring_stubs.c: collapse both {read,write} functions into one,
>   remove #define ERROR_FOO
> * doc: write from the guest's point of view, weaken guarantees made by
>   server (e.g. producer = consumer != 0 is valid after reconnect)
> * doc: relate reconnection to the RESET_WATCHES command in the higher
>   level protocol
> * doc: be more specific about who must write what, when
>
> Updates since v2
> * hvmloader: use volatile for read of closing flag
> * style improvements
> * remove xenstore version #defines
>
> Updates since v1
> * remove unused client version and associated dead code
> * treat 'closing' as a flag by using "!=0" rather than "==1"
> * hvmloader: move function up and remove forward decl
> * document the existing xenstore ring and the extention in misc/
> ---
>  docs/misc/xenstore-ring.txt         |  112 
> +++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/xenbus.c   |   48 +++++++++------
>  tools/ocaml/libs/xb/xb.ml           |   27 ++++++++-
>  tools/ocaml/libs/xb/xb.mli          |    1 +
>  tools/ocaml/libs/xb/xs_ring.ml      |   10 ++++
>  tools/ocaml/libs/xb/xs_ring_stubs.c |  109 ++++++++++++++++++++++++----------
>  xen/include/public/io/xs_wire.h     |   13 +++-
>  7 files changed, 269 insertions(+), 51 deletions(-)
>  create mode 100644 docs/misc/xenstore-ring.txt
>
> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..6d18556
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> @@ -0,0 +1,112 @@
> +The xenstore ring is a datastructure stored within a single 4KiB page
> +shared between the xenstore server and the guest. The ring contains
> +two queues of bytes -- one in each direction -- and some signalling
> +information. The [xenstore protocol](xenstore.txt) is layered on top of
> +the byte streams.
> +
> +The xenstore ring datastructure
> +===============================
> +
> +The following table describes the ring structure where
> +  - offsets and lengths are in bytes;
> +  - "Input" is used to describe the data sent to the server; and
> +  - "Output" is used to describe the data sent to the domain.
> +
> +Offset  Length  Description
> +-----------------------------------------------------------------
> +0       1024    Input data
> +1024    1024    Output data
> +2048    4       Input consumer offset
> +2052    4       Input producer offset
> +2056    4       Output consumer offset
> +2060    4       Output producer offset
> +2064    4       Server feature bitmap
> +2068    4       Connection state
> +
> +The Input data and Output data are circular buffers. Each buffer is
> +associated with a pair of free-running offsets labelled "consumer" and
> +"producer".
> +
> +A "producer" offset is the offset in the byte stream of the next byte
> +to be written modulo 2^32. A "consumer" offset is the offset in the byte
> +stream of the next byte to be read modulo 2^32. Implementations must
> +take care to handle wraparound properly when performing arithmetic with
> +these values.
> +
> +The byte at offset 'x' in the byte stream will be stored at offset
> +'x modulo 1024' in the circular buffer.
> +
> +Implementations may only overwrite previously-written data if it has
> +been marked as 'consumed' by the relevant consumer pointer.
> +
> +When the guest domain is created, there is no outstanding Input or Output
> +data. However
> +
> +  - guests must not assume that producer or consumer pointers start
> +    at zero; and
> +  - guests must not assume that unused bytes in either the Input or
> +    Output data buffers has any particular value.
> +
> +A xenstore ring is always associated with an event channel. Whenever the
> +ring structure is updated the event channel must be signalled. The
> +guest and server are free to inspect the contents of the ring at any
> +time, not only in response to an event channel event. This implies that
> +updates must be ordered carefully to ensure consistency.
> +
> +The xenstore server may decide to advertise some features via the
> +"Server feature bitmap". The server can start advertising features
> +at any time by setting bits but it will never stop advertising features
> +i.e. bits will never be cleared. The guest is not permitted to write to
> +the server feature bitmap. The server features are offered to the guest;
> +it is up to the guest whether to use them or not. The guest should ignore
> +any unknown feature bits.
> +
> +The following features are defined:
> +
> +Mask    Description
> +-----------------------------------------------------------------
> +1       Ring reconnection (see the ring reconnection feature below)
> +
> +The "Connection state" field is used to request a ring close and reconnect.
> +The "Connection state" field only contains valid data if the server has
> +advertised the ring reconnection feature. If the feature has been advertised
> +then the "Connection state" may take the following values:
> +
> +Value   Description
> +-----------------------------------------------------------------
> +0       Ring is connected
> +1       Ring close and reconnect is in progress (see the "ring
> +        reconnection feature" described below)
> +
> +The ring reconnection feature
> +=============================
> +
> +The ring reconnection feature allows the guest to ask the server to
> +reset the ring to a valid initial state i.e. where the Input and Output
> +queues contain no data. The feature operates at the ring-level only;
> +it does not affect the higher-level protocol state machine at all.
> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
> +command in the [xenstore protocol](xenstore.txt) specification.
> +
> +The ring reconnection feature is only available if the 'Ring reconnection'
> +feature bit has been set by the server in the "Server feature bitmap".
> +
> +Assuming the server has advertised the feature, the guest can initiate
> +a reconnection by setting the the Connection state to 1 ("Ring close
> +and reconnect is in progress") and signalling the event channel.
> +The guest must now ignore all fields except the Connection state and
> +wait for it to be set to 0 ("Ring is connected")
> +
> +The server will guarantee to
> +
> +  - drop any partially read or written higher-level
> +    [xenstore protocol](xenstore.txt) packets it may have;
> +  - empty the Input and Output queues in the xenstore ring;
> +  - set the Connection state to 0 ("Ring is connected"); and
> +  - signal the event channel.
> +
> +From the point of view of the guest, the connection has been reset on a
> +packet boundary.
> +
> +Note that only the guest may set the Connection state to 1 and only the
> +server may set it back to 0.
> diff --git a/tools/firmware/hvmloader/xenbus.c 
> b/tools/firmware/hvmloader/xenbus.c
> index 64c2176..f900a1e 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared 
> ring with dom0 */
>  static evtchn_port_t event;                     /* Event-channel to dom0 */
>  static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
>  
> +static void ring_wait(void)
> +{
> +    struct shared_info *shinfo = get_shared_info();
> +    struct sched_poll poll;
> +
> +    memset(&poll, 0, sizeof(poll));
> +    set_xen_guest_handle(poll.ports, &event);
> +    poll.nr_ports = 1;
> +
> +    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> +        hypercall_sched_op(SCHEDOP_poll, &poll);
> +}
> +
>  /* Connect our xenbus client to the backend.
>   * Call once, before any other xenbus actions. */
>  void xenbus_setup(void)
> @@ -61,14 +74,26 @@ void xenbus_setup(void)
>  void xenbus_shutdown(void)
>  {
>      struct shared_info *shinfo = get_shared_info();
> +    evtchn_send_t send;
>  
>      ASSERT(rings != NULL);
>  
> -    /* We zero out the whole ring -- the backend can handle this, and it's 
> -     * not going to surprise any frontends since it's equivalent to never 
> -     * having used the rings. */
> -    memset(rings, 0, sizeof *rings);
> -
> +    if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) {
> +        rings->connection = XENSTORE_RECONNECT;
> +        send.port = event;
> +        hypercall_event_channel_op(EVTCHNOP_send, &send);
> +        while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT)
> +            ring_wait ();
> +    } else {
> +        /* If the backend reads the state while we're erasing it then the
> +         * ring state will become corrupted, preventing guest frontends from
> +         * connecting. This is rare. To help diagnose the failure, we fill
> +         * the ring with XS_INVALID packets. */
> +        memset(rings->req, 0xff, XENSTORE_RING_SIZE);
> +        memset(rings->rsp, 0xff, XENSTORE_RING_SIZE);
> +        rings->req_cons = rings->req_prod = 0;
> +        rings->rsp_cons = rings->rsp_prod = 0;
> +    }
>      /* Clear the event-channel state too. */
>      memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
>      memset(shinfo->evtchn_pending, 0, sizeof(shinfo->evtchn_pending));
> @@ -77,19 +102,6 @@ void xenbus_shutdown(void)
>      rings = NULL;
>  }
>  
> -static void ring_wait(void)
> -{
> -    struct shared_info *shinfo = get_shared_info();
> -    struct sched_poll poll;
> -
> -    memset(&poll, 0, sizeof(poll));
> -    set_xen_guest_handle(poll.ports, &event);
> -    poll.nr_ports = 1;
> -
> -    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> -        hypercall_sched_op(SCHEDOP_poll, &poll);
> -}
> -
>  /* Helper functions: copy data in and out of the ring */
>  static void ring_write(const char *data, uint32_t len)
>  {
> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 29d354d..6332433 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml
> @@ -84,7 +84,26 @@ let write con s len =
>       | Fd backfd     -> write_fd backfd con s len
>       | Xenmmap backmmap -> write_mmap backmmap con s len
>  
> -let output con =
> +(* If a function throws Xs_ring.Reconnect, then clear the ring state
> +   and serve the ring again. *)
> +let rec handle_reconnect f con =
> +     match (try Some (f con) with Xs_ring.Reconnect -> None) with
> +     | Some x -> x
> +     | None ->
> +             begin match con.backend with
> +             | Fd _ -> raise Xs_ring.Reconnect (* should never happen, but 
> just in case *)
> +             | Xenmmap backend ->
> +                     Xs_ring.close backend.mmap;
> +                     backend.eventchn_notify ();
> +                     (* Clear our old connection state *)
> +                     Queue.clear con.pkt_in;
> +                     Queue.clear con.pkt_out;
> +                     con.partial_in <- init_partial_in ();
> +                     con.partial_out <- "";
> +                     handle_reconnect f con
> +             end
> +
> +let output = handle_reconnect (fun con ->
>       (* get the output string from a string_of(packet) or partial_out *)
>       let s = if String.length con.partial_out > 0 then
>                       con.partial_out
> @@ -101,8 +120,9 @@ let output con =
>       );
>       (* after sending one packet, partial is empty *)
>       con.partial_out = ""
> +)
>  
> -let input con =
> +let input = handle_reconnect (fun con ->
>       let newpacket = ref false in
>       let to_read =
>               match con.partial_in with
> @@ -133,6 +153,7 @@ let input con =
>                       HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
>       );
>       !newpacket
> +)
>  
>  let newcon backend = {
>       backend = backend;
> @@ -145,6 +166,8 @@ let newcon backend = {
>  let open_fd fd = newcon (Fd { fd = fd; })
>  
>  let open_mmap mmap notifyfct =
> +     (* Advertise XENSTORE_SERVER_FEATURE_RECONNECTION *)
> +     Xs_ring.set_server_features mmap 1;

Minor nit here: the hard-coded 1 looks bad.

>       newcon (Xenmmap {
>               mmap = mmap;
>               eventchn_notify = notifyfct;
> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
> index 58234ae..d47d869 100644
> --- a/tools/ocaml/libs/xb/xb.mli
> +++ b/tools/ocaml/libs/xb/xb.mli
> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
>  val write_fd : backend_fd -> 'a -> string -> int -> int
>  val write_mmap : backend_mmap -> 'a -> string -> int -> int
>  val write : t -> string -> int -> int
> +val handle_reconnect : (t -> 'a) -> t -> 'a
>  val output : t -> bool
>  val input : t -> bool
>  val newcon : backend -> t
> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
> index 9469609..2c4eeab 100644
> --- a/tools/ocaml/libs/xb/xs_ring.ml
> +++ b/tools/ocaml/libs/xb/xs_ring.ml
> @@ -14,5 +14,15 @@
>   * GNU Lesser General Public License for more details.
>   *)
>  
> +exception Reconnect
> +
> +let _ =
> +  Callback.register_exception "Xs_ring.Reconnect" Reconnect
> +
>  external read: Xenmmap.mmap_interface -> string -> int -> int = 
> "ml_interface_read"
>  external write: Xenmmap.mmap_interface -> string -> int -> int = 
> "ml_interface_write"
> +
> +external set_server_features: Xenmmap.mmap_interface -> int -> unit = 
> "ml_interface_set_server_features" "noalloc"
> +external get_server_features: Xenmmap.mmap_interface -> int = 
> "ml_interface_get_server_features" "noalloc"
> +
> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" 
> "noalloc"
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c 
> b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..c728a01 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,22 +35,39 @@
>  
>  #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>  
> -static int xs_ring_read(struct mmap_interface *interface,
> -                             char *buffer, int len)
> +CAMLprim value ml_interface_read(value ml_interface,
> +                                 value ml_buffer,
> +                                 value ml_len)
>  {
> +     CAMLparam3(ml_interface, ml_buffer, ml_len);
> +     CAMLlocal1(ml_result);
> +
> +     struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
> +     char *buffer = String_val(ml_buffer);
> +     int len = Int_val(ml_len);
> +     int result;
> +
>       struct xenstore_domain_interface *intf = interface->addr;
>       XENSTORE_RING_IDX cons, prod; /* offsets only */
>       int to_read;
> +     uint32_t connection;
>  
>       cons = *(volatile uint32*)&intf->req_cons;
>       prod = *(volatile uint32*)&intf->req_prod;
> +     connection = *(volatile uint32*)&intf->connection;
> +
> +     if (connection != XENSTORE_CONNECTED)
> +             caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
> +
>       xen_mb();
>  
>       if ((prod - cons) > XENSTORE_RING_SIZE)
> -         return -1;
> +             caml_failwith("bad connection");
>  
> -     if (prod == cons)
> -             return 0;
> +     if (prod == cons) {
> +             result = 0;
> +             goto exit;
> +     }
>       cons = MASK_XENSTORE_IDX(cons);
>       prod = MASK_XENSTORE_IDX(prod);
>       if (prod > cons)
> @@ -62,21 +79,41 @@ static int xs_ring_read(struct mmap_interface *interface,
>       memcpy(buffer, intf->req + cons, len);
>       xen_mb();
>       intf->req_cons += len;
> -     return len;
> +     result = len;
> +exit:
> +     ml_result = Val_int(result);
> +     CAMLreturn(ml_result);
>  }
>  
> -static int xs_ring_write(struct mmap_interface *interface,
> -                              char *buffer, int len)
> +CAMLprim value ml_interface_write(value ml_interface,
> +                                  value ml_buffer,
> +                                  value ml_len)
>  {
> +     CAMLparam3(ml_interface, ml_buffer, ml_len);
> +     CAMLlocal1(ml_result);
> +
> +     struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
> +     char *buffer = String_val(ml_buffer);
> +     int len = Int_val(ml_len);
> +     int result;
> +
>       struct xenstore_domain_interface *intf = interface->addr;
>       XENSTORE_RING_IDX cons, prod;
>       int can_write;
> +     uint32_t connection;
>  
>       cons = *(volatile uint32*)&intf->rsp_cons;
>       prod = *(volatile uint32*)&intf->rsp_prod;
> +     connection = *(volatile uint32*)&intf->connection;
> +
> +     if (connection != XENSTORE_CONNECTED)
> +             caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
> +
>       xen_mb();
> -     if ( (prod - cons) >= XENSTORE_RING_SIZE )
> -             return 0;
> +     if ( (prod - cons) >= XENSTORE_RING_SIZE ) {
> +             result = 0;
> +             goto exit;
> +     }
>       if (MASK_XENSTORE_IDX(prod) >= MASK_XENSTORE_IDX(cons))
>               can_write = XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(prod);
>       else 
> @@ -86,31 +123,43 @@ static int xs_ring_write(struct mmap_interface 
> *interface,
>       memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, len);
>       xen_mb();
>       intf->rsp_prod += len;
> -     return len;
> +     result = len;
> +exit:
> +     ml_result = Val_int(result);
> +     CAMLreturn(ml_result);
>  }
>  
> -CAMLprim value ml_interface_read(value interface, value buffer, value len)
> +CAMLprim value ml_interface_set_server_features(value interface, value v)
>  {
> -     CAMLparam3(interface, buffer, len);
> -     CAMLlocal1(result);
> -     int res;
> +     CAMLparam2(interface, v);
> +     struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
>  
> -     res = xs_ring_read(GET_C_STRUCT(interface),
> -                        String_val(buffer), Int_val(len));
> -     if (res == -1)
> -             caml_failwith("bad connection");
> -     result = Val_int(res);
> -     CAMLreturn(result);
> +     intf->server_features = Int_val(v);
> +
> +     CAMLreturn(Val_unit);
> +}
> +
> +CAMLprim value ml_interface_get_server_features(value interface)
> +{
> +     CAMLparam1(interface);
> +     struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
> +
> +     CAMLreturn(Val_int (intf->server_features));
>  }
>  
> -CAMLprim value ml_interface_write(value interface, value buffer, value len)
> +CAMLprim value ml_interface_close(value interface)
>  {
> -     CAMLparam3(interface, buffer, len);
> -     CAMLlocal1(result);
> -     int res;
> -
> -     res = xs_ring_write(GET_C_STRUCT(interface),
> -                         String_val(buffer), Int_val(len));
> -     result = Val_int(res);
> -     CAMLreturn(result);
> +     CAMLparam1(interface);
> +     struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
> +     int i;
> +
> +     intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
> +     /* Ensure the unused space is full of invalid xenstore packets. */
> +     for (i = 0; i < XENSTORE_RING_SIZE; i++) {
> +             intf->req[i] = 0xff; /* XS_INVALID = 0xffff */
> +             intf->rsp[i] = 0xff;
> +     }
> +     xen_mb ();
> +     intf->connection = XENSTORE_CONNECTED;
> +     CAMLreturn(Val_unit);
>  }
> diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
> index 585f0c8..0a0cdbc 100644
> --- a/xen/include/public/io/xs_wire.h
> +++ b/xen/include/public/io/xs_wire.h
> @@ -49,7 +49,9 @@ enum xsd_sockmsg_type
>      XS_RESUME,
>      XS_SET_TARGET,
>      XS_RESTRICT,
> -    XS_RESET_WATCHES
> +    XS_RESET_WATCHES,
> +
> +    XS_INVALID = 0xffff /* Guaranteed to remain an invalid type */
>  };
>  
>  #define XS_WRITE_NONE "NONE"
> @@ -116,6 +118,8 @@ struct xenstore_domain_interface {
>      char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>      XENSTORE_RING_IDX req_cons, req_prod;
>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
> +    uint32_t server_features; /* Bitmap of features supported by the server 
> */
> +    uint32_t connection;
>  };
>  
>  /* Violating this is very bad.  See docs/misc/xenstore.txt. */
> @@ -125,6 +129,13 @@ struct xenstore_domain_interface {
>  #define XENSTORE_ABS_PATH_MAX 3072
>  #define XENSTORE_REL_PATH_MAX 2048
>  
> +/* The ability to reconnect a ring */
> +#define XENSTORE_SERVER_FEATURE_RECONNECTION 1
> +
> +/* Valid values for the connection field */
> +#define XENSTORE_CONNECTED 0 /* the steady-state */
> +#define XENSTORE_RECONNECT 1 /* guest has initiated a reconnect */
> +
>  #endif /* _XS_WIRE_H */
>  
>  /*

Other than that, it looks good to me.

Reviewed-by: Jon Ludlam <jonathan.ludlam@xxxxxxxxxx>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.