[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal



Hi Paul,

Thanks for the comments.

On 26 Jun 2014, at 10:05, Paul Durrant <Paul.Durrant@xxxxxxxxxx> wrote:

>> -----Original Message-----
>> From: David Scott [mailto:dave.scott@xxxxxxxxxx]
>> Sent: 25 June 2014 22:15
>> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
>> Cc: Paul Durrant; Ian Campbell; john.liuqiming@xxxxxxxxxx;
>> yanqiangjun@xxxxxxxxxx; Dave Scott
>> Subject: [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal
>> 
>> Currently hvmloader uses the xenstore ring and then tries to
>> reset it back to its initial state. This is not part of the
>> ring protocol and, if xenstored reads the ring while it is
>> happening, xenstored will conclude it is corrupted. A corrupted
>> ring will prevent PV drivers from connecting. This seems to
>> be a rare failure.
>> 
>> Furthermore, when a VM crashes it may jump to a 'crash kernel'
>> to create a diagnostic dump. Without the ability to safely
>> reset the ring the PV drivers won't be able to reliably
>> establish connections, to (for example) stream a memory dump to
>> disk.
>> 
>> This prototype patch contains a simple extension of the
>> xenstore ring structure, enough to contain version numbers and
>> a 'closing' flag. This memory is currently unused within the
>> 4k page and should be zeroed as part of the domain setup
>> process. The oxenstored server advertises version 1, and
>> hvmloader detects this and sets the 'closing' flag. The server
>> then reinitialises the ring, filling it with obviously invalid
>> data to help debugging (unfortunately blocks of zeroes are
>> valid xenstore packets) and signals hvmloader by the event
>> channel that it's safe to boot the guest OS.
>> 
>> This patch has only been lightly tested. I'd appreciate
>> feedback on the approach before going further.
>> 
>> Signed-off-by: David Scott <dave.scott@xxxxxxxxxx>
>> ---
>> tools/firmware/hvmloader/xenbus.c   |   16 +++++--
>> tools/ocaml/libs/xb/xb.ml           |   26 ++++++++++-
>> tools/ocaml/libs/xb/xb.mli          |    3 +-
>> tools/ocaml/libs/xb/xs_ring.ml      |   13 ++++++
>> tools/ocaml/libs/xb/xs_ring_stubs.c |   81
>> ++++++++++++++++++++++++++++++++++-
>> xen/include/public/io/xs_wire.h     |    8 ++++
>> 6 files changed, 138 insertions(+), 9 deletions(-)
>> 
>> diff --git a/tools/firmware/hvmloader/xenbus.c
>> b/tools/firmware/hvmloader/xenbus.c
>> index fe72e97..15d961b 100644
>> --- a/tools/firmware/hvmloader/xenbus.c
>> +++ b/tools/firmware/hvmloader/xenbus.c
>> @@ -37,6 +37,8 @@ static struct xenstore_domain_interface *rings; /*
>> Shared ring with dom0 */
>> static evtchn_port_t event;                     /* Event-channel to dom0 */
>> static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area
>> */
>> 
>> +static void ring_wait(void);
>> +
> 
> Personally, I'd move the definition of ring_wait() up rather than forward 
> declaring it.
> 
>> /* Connect our xenbus client to the backend.
>>  * Call once, before any other xenbus actions. */
>> void xenbus_setup(void)
>> @@ -68,10 +70,16 @@ void xenbus_shutdown(void)
>> 
>>     ASSERT(rings != NULL);
>> 
>> -    /* We zero out the whole ring -- the backend can handle this, and it's
>> -     * not going to surprise any frontends since it's equivalent to never
>> -     * having used the rings. */
>> -    memset(rings, 0, sizeof *rings);
>> +    if (rings->server_version > XENSTORE_VERSION_0) {
> 
> I notice in the changes to xs_wire.h below you have a client_version field. 
> Should that not be set here too?

I added fields for a client version and a server version initially, thinking it 
sounded like a good idea (symmetry and all that). Looking at it now, it might 
be a premature generalisation? We only need the server_version in this 
particular case since only the client needs to request a close/reopen.

I agree with you that, if we have a client_version field we should set it and 
test it. Or drop it :-)

> 
>> +        rings->closing = 1;
>> +        while (rings->closing == 1)
>> +            ring_wait ();
>> +    } else {
>> +        /* If the backend reads the state while we're erasing it then the
>> +           ring state will become corrupted, preventing guest frontends from
>> +           connecting. This is rare. */
>> +        memset(rings, 0, sizeof *rings);
>> +    }
>> 
>>     /* Clear the event-channel state too. */
>>     memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
>> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
>> index 29d354d..d5cd776 100644
>> --- a/tools/ocaml/libs/xb/xb.ml
>> +++ b/tools/ocaml/libs/xb/xb.ml
>> @@ -84,7 +84,26 @@ let write con s len =
>>      | Fd backfd     -> write_fd backfd con s len
>>      | Xenmmap backmmap -> write_mmap backmmap con s len
>> 
>> -let output con =
>> +(* If a function throws Xs_ring.Closing, then clear the ring state
>> +   and serve the ring again. *)
>> +let rec handle_closing f con =
>> +    match (try Some (f con) with Xs_ring.Closing -> None) with
>> +    | Some x -> x
>> +    | None ->
>> +            begin match con.backend with
>> +            | Fd _ -> raise Xs_ring.Closing (* should never happen, but
>> just in case *)
>> +            | Xenmmap backend ->
>> +                    Xs_ring.close backend.mmap;
>> +                    backend.eventchn_notify ();
>> +                    (* Clear our old connection state *)
>> +                    Queue.clear con.pkt_in;
>> +                    Queue.clear con.pkt_out;
>> +                    con.partial_in <- init_partial_in ();
>> +                    con.partial_out <- "";
>> +                    handle_closing f con
>> +            end
>> +
>> +let output = handle_closing (fun con ->
>>      (* get the output string from a string_of(packet) or partial_out *)
>>      let s = if String.length con.partial_out > 0 then
>>                      con.partial_out
>> @@ -101,8 +120,9 @@ let output con =
>>      );
>>      (* after sending one packet, partial is empty *)
>>      con.partial_out = ""
>> +)
>> 
>> -let input con =
>> +let input = handle_closing (fun con ->
>>      let newpacket = ref false in
>>      let to_read =
>>              match con.partial_in with
>> @@ -133,6 +153,7 @@ let input con =
>>                      HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
>>      );
>>      !newpacket
>> +)
>> 
>> let newcon backend = {
>>      backend = backend;
>> @@ -145,6 +166,7 @@ let newcon backend = {
>> let open_fd fd = newcon (Fd { fd = fd; })
>> 
>> let open_mmap mmap notifyfct =
>> +    Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
>>      newcon (Xenmmap {
>>              mmap = mmap;
>>              eventchn_notify = notifyfct;
>> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
>> index 58234ae..af7ea5c 100644
>> --- a/tools/ocaml/libs/xb/xb.mli
>> +++ b/tools/ocaml/libs/xb/xb.mli
>> @@ -23,7 +23,7 @@ module Op :
>>       | Resume
>>       | Set_target
>>       | Restrict
>> -      | Invalid (* Not a valid wire operation *)
>> +      | Invalid
>>     val operation_c_mapping : operation array
>>     val size : int
>>     val array_search : 'a -> 'a array -> int
>> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
>> val write_fd : backend_fd -> 'a -> string -> int -> int
>> val write_mmap : backend_mmap -> 'a -> string -> int -> int
>> val write : t -> string -> int -> int
>> +val handle_closing : (t -> 'a) -> t -> 'a
>> val output : t -> bool
>> val input : t -> bool
>> val newcon : backend -> t
>> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
>> index 9469609..dadf9e1 100644
>> --- a/tools/ocaml/libs/xb/xs_ring.ml
>> +++ b/tools/ocaml/libs/xb/xs_ring.ml
>> @@ -14,5 +14,18 @@
>>  * GNU Lesser General Public License for more details.
>>  *)
>> 
>> +exception Closing
>> +
>> +let _ =
>> +  Callback.register_exception "Xs_ring.Closing" Closing
>> +
>> external read: Xenmmap.mmap_interface -> string -> int -> int =
>> "ml_interface_read"
>> external write: Xenmmap.mmap_interface -> string -> int -> int =
>> "ml_interface_write"
>> +
>> +external set_client_version: Xenmmap.mmap_interface -> int -> unit =
>> "ml_interface_set_client_version" "noalloc"
>> +external get_client_version: Xenmmap.mmap_interface -> int =
>> "ml_interface_get_client_version" "noalloc"
>> +
>> +external set_server_version: Xenmmap.mmap_interface -> int -> unit =
>> "ml_interface_set_server_version" "noalloc"
>> +external get_server_version: Xenmmap.mmap_interface -> int =
>> "ml_interface_get_server_version" "noalloc"
>> +
>> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close"
>> "noalloc"
>> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> index 8bd1047..4ddf5a7 100644
>> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> @@ -35,19 +35,28 @@
>> 
>> #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>> 
>> +#define ERROR_UNKNOWN (-1)
>> +#define ERROR_CLOSING (-2)
>> +
>> static int xs_ring_read(struct mmap_interface *interface,
>>                              char *buffer, int len)
>> {
>>      struct xenstore_domain_interface *intf = interface->addr;
>>      XENSTORE_RING_IDX cons, prod; /* offsets only */
>>      int to_read;
>> +        uint32_t closing;
>> 
>>      cons = *(volatile uint32*)&intf->req_cons;
>>      prod = *(volatile uint32*)&intf->req_prod;
>> +    closing = *(volatile uint32*)&intf->closing;
>> +
>> +    if (closing == 1)
> 
> Do you really mean if == 1 here? It's a flag, so != 0 would seem better. 
> Also, should you test the client version?

!= 0 would be fine by me.

> 
>> +            return ERROR_CLOSING;
>> +
>>      xen_mb();
>> 
>>      if ((prod - cons) > XENSTORE_RING_SIZE)
>> -        return -1;
>> +        return ERROR_UNKNOWN;
>> 
>>      if (prod == cons)
>>              return 0;
>> @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface
>> *interface,
>>      struct xenstore_domain_interface *intf = interface->addr;
>>      XENSTORE_RING_IDX cons, prod;
>>      int can_write;
>> +    uint32_t closing;
>> 
>>      cons = *(volatile uint32*)&intf->rsp_cons;
>>      prod = *(volatile uint32*)&intf->rsp_prod;
>> +    closing = *(volatile uint32*)&intf->closing;
>> +
>> +    if (closing == 1)
> 
> Same again.
> 
>> +            return ERROR_CLOSING;
>> +
>>      xen_mb();
>>      if ( (prod - cons) >= XENSTORE_RING_SIZE )
>>              return 0;
>> @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface,
>> value buffer, value len)
>> 
>>      res = xs_ring_read(GET_C_STRUCT(interface),
>>                         String_val(buffer), Int_val(len));
>> -    if (res == -1)
>> +    if (res == ERROR_UNKNOWN)
>>              caml_failwith("bad connection");
>> +
>> +    if (res == ERROR_CLOSING)
>> +
>>      caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
>> +
>>      result = Val_int(res);
>>      CAMLreturn(result);
>> }
>> @@ -111,6 +130,64 @@ CAMLprim value ml_interface_write(value interface,
>> value buffer, value len)
>> 
>>      res = xs_ring_write(GET_C_STRUCT(interface),
>>                          String_val(buffer), Int_val(len));
>> +
>> +    if (res == ERROR_CLOSING)
>> +
>>      caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
>> +
>>      result = Val_int(res);
>>      CAMLreturn(result);
>> }
>> +
>> +CAMLprim value ml_interface_set_client_version(value interface, value v)
>> +{
>> +    CAMLparam2(interface, v);
>> +    struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> +    intf->client_version = Int_val(v);
> 
> Why would the server ever set the client version?

This is dead code which is only present for symmetry. It certainly could be 
removed :-)

> 
>> +
>> +    CAMLreturn(Val_unit);
>> +}
>> +
>> +CAMLprim value ml_interface_get_client_version(value interface)
>> +{
>> +    CAMLparam1(interface);
>> +    struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> +    CAMLreturn(Val_int (intf->client_version));
>> +}
>> +
>> +CAMLprim value ml_interface_set_server_version(value interface, value v)
>> +{
>> +    CAMLparam2(interface, v);
>> +    struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> +    intf->server_version = Int_val(v);
>> +
>> +    CAMLreturn(Val_unit);
>> +}
>> +
>> +CAMLprim value ml_interface_get_server_version(value interface)
>> +{
>> +    CAMLparam1(interface);
>> +    struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> +    CAMLreturn(Val_int (intf->server_version));
>> +}
>> +
>> +CAMLprim value ml_interface_close(value interface)
>> +{
>> +    CAMLparam1(interface);
>> +    const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
>> +    struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +    int i;
>> +
>> +    intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod =
>> 0;
>> +    /* Ensure the unused space is full of invalid xenstore packets. */
>> +    for (i = 0; i < XENSTORE_RING_SIZE; i++) {
>> +            intf->req[i] = invalid_data[i % sizeof(invalid_data)];
>> +            intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
>> +    }
>> +    xen_mb ();
>> +    intf->closing = 0;
>> +    CAMLreturn(Val_unit);
>> +}
>> diff --git a/xen/include/public/io/xs_wire.h
>> b/xen/include/public/io/xs_wire.h
>> index 585f0c8..68460cc 100644
>> --- a/xen/include/public/io/xs_wire.h
>> +++ b/xen/include/public/io/xs_wire.h
>> @@ -104,6 +104,9 @@ enum xs_watch_type
>>     XS_WATCH_TOKEN
>> };
>> 
>> +#define XENSTORE_VERSION_0 0
>> +#define XENSTORE_VERSION_1 1
>> +
>> /*
>>  * `incontents 150 xenstore_struct XenStore wire protocol.
>>  *
>> @@ -112,10 +115,15 @@ enum xs_watch_type
>> typedef uint32_t XENSTORE_RING_IDX;
>> #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
>> struct xenstore_domain_interface {
>> +    /* XENSTORE_VERSION_0 */
>>     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>>     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>>     XENSTORE_RING_IDX req_cons, req_prod;
>>     XENSTORE_RING_IDX rsp_cons, rsp_prod;
>> +    uint32_t client_version;
>> +    uint32_t server_version;
> 
> Should any values below perhaps be part of a union so that future versions 
> may replace them, or do you intend that future versions should only extend 
> the set of fields?

Good question, I don’t know what’s best here.

I notice that the ring structure has been constant for a long time; maybe it 
would never change again? Or is that wishful thinking? We’ve also got just 
under 2k of space left, so we could do quite a lot of extending if necessary.

> 
>  Paul
> 
>> +    /* XENSTORE_VERSION_1 */
>> +    uint32_t closing;
>> };
>> 
>> /* Violating this is very bad.  See docs/misc/xenstore.txt. */

(I thought this was an appropriate last line of the diff :-)

Cheers,
Dave
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.