[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
Hi Paul, On 4 Jul 2014, at 09:21, Paul Durrant <Paul.Durrant@xxxxxxxxxx> wrote: >> -----Original Message----- >> From: David Scott [mailto:dave.scott@xxxxxxxxxx] >> Sent: 03 July 2014 16:15 >> To: xen-devel@xxxxxxxxxxxxxxxxxxxx >> Cc: Paul Durrant; Ian Campbell; john.liuqiming@xxxxxxxxxx; >> yanqiangjun@xxxxxxxxxx; Andrew Cooper; Dave Scott >> Subject: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal >> >> Currently hvmloader uses the xenstore ring and then tries to >> reset it back to its initial state. This is not part of the >> ring protocol and, if xenstored reads the ring while it is >> happening, xenstored will conclude it is corrupted. A corrupted >> ring will prevent PV drivers from connecting. This seems to >> be a rare failure. >> >> Furthermore, when a VM crashes it may jump to a 'crash kernel' >> to create a diagnostic dump. Without the ability to safely >> reset the ring the PV drivers won't be able to reliably >> establish connections, to (for example) stream a memory dump to >> disk. >> >> This prototype patch contains a simple extension of the >> xenstore ring structure, enough to contain version numbers and >> a 'closing' flag. This memory is currently unused within the >> 4k page and should be zeroed as part of the domain setup >> process. The oxenstored server advertises version 1, and >> hvmloader detects this and sets the 'closing' flag. The server >> then reinitialises the ring, filling it with obviously invalid >> data to help debugging (unfortunately blocks of zeroes are >> valid xenstore packets) and signals hvmloader by the event >> channel that it's safe to boot the guest OS. >> >> Updates since v2 (thanks to Andy Cooper for the review): >> * hvmloader: use volatile for read of closing flag >> * style improvements >> * remove xenstore version #defines >> >> Updates since v1 (thanks to Paul Durrant for the review): >> * remove unused client version and associated dead code >> * treat 'closing' as a flag by using "!=0" rather than "==1" >> * hvmloader: move function up and remove forward decl >> * document the existing xenstore ring and the extention in misc/ >> >> Signed-off-by: David Scott <dave.scott@xxxxxxxxxx> >> --- >> docs/misc/xenstore-ring.txt | 79 >> +++++++++++++++++++++++++++++++++++ >> tools/firmware/hvmloader/xenbus.c | 39 +++++++++-------- >> tools/ocaml/libs/xb/xb.ml | 26 +++++++++++- >> tools/ocaml/libs/xb/xb.mli | 1 + >> tools/ocaml/libs/xb/xs_ring.ml | 11 +++++ >> tools/ocaml/libs/xb/xs_ring_stubs.c | 63 >> +++++++++++++++++++++++++++- >> xen/include/public/io/xs_wire.h | 5 +++ >> 7 files changed, 203 insertions(+), 21 deletions(-) >> create mode 100644 docs/misc/xenstore-ring.txt >> >> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt >> new file mode 100644 >> index 0000000..df2a09f >> --- /dev/null >> +++ b/docs/misc/xenstore-ring.txt >> @@ -0,0 +1,79 @@ >> + >> +The domain builder allocates a single page and shares it with the xenstore >> +daemon. This document describes the ring protocol used to communicate >> via >> +this page which is used to transmit and receive >> +[xenstore protocol packets](xenstore.txt). >> + >> +In the original version (we call this "version 0"), the shared page has the >> +following contents (all offsets and lengths are in octets): >> + >> +Offset Length Description >> +----------------------------------------------------------------- >> +0 1024 Requests: data sent to the xenstore daemon >> +1024 1024 Replies: data sent to the domain >> +2048 4 Request consumer offset >> +2052 4 Request producer offset >> +2056 4 Response consumer offset >> +2060 4 Response producer offset >> + >> +When the page is allocated it is guaranteed to be full of zeroes. >> + >> +The "Requests" and "Replies" are treated as circular buffers, one for >> +each direction. Each circular buffer is associated wth a producer and >> +consumer offset, which are free-running counters starting from 0. A >> "producer" >> +offset is the offset in the byte stream of the next byte to be written; a >> +"consumer" offset is the offset in the byte stream of the next byte to be >> +read. The byte at offset 'x' in the byte stream will be stored in >> +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives >> +the number of bytes still to be read, and "1024 - (producer - consumer)" >> +therefore gives the amount of space currently available for writing, >> +where we must avoid overwriting unread data. >> + >> +The circular buffers are only used to send sequences of bytes between >> domains. >> +It is the responsibility of the layer above to segment these bytes into >> +packets, as described in [xenstore.txt](xenstore.txt). >> + >> +The client and server domains can run concurrently on different cores and >> +different sockets. We must therefore take care to avoid corruption by: >> + >> + 1. using atomic loads and stores when reading and writing the producer >> + and consumer offsets. If an offset were to be updated by a non-atomic >> + store then the other domain may read an invalid offset value. >> + 2. ensuring request and reply data is fully read or written before >> + updating the offsets by issuing a memory barrier. >> + >> +If we wish to read data but find the circular buffer empty, or wish to write >> +data and find the circular buffer full, we wait on a pre-arranged event >> +channel. When the other party next reads or writes data to the ring, it >> +will notify() this event channel and we will wake. >> + >> +Protocol extension for reconnection >> +----------------------------------- >> + >> +In version 0 of the protocol it is not possible to close and reopen the >> +connection. This means that if the ring is corrupted, it can never be used >> +(safely) again. Extending the protocol to allow reconnection would allow >> +us to: >> + >> + 1. use the ring in the firmware (hvmloader) and safely reset it for use >> + by the guest >> + 2. re-establish a ring in a 'crash kernel' environment, allowing us to >> + write crash dumps to PV disks or network devices. >> + >> +In version 1 of the protocol the ring is extended with the following >> +fields: >> + >> +Offset Length Description >> +----------------------------------------------------------------- >> +2064 4 Xenstore daemon supported protocol version >> +2068 4 Close request flag >> + >> +In a system supporting only version 0, both these fields are guaranteed >> +to contain 0 by the domain builder. >> + >> +In a system supporting version 1, the xenstore daemon will write "1" into >> +the support protocol version field. The guest xenstore client (eg in > > s/support/supported > >> +hvmloader) can query the version, and if it is set to "1" it can write >> +"1" to the close request flag and call notify(). The supporting xenstore > > Perhaps, rather than 'call notify()' this should be 'signal the store event > channel'? > >> +daemon can reset the ring to an empty state and signal completion by >> +clearing the flag and calling notify() again. > > Same here. > >> diff --git a/tools/firmware/hvmloader/xenbus.c >> b/tools/firmware/hvmloader/xenbus.c >> index fe72e97..f85832c 100644 >> --- a/tools/firmware/hvmloader/xenbus.c >> +++ b/tools/firmware/hvmloader/xenbus.c >> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* >> Shared ring with dom0 */ >> static evtchn_port_t event; /* Event-channel to dom0 */ >> static char payload[XENSTORE_PAYLOAD_MAX + 1]; /* Unmarshalling area >> */ >> >> +static void ring_wait(void) >> +{ >> + struct shared_info *shinfo = get_shared_info(); >> + struct sched_poll poll; >> + >> + memset(&poll, 0, sizeof(poll)); >> + set_xen_guest_handle(poll.ports, &event); >> + poll.nr_ports = 1; >> + >> + while ( !test_and_clear_bit(event, shinfo->evtchn_pending) ) >> + hypercall_sched_op(SCHEDOP_poll, &poll); >> +} >> + >> /* Connect our xenbus client to the backend. >> * Call once, before any other xenbus actions. */ >> void xenbus_setup(void) >> @@ -68,10 +81,15 @@ void xenbus_shutdown(void) >> >> ASSERT(rings != NULL); >> >> - /* We zero out the whole ring -- the backend can handle this, and it's >> - * not going to surprise any frontends since it's equivalent to never >> - * having used the rings. */ >> - memset(rings, 0, sizeof *rings); >> + if (rings->server_version > 0) { >> + rings->closing = 1; >> + while (*(volatile uint32_t*)&rings->closing == 1) >> + ring_wait (); >> + } else >> + /* If the backend reads the state while we're erasing it then the >> + * ring state will become corrupted, preventing guest frontends from >> + * connecting. This is rare. */ >> + memset(rings, 0, sizeof *rings); >> >> /* Clear the event-channel state too. */ >> memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info)); >> @@ -81,19 +99,6 @@ void xenbus_shutdown(void) >> rings = NULL; >> } >> >> -static void ring_wait(void) >> -{ >> - struct shared_info *shinfo = get_shared_info(); >> - struct sched_poll poll; >> - >> - memset(&poll, 0, sizeof(poll)); >> - set_xen_guest_handle(poll.ports, &event); >> - poll.nr_ports = 1; >> - >> - while ( !test_and_clear_bit(event, shinfo->evtchn_pending) ) >> - hypercall_sched_op(SCHEDOP_poll, &poll); >> -} >> - >> /* Helper functions: copy data in and out of the ring */ >> static void ring_write(const char *data, uint32_t len) >> { >> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml >> index 29d354d..d5cd776 100644 >> --- a/tools/ocaml/libs/xb/xb.ml >> +++ b/tools/ocaml/libs/xb/xb.ml >> @@ -84,7 +84,26 @@ let write con s len = >> | Fd backfd -> write_fd backfd con s len >> | Xenmmap backmmap -> write_mmap backmmap con s len >> >> -let output con = >> +(* If a function throws Xs_ring.Closing, then clear the ring state >> + and serve the ring again. *) >> +let rec handle_closing f con = >> + match (try Some (f con) with Xs_ring.Closing -> None) with >> + | Some x -> x >> + | None -> >> + begin match con.backend with >> + | Fd _ -> raise Xs_ring.Closing (* should never happen, but >> just in case *) >> + | Xenmmap backend -> >> + Xs_ring.close backend.mmap; >> + backend.eventchn_notify (); >> + (* Clear our old connection state *) >> + Queue.clear con.pkt_in; >> + Queue.clear con.pkt_out; >> + con.partial_in <- init_partial_in (); >> + con.partial_out <- ""; >> + handle_closing f con >> + end >> + >> +let output = handle_closing (fun con -> >> (* get the output string from a string_of(packet) or partial_out *) >> let s = if String.length con.partial_out > 0 then >> con.partial_out >> @@ -101,8 +120,9 @@ let output con = >> ); >> (* after sending one packet, partial is empty *) >> con.partial_out = "" >> +) >> >> -let input con = >> +let input = handle_closing (fun con -> >> let newpacket = ref false in >> let to_read = >> match con.partial_in with >> @@ -133,6 +153,7 @@ let input con = >> HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf) >> ); >> !newpacket >> +) >> >> let newcon backend = { >> backend = backend; >> @@ -145,6 +166,7 @@ let newcon backend = { >> let open_fd fd = newcon (Fd { fd = fd; }) >> >> let open_mmap mmap notifyfct = >> + Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *) >> newcon (Xenmmap { >> mmap = mmap; >> eventchn_notify = notifyfct; >> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli >> index 58234ae..7f65fa3 100644 >> --- a/tools/ocaml/libs/xb/xb.mli >> +++ b/tools/ocaml/libs/xb/xb.mli >> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int >> val write_fd : backend_fd -> 'a -> string -> int -> int >> val write_mmap : backend_mmap -> 'a -> string -> int -> int >> val write : t -> string -> int -> int >> +val handle_closing : (t -> 'a) -> t -> 'a >> val output : t -> bool >> val input : t -> bool >> val newcon : backend -> t >> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml >> index 9469609..d7f0fd4 100644 >> --- a/tools/ocaml/libs/xb/xs_ring.ml >> +++ b/tools/ocaml/libs/xb/xs_ring.ml >> @@ -14,5 +14,16 @@ >> * GNU Lesser General Public License for more details. >> *) >> >> +exception Closing >> + >> +let _ = >> + Callback.register_exception "Xs_ring.Closing" Closing >> + >> external read: Xenmmap.mmap_interface -> string -> int -> int = >> "ml_interface_read" >> external write: Xenmmap.mmap_interface -> string -> int -> int = >> "ml_interface_write" >> + >> + >> +external set_server_version: Xenmmap.mmap_interface -> int -> unit = >> "ml_interface_set_server_version" "noalloc" >> +external get_server_version: Xenmmap.mmap_interface -> int = >> "ml_interface_get_server_version" "noalloc" >> + >> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" >> "noalloc" >> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c >> b/tools/ocaml/libs/xb/xs_ring_stubs.c >> index 8bd1047..27c98cd 100644 >> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c >> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c >> @@ -35,19 +35,28 @@ >> >> #define GET_C_STRUCT(a) ((struct mmap_interface *) a) >> >> +#define ERROR_UNKNOWN (-1) >> +#define ERROR_CLOSING (-2) >> + >> static int xs_ring_read(struct mmap_interface *interface, >> char *buffer, int len) >> { >> struct xenstore_domain_interface *intf = interface->addr; >> XENSTORE_RING_IDX cons, prod; /* offsets only */ >> int to_read; >> + uint32_t closing; >> >> cons = *(volatile uint32*)&intf->req_cons; >> prod = *(volatile uint32*)&intf->req_prod; >> + closing = *(volatile uint32*)&intf->closing; >> + >> + if (closing != 0) >> + return ERROR_CLOSING; >> + >> xen_mb(); >> >> if ((prod - cons) > XENSTORE_RING_SIZE) >> - return -1; >> + return ERROR_UNKNOWN; >> >> if (prod == cons) >> return 0; >> @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface >> *interface, >> struct xenstore_domain_interface *intf = interface->addr; >> XENSTORE_RING_IDX cons, prod; >> int can_write; >> + uint32_t closing; >> >> cons = *(volatile uint32*)&intf->rsp_cons; >> prod = *(volatile uint32*)&intf->rsp_prod; >> + closing = *(volatile uint32*)&intf->closing; >> + >> + if (closing != 0) >> + return ERROR_CLOSING; >> + >> xen_mb(); >> if ( (prod - cons) >= XENSTORE_RING_SIZE ) >> return 0; >> @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface, >> value buffer, value len) >> >> res = xs_ring_read(GET_C_STRUCT(interface), >> String_val(buffer), Int_val(len)); >> - if (res == -1) >> + if (res == ERROR_UNKNOWN) >> caml_failwith("bad connection"); >> + >> + if (res == ERROR_CLOSING) >> + > > This looks a bit wrong. The blank line above should be removed and the > following line indented appropriately. > >> caml_raise_constant(*caml_named_value("Xs_ring.Closing")); >> + >> result = Val_int(res); >> CAMLreturn(result); >> } >> @@ -111,6 +130,46 @@ CAMLprim value ml_interface_write(value interface, >> value buffer, value len) >> >> res = xs_ring_write(GET_C_STRUCT(interface), >> String_val(buffer), Int_val(len)); >> + >> + if (res == ERROR_CLOSING) >> + > > Same here. > >> caml_raise_constant(*caml_named_value("Xs_ring.Closing")); >> + >> result = Val_int(res); >> CAMLreturn(result); >> } >> + >> +CAMLprim value ml_interface_set_server_version(value interface, value v) >> +{ >> + CAMLparam2(interface, v); >> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)- >>> addr; >> + >> + intf->server_version = Int_val(v); >> + >> + CAMLreturn(Val_unit); >> +} >> + >> +CAMLprim value ml_interface_get_server_version(value interface) >> +{ >> + CAMLparam1(interface); >> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)- >>> addr; >> + >> + CAMLreturn(Val_int (intf->server_version)); >> +} >> + >> +CAMLprim value ml_interface_close(value interface) >> +{ >> + CAMLparam1(interface); >> + const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' }; >> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)- >>> addr; >> + int i; >> + >> + intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = >> 0; >> + /* Ensure the unused space is full of invalid xenstore packets. */ >> + for (i = 0; i < XENSTORE_RING_SIZE; i++) { >> + intf->req[i] = invalid_data[i % sizeof(invalid_data)]; >> + intf->rsp[i] = invalid_data[i % sizeof(invalid_data)]; >> + } > > This does not reset the rings to their initial state (i.e. all zeroes). I > don't think that's necessarily a problem, but it is inconsistent. Indeed — I should document this difference in the API. In one case where I saw the original bug the xenstore logs were full of ‘DEBUG’ messages, because message id = 0 = DEBUG, request id = 0, transaction id = 0 and length = 0 is a valid packet. This invalid data should cause the corruption to be noticed more quickly :-) Thanks, Dave > >> + xen_mb (); >> + intf->closing = 0; >> + CAMLreturn(Val_unit); >> +} >> diff --git a/xen/include/public/io/xs_wire.h >> b/xen/include/public/io/xs_wire.h >> index 585f0c8..022d614 100644 >> --- a/xen/include/public/io/xs_wire.h >> +++ b/xen/include/public/io/xs_wire.h >> @@ -108,14 +108,19 @@ enum xs_watch_type >> * `incontents 150 xenstore_struct XenStore wire protocol. >> * >> * Inter-domain shared memory communications. */ >> + > > Pure whitespace change. Not necessary. > > Paul > >> #define XENSTORE_RING_SIZE 1024 >> typedef uint32_t XENSTORE_RING_IDX; >> #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1)) >> struct xenstore_domain_interface { >> + /* XENSTORE_VERSION_0 */ >> char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */ >> char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */ >> XENSTORE_RING_IDX req_cons, req_prod; >> XENSTORE_RING_IDX rsp_cons, rsp_prod; >> + uint32_t server_version; >> + /* server_version 1 and later: */ >> + uint32_t closing; /* Non-zero means close in progress */ >> }; >> >> /* Violating this is very bad. See docs/misc/xenstore.txt. */ >> -- >> 1.7.10.4 _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |