[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal

On 22 Sep 2014, at 17:38, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:

> On 3 Sep 2014, at 17:25, David Scott <dave.scott@xxxxxxxxxx> wrote:
>> Hvmloader uses the xenstore ring and then tries to reset it back
>> to its initial state before booting the guest. Occasionally xenstored
>> will read the ring while it is being zeroed and conclude it has
>> been corrupted. This prevents PV drivers from loading in the guest.
> This protocol doc is almost perfect - see below.  I see that Ian C has
> reviewed the C implementation.  Have you found anyone to provide a 2nd
> opinion on the ocaml ?

I’ve cc:d Jon Ludlam. Jon: do you have time to review OCaml part of the patch?

>> --- /dev/null
>> +++ b/docs/misc/xenstore-ring.txt
>> +The xenstore server may decide to advertise some features via the
>> +"Server feature bitmap". The server can start advertising features
>> +at any time by setting bits but it will never stop advertising features
>> +i.e. bits will never be cleared. The guest is not permitted to write to
>> +the server feature bitmap. The server features are offered to the guest;
>> +it is up to the guest whether to use them or not. The guest should ignore
>> +any unknown feature bits.
> The rule that the server may start advertising features at any time is
> reasonable but it needs to be accompanied by a promise, for any
> specific feature, of the latest time at which the server will normally
> advertise it.  Otherwise the client won't know when to check it and
> whether it can sensibly cache the result.
>> +Mask    Description
>> +-----------------------------------------------------------------
>> +1       Ring reconnection (see the ring reconnection feature below)
>> +
>> +The "Connection state" field is used to request a ring close and reconnect.
>> +The "Connection state" field only contains valid data if the server has
>> +advertised the ring reconnection feature. If the feature has been advertised
>> +then the "Connection state" may take the following values:
> For example, it would be useful to say here that this feature is
> normally advertised from the start of the guest's operation.


>> +The ring reconnection feature
>> +=============================
>> +
>> +The ring reconnection feature allows the guest to ask the server to
>> +reset the ring to a valid initial state i.e. where the Input and Output
>> +queues contain no data. The feature operates at the ring-level only;
>> +it does not affect the higher-level protocol state machine at all.
>> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
>> +command in the [xenstore protocol](xenstore.txt) specification.
>> +
>> +The ring reconnection feature is only available if the 'Ring reconnection'
>> +feature bit has been set by the server in the "Server feature bitmap".
>> +
>> +Assuming the server has advertised the feature, the guest can initiate
>> +a reconnection by setting the the Connection state to 1 ("Ring close
>> +and reconnect is in progress") and signalling the event channel.
>> +The guest must now ignore all fields except the Connection state and
>> +wait for it to be set to 0 ("Ring is connected")
>> +
>> +The server will guarantee to
>> +
>> +  - drop any partially read or written higher-level
>> +    [xenstore protocol](xenstore.txt) packets it may have;
>> +  - empty the Input and Output queues in the xenstore ring;
>> +  - set the Connection state to 0 ("Ring is connected"); and
>> +  - signal the event channel.
> I think the server needs to guarantee that there are no outstanding
> commands, somehow.  As you write this, the server might have a command
> being processed which is no longer in the Input queue but whose
> response has not yet arrived in the Output queue.
> You might say that the client can use the response to RESET_WATCHES to
> distinguish replies to old commands (before the RESET_WATCHES reply)
> from replies to new ones.  But (a) the server does not guarantee to
> process messages in order and (b) the client might get unlucky and use
> the same request id for its RESET_WATCHES as one of the `old'
> commands.

I see what you mean. For a ‘clean’ reconnect like hvmloader we can be
sure that there are no outstanding requests because hvmloader has already
received all the replies. For ‘unclean’ reconnects like a crash-kernel
start there is definitely a problem.

Since the request ids are local to the connection, and the connection has
just closed and re-opened, I think it makes sense for the server to drop
all old request ids. The client would then be able to safely use any
request id for the RESET_WATCHES without fearing a old reply coming back.

It would be even nicer from a client perspective (although I’m not sure
how this would affect the diff) if the ring connection re-open performed
a RESET_WATCHES automatically, since then it would act the same as closing
and re-opening your Unix domain socket.


> I assume (without checking) that there is a suitable guarantee which
> could be written down and which is actually implemented by both
> existing xenstored implementations...
> Thanks,
> Ian.

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.