Xen project Mailing List

Re: [Xen-devel] [PATCH v6.2 05/11] libxl_qmp: Implementation of libxl__ev_qmp_*

To: Anthony PERARD <anthony.perard@xxxxxxxxxx>

From: Ian Jackson <ian.jackson@xxxxxxxxxx>

Date: Fri, 16 Nov 2018 16:09:03 +0000

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Wei Liu <wei.liu2@xxxxxxxxxx>

Delivery-date: Fri, 16 Nov 2018 16:09:25 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Thanks for the reply. I have deleted all the parts where we are in agreement... Anthony PERARD writes ("Re: [PATCH v6.2 05/11] libxl_qmp: Implementation of libxl__ev_qmp_*"): > On Thu, Nov 15, 2018 at 06:40:13PM +0000, Ian Jackson wrote: > > It would be better if the id column stated something more useful than > > `set'. `next' maybe, where applicable ? > > When 'set', id is the id of the user command we intend to send or have > already sent and waiting for the associated reply. I'm not sure which > word or set of word would describe enough. I might need to add a comment > below the table. You could use `sent', `next', alongside your existing `prev' ? > > Typo `transmitting'. > > > > A minor point, but indenting the rx_* things here would avoid them > > lining up with `transmitting buffer' and misleading the eye. > > > > Better still, have you considered moving this table into the struct > > itself ? You could put the table to the RHS of the actual member > > definitions. That gives slightly fewer places to look, although it > > would involve a cross-reference from this wider state description to > > the field's state tables. > > Indeed, it feels kind of wrong to have the description of the internal > states in two different places. There are some descriptions in the > struct, but those are probably not needed. I don't think I saw any _redundant_ text, but I could be wrong. Certainly the stuff about which parts of the buffer are used, etc., are good and want to be kept. > > ... oh, I see for rx_buf_used this is documented in the struct > > itself. But not for tx_buf_off. > > > > Which leads me to say: the struct contains rx_buf_used but the comment > > here talks about _off. > > :(, I need to fix that. > > And here is the description: > > rx_buf_used: actual data in the buffer > tx_buf_off: data already sent Right, I found that in the end. > > > + * - Allowed internal state transition: > > > + * disconnected -> connecting > > > + * connection -> capability_negotiation > > > + * capability_negotiation/waiting_reply -> connected > > > + * connected -> waiting_reply > > > + * any -> disconnected > > > > This does not mention the state `broken' and it should. > > Maybe: > any -> broken Fine by me. You want to mention broken -> disconnected too :-). > > > +static int qmp_ev_prepare_cmd(libxl__gc *gc, > > > + libxl__ev_qmp *ev, > > > + const char *cmd, > > > + const libxl__json_object *args) ... > > > + ev->id++; > > > + buf = qmp_prepare_cmd(gc, cmd, args, ev->id, &len); > > > + if (!buf) { > > > + return ERROR_FAIL; > > > + } > > > > There is no doc comment for qmp_prepare_cmd saying whether it logs on > > error. So it doesn't ? In which case maybe you should do so here. > > > > Alternatively if it can only really fail due to memory allocation > > failure, maybe it should return void (and call libxl__alloc_failed > > on allocation failure) instead ? > > So we can have the allocation been done by libxl's functions, as > yajl_gen_alloc can use other functions that malloc/free/realloc, that > would remove the allocation failure. Right. > But then I think that qmp_prepare_cmd could fail and return null if > 'cmd' or 'args' contains bad json objects, or a wrong string. > > So we still needs to detect failure. OK. > I log here as I can add the domid. I'm not sure what you mean. Where do you log ? Not here. > > > +/* Setup connection */ > > > + > > > +static int qmp_ev_connect(libxl__gc *gc, libxl__ev_qmp *ev) > > > + /* disconnected -> connecting but with `msg` free > > > + * on error: disconnected > > > + * If the initial state isn't disconnected, then nothing is done */ > > > > Maybe it be better for this function to return `broken' on error ? > > That not the case, but I could change to return broken on error, and let > the caller clean the state. That's what I mean. That might simplify this function and the caller probably has the relevant cleanup path already. (I haven't checked.) > > Also, I think it is anomalous that this function handles `broken' as a > > no-op and claims success. I think that is not really what a caller > > might expect. Maybe you want to forbid `broken' (with a corresponding > > assert) or to permit it (calling dispose before doing the rest of the > > work). > > I'm thinking to change the definition and only allow this function to be > called with the state disconnected, and have the caller check for the > current state. That's another possibility. > > > +out: > > > + libxl__carefd_close(ev->qmp_cfd); > > > + ev->qmp_cfd = NULL; > > > + return rc; > > > > If this function were permitted to leave the state `broken' on error, > > this separate error code would not be needed. Is there some reason > > not to do that ? > > Not really. OK then :-). > > > + if (revents & POLLOUT) { > > > + rc = qmp_ev_callback_writable(gc, ev, fd); > > ... > > > + if (revents & POLLIN) { > > > + rc = qmp_ev_callback_readable(egc, ev, fd); > > > + if (rc) > > > + goto out; > > > + > > > + /* parse input */ > > > > I find it odd that this input parsing is not part of > > qmp_ev_callback_readable. What do you think about moving it there ? > > I wanted to reduce the number of function calls between when a user > callback is called and when ev_qmp's control passes outsite of ev_qmp's > implementation. So I moved qmp_ev_handle_message() call here. Why did you want to reduce the number of function calls ? > > > + /* > > > + * We will send a file descriptor associated with a command on the > > > + * first byte of this command. > > > + */ > > > + if (ev->qmp_state == qmp_state_waiting_reply && > > > + ev->fd >= 0 && > > > + ev->tx_buf_off == 0) { > > > > According to the doc comments, state waiting_reply might have tx_buf > > free, in which case it would try to execute this code. I don't think > > that can be right. > > When tx_buf is free, that function returns earlier. So this code isn't > executed. Oh yes, so it does. But why are we entering this callback with a free tx_buf at all ? If we just return, we are likely to get called again, ie an infinite loop. This can only occur as a result of a bug ? In which case maybe it should be an assert ? > > > + assert(ev->rx_buf_size <= QMP_MAX_SIZE_RX_BUF); > > > + if (ev->rx_buf_size > QMP_MAX_SIZE_RX_BUF) { > > > > I think the assert is redundant. > > > > > + LOGD(ERROR, ev->domid, > > > + "QMP receive buffer is too big (%ld > %lld)", > > > + ev->rx_buf_size, QMP_MAX_SIZE_RX_BUF); > > > + return ERROR_BUFFERFULL; > > > + } > > > > Putting this error check between the change to rx_buf_size and the > > change to rx_buf obscures the essential connection between both > > changes, which must be indivisible. > > > > It also results in a bug: if this error trips, rx_buf_size has been > > updated but rx_buf has not. This is a dangerously invalid state. > > > > This function tries to read everything that qemu produces and calls > > the whole thing a loss of qemu produces more than QMP_MAX_SIZE_RX_BUF. > > But perhaps qemu producing more than QMP_MAX_SIZE_RX_BUF output might > > not be the result of qemu going mad. It might be the result of us > > having ignored it for a while, and qemu filling a buffer with event > > notifications which we might want to discard ? > > > > Would it not be better to process messages as they arrive ? Ie to put > > the attempt to find valid messages here inside this loop ? > > Maybe. > > We don't even needs to wait for a valid message and start parsing. yajl > allow to parse partial json input. But I don't know if it is a good idea > to start doing that. I think parsing partial json messages would be quite complex and not worthwhile - because we expect to get short messages. > But in anycase, it is probably better to look for valide messages before > returning an error with "there's too much to process". Right. > > That might also allow the reduction of the maximum message size from > > 8Mby. 8Mby seems like quite a lot. > > I don't know how big a message can be. I can certainly try to start QEMU > with 256 cpus and check the size of query-cpu's response. That is probably worthwhile but I hope it will be nowhere near 8Mby... > > I wrote: > > > > But I think you should treat only `\n' as the delimiter. This will > > considerably simplify the buffer handling. (You should check and trim > > the preceding `\r' before passing things to libxl__json_parse of > > course.) > > > > But you don't seem to have done that, or replied ? > > This is still simpler than what we had before, and if I look for only > '\n', I would need a loop in cases where \n isn't precedded by \r. I > also check the whole buffer rather than new data, so looking for \r and > check if it is followed by a \n is similar. I think a bare \n is not legal and should be treated as a protocol error. Don't you agree ? Given that, you can search for \n, and if it is not preceded by \r, call it an error. But maybe you prefer it the way you have it. > As for triming the \r, this is done before the libxl__json_parse call. Right, sorry, my parenthetical comment was just saying that if you change it to search for \n you do still need to strip the \r. > > > + case LIBXL__QMP_MESSAGE_TYPE_QMP: > > ... > > > + assert(!ev->tx_buf); > > > + ev->tx_buf = qmp_prepare_cmd(gc, "qmp_capabilities", NULL, > > > + QMP_CAPABILITY_NEGOTIATION_MSGID, > > > + &ev->tx_buf_len); > > > + ev->tx_buf_off = 0; > > > > Why do you not use qmp_ev_prepare_cmd for this ? > > Because qmp_ev_prepare_cmd does more like increasing the id. This > qmp_capabilities is a special command that is part of the > initialisation. I need to know which id is used so that I can track it > later and found out when QEMU is ready to receive other commands. But see below about the id. I think you should be using the normal id. In another thread you were talking about possibly trying to reuse an existing session and putting the pid in the id or something. In this case a fixed id for the capabilities command is not good either. Let me bring forward the part where we handle the resuls: > > > + id = libxl__json_object_get_integer(o); > > > + > > > + if (id == QMP_CAPABILITY_NEGOTIATION_MSGID) { > > > + /* We have a response to our qmp_capabilities cmd */ > > > + if (ev->qmp_state != qmp_state_capability_negotiation || > > > + type != LIBXL__QMP_MESSAGE_TYPE_RETURN) > > > + goto out_unknown_id; > > > + qmp_ev_set_state(gc, ev, qmp_state_connected); > > > + return 0; > > > + } > > > > I'm a bit puzzled about the capability negotation, its id, etc.: > > > > * Why does QMP_CAPABILITY_NEGOTIATION_MSGID exist at all ? You > > could just use a normal id, couldn't you ? And you'd be able to > > tell from your own state that it was the right value. > > But I would then needs to track two different ids, one for the > qmp_capabilities command, and one for the user's command. But you only construct the id when you prepare the command into tx_buf. So you only have one id at a time: the id of the most recently prepared command. Or am I wrong ? ... I have looked at this code again and I was wrong. You format the user's message, with id, right at the beginning. (The sequencing of events in the code structure is slightly odd, in that the capabilities message is formatted after the user's message, but sent beforehand.) I still think for the reasons above that you probably can't have a fixed QMP_CAPABILITY_NEGOTIATION_MSGID because we may want to be able to reuse an old transport connection in the future. So that means you do need two ids. (I think you can't defer formatting the user's message, baking an id into it, because the lifetime of the callers' json object is too short for you to save it.) I'm not sure I can see sensible a way of doing this that doesn't have *three* id variables: - a counter for generating new ids - the id put in the capabilities command - the id put in the user's command > We could call qmp_ev_prepare_cmd, qmp_ev_prepare_user_cmd. > > > > + case LIBXL__QMP_MESSAGE_TYPE_RETURN: > > > + case LIBXL__QMP_MESSAGE_TYPE_ERROR: > > ... > > > + o = libxl__json_map_get("id", resp, JSON_INTEGER); > > > + if (!o) { > > > + /* > > > + * If "id" isn't present, an error occur on the server before > > > + * it has read the "id" provided by libxl. > > > + */ > > > + qmp_ev_parse_error_messages(egc, ev, resp); > > > + return ERROR_PROTOCOL_ERROR_QMP; > > > > I think you wanted to return the error code from > > qmp_ev_parse_error_messages ? > > Yeah, but then I've added to the public documentation that ERROR_QMP_* > means that ev_qmp is still Connected. Oh. Right. I see. Now that you have explained I think the code is correct. Can you add a comment about this ? Something like this: /* * Deliberately squash all errors into ERROR_PROTOCOL_ERROR_QMP. * qmp_ev_parse_error_messages will only return that, or * ERROR_QMP_*; but ERROR_QMP_* is reserved for errors resulting * from the caller's command, and when we return that we promise * that the ev_qmp is Connected. */ > > * Why are we even doing capability negotation when we throw the answer > > away ? > > It is part of the protocol. > > And qmp_capabilities doesn't return anything (just an empty object). Oh, is the purpose to inform qemu what our capabilities are ? > > * I think it would be better to check id == ev->id first, rather > > than in each of these branches. > > I don't think I follow. id==ev->id is only checked once. You check id against QMP_CAPABILITY_NEGOTIATION_MSGID separately from the check against ev->id. But now that I see you have formatted both commands and need to have both ids, I'm not sure that comment of mine makes sense. > I don't think all ERROR should be protocol error. To me, it is fine if a > command returns an error, then tell the user about it. Also, there is no > needs to tear down the connection in this case. Sorry, yes. > As for the sequence of checks, I can try to follow it, it might look > weird because there is two differents ids to check, which id is valide > depends on the current state. I think you are probably right about that. Thanks, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.