Xen project Mailing List

Re: [Xen-devel] [PATCH v3.1] libxl: Design of an async API to issue QMP commands to QEMU

To: Anthony PERARD <anthony.perard@xxxxxxxxxx>

From: Ian Jackson <ian.jackson@xxxxxxxxxx>

Date: Thu, 12 Jul 2018 17:36:00 +0100

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Wei Liu <wei.liu2@xxxxxxxxxx>

Delivery-date: Thu, 12 Jul 2018 16:36:10 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Sorry for not noticing that you had replied. Anthony PERARD writes ("Re: [PATCH v3.1] libxl: Design of an async API to issue QMP commands to QEMU"): > Before replying to this email, here is the description of what state a > QMP connection can be. This states are only internal to libxl_qmp.c, the > rest of libxl doesn't need to know. ... > +/* > + * This are different state possible when the QMP client (libxl) is waiting. > + * Transition from one state to the other is listed after. These are the states of some internal libxl_qmp_something struct ? > + * States: > + * Undefined > + * Disconnected > + * nothing, no allocated ressources > + * Connecting > + * Have allocated ressources, > + * Have connected to the QMP socket, > + * Waiting for server greeting. > + * Capability Negociation > + * QMP server is in Capabilities Negotiation mode. > + * Waiting for a response to the "qmp_capabilities" command > + * Connected > + * QMP server is in command mode, > + * commands can be issued This looks like a plausible set of states. > > > + * Possible states: > > > + * Undefined > > > + * Might contain anything. > > > + * Idle > > > + * Struct contents are defined enough to pass to any > > > + * libxl__qmp_cmd_* functions but is not registered and callback > > > + * will not be called. The struct does not contain references to > > > + * any allocated resources so can be thrown away. > > > + * Active > > > + * Currently waiting for a response from QEMU, and callback can be > > > + * called. _dispose must be called to reclaim resources. > > > > I don't think this set of states is accurate. In particular, your API > > description (about connection management) implies that there are at > > least these states: > > (i) undefined > > (ii) there is no qmp connection open > > (iii) we have sent a command and are expecting a callback > > (iv) there is a qmp connection open, but no command is outstanding > > > > (iv) does not fit into any of the categories above. > > I don't think the state of a qmp connection can fit into > libxl__qmp_cmd_state. The "Active" state doesn't mean that a qmp > connection is open or that the command have been sent. It merely mean > that the syscall socket(3) and connect(3) have return successfully, and > there will be an attempt later to sent the command to qemu. I think you haven't quite understood my point. My understanding of your API is that it gives the user of libl__qmp_cmd a certain amount of visibility of the existence or otherwise of the qmp connection. You say: + * When called from within a callback, the same QMP connection will be + * reused to execute the new command. This is important in the case + * where the first command is "add-fd" and the second command use the + * fdset created by QEMU. This implies that at the top of the callback, the qmp connection is actually in some kind of extra state which is not covered by any of your Undefined, Idle or Active. It is not Undefined, obviously. It is not Active because no response is being awaited any longer. (If a response were being awaited, then it would not be sensible for the callback to issue another command, because your rule is one command at once.) And it is not Idle because it contains references to allocated resources - namely, the qmp connection fd. So your documented state model is too poor to express what is going on. You need to write down at least one additional state, which you might call `Connected'. Also, I have just noticed that you say "from within a callback". That suggests that the code which makes the callback does something to the libxl__qmp_state after after the callback returns. That is contrary to the way that everything else works in libxl. In libxl, such a callback is made as the last thing before returning back to the event loop. The reason is that the state struct (in this case the libxl__qmp_cmd) may be part of a larger struct, which is completed and deallocated somewhere in the series of (nested) callback functions. So the memory may not be valid any more. Another way to look at this is that you actually have a fourth state which I will call WithinCallback. On entry to the callback, the cmd is in WithinCallback state. In WithinCallback state, it is allowed to request that another command is sent (putting the cmd back into Active). But in the WithinCallback state, the libxl__qmp_cmd contains references to resources and may not be freed. Furthermore, as I read your commentary, the WithinCallback state cannot be exited other than to Active, or by returning from the callback. That would make it very awkward for the user of this interface to ever free the cmd. After all, they can only free the memory for it when the state is Idle or Undefined. So they need to get it into Idle which they can only do by returning, but then they have lost control... So this is why I say you should have a proper fourth state, which we can call Connected or something. On entry to the callback, the cmd is in state Connected. The cmd should stay Connected until it is explicitly disposed of. The code which makes the callback then does not need to do anything after making the callback: the user has sent another command; or is going to send another command; or has called _dispose. In any case, the caller has ownership. > > > +/* > > > + * Register a command to be issued to QEMU. > > > > Again, "register" has been inherited from libxl_ev_*. I think it > > would be clearer to say that this function > > > > Sends a command to QEMU. > > The API I'm describing haven't send anything to QEMU yet, the command is > merely put in a queue, and an attempt to send it will be done later, > when QEMU is ready to receive commands and when the socket is not > blocked. Yes. But from the caller's point of view, those are implementation details. How about: Sends a command to QEMU (connecting, etc., as needed) ? > > Up to you. I don't object to mixing the two styles within the same > > facility provided the internal docs are clear. > > I've done the carefd differently because I didn't know how to put it > into the function parameters, beside having a new function. But your > sugestion about having the caller fill the struct sound good. I'll do > that. That way, a caller will know in which states are the different > fields. Right. Do put that in comments too :-). > > > + /* > > > + * This value can be initialise before calling _qmp_cmd_exec. The > > > + * file descriptor will sent to QEMU along with the command, then > > > + * the fd will be closed. > > > + */ > > > + libxl__carefd *efd; > > > > Why not > > libxl__carefd efd; > > ? > > The libxl__carefd_* API returns a newly allocated efd, a pointer. I > can't preallocate it. So it does. Sorry! > > Also, I don't think this description of the semantics is right. The > > caller must always somehow arrange to initialise this value. > > Presumably _init clears it ? Certainly this as a parameter to the > > operation, this should be up with domid and callback. > > > > Maybe you want comments like the ones in libxl__datacopier_state etc., > > which say /* caller must fill these in */. > > > > And, you probably want to make it clear that the fd remains open in > > the libxl process. (I assume it does.) > > Well I was closing the fd once it was sent to QEMU. But we can have the > callbacks takes care of closing it instead. I don't mind what the semantics are, but they should be clear, and all the error cases need to work correctly. Eg if asking to send a fd causes the fd to become owned by the qmp machiner, then if the attempt to send the qmp command fails (maybe becaue the qmp connection fails) then it must be closed. The semantics should probably be whichever are more convenient. Personally i would lean towards qmp_cmd not taking ownership, because if you do then someone who wants to keep a copy of the fd has to dup it and duping a carefd is quite a faff. > > > +libxl__qmp_error_class = Enumeration("qmp_error_class", [ > > > + # No error > > > + (0, "NONE"), > > > + # Error generated by libxl (e.g. socket closed unexpectedly, no mem, > > > ...) > > > + (1, "libxl_error"), > > > + # QMP error classes described in QEMU sources code (QapiErrorClass) > > > + (2, "GenericError"), > > > + (3, "CommandNotFound"), > > > + (4, "DeviceNotActive"), > > > + (5, "DeviceNotFound"), > > > + # Unrecognized QMP error class > > > + (6, "Unknown"), > > > > Are these numbers from qmp ? Why not assign a bunch of libxl error > > values instead ? > > No, these are strings from QEMU, numbers doesn't matter. Also I don't > know how well those are going to fit into libxl errors. I meant, invent a libxl error number (and corresponding ERROR_QMP_BLAH or whatever) for each one of these qmp errors. > (qemu.git:qapi/common.json) > # @QapiErrorClass: > # > # QEMU error classes > # > # @GenericError: this is used for errors that don't require a specific error > # class. This should be the default case for most errors > # > # @CommandNotFound: the requested command has not been found > # > # @DeviceNotActive: a device has failed to be become active > # > # @DeviceNotFound: the requested device has not been found > > QEMU always associate an error messages when it send an error, so the > only thing t odo with GenericError is to log that message. I guess you should alwyas log the error anyway. But discriminating the different qmp errors will probably be useful ? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.