[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 16/27] tools/libxl: Infrastructure for reading a libxl migration v2 stream



Andrew Cooper writes ("[PATCH v2 16/27] tools/libxl: Infrastructure for reading 
a libxl migration v2 stream"):
> From: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
> 
> This contains the event machinary and state machines to read an act on a
> non-checkpointed migration v2 stream (with the exception of the
> xc_domain_restore() handling which is spliced later in a bisectable way).
> 
> It also contains some boilerplate to help support checkpointed streams, which
> shall be introduced in a later patch.
...


> +/* State for manipulating a libxl migration v2 stream */
> +typedef struct libxl__stream_read_state libxl__stream_read_state;
...
> +struct libxl__stream_read_state {
> +    /* filled by the user */
> +    libxl__ao *ao;
> +    int fd;
> +    void (*completion_callback)(libxl__egc *egc,
> +                                libxl__stream_read_state *srs,
> +                                int rc);
> +    /* Private */
> +    int rc;
> +    bool running;
> +
> +    /* Main stream-reading data */
> +    libxl__datacopier_state dc;
> +    libxl__sr_hdr hdr;
> +    libxl__sr_record_buf *incoming_record;
> +    LIBXL_STAILQ_HEAD(, libxl__sr_record_buf) record_queue;

This comes from malloc, not from the gc.  That needs a comment.  (Is
the same true of incoming_record ?  I think it is.)

> +    enum {
> +        SRS_PHASE_NORMAL,
> +    } phase;
> +    bool recursion_guard;
> +
> +    /* Emulator blob handling */
> +    libxl__datacopier_state emu_dc;
> +    libxl__carefd *emu_carefd;
> +};
...

The comments in libxl_stream_read.c are good but I'm afraid I'm going
to ask for some commentary about the legal states and/or the
invariants in the data structure.

That is, you've written a state machine but the states are not
explicitly listed.

As far as I can tell, from the outside, this machinery has the usual
Undefined/Idle/Active states.  But does it not have more states on the
inside ?  PHASE at least needs explaining.  See also comments about
the record queue, below.

For an example of the kind of thing I mean, see libxl_spawn.
Particularly, libxl_internal.h:

  * Each libxl__spawn_state is in one of these states
  *    Undefined, Idle, Attached, Detaching

and the much more detailed state and data structure table in
libxl_exec.c

 * Full set of possible states of a libxl__spawn_state and its _detachable:
   
I think you probably don't need to write down the external states for
your srs machinery because the states are Undefined, Idle and Active
like most things.  (Assuming you do something about the anomalous
initialisation and checking of `running'.)

What I'm missing is the internal description.


> +/*
> + * Infrastructure for reading and acting on the contents of a libxl migration
> + * stream. There are a lot of moving parts here.

(Can you wrap to 70 or at least 75?)

Thanks.


> +void libxl__stream_read_start(libxl__egc *egc,
> +                              libxl__stream_read_state *stream)
> +{
> +    libxl__datacopier_state *dc = &stream->dc;
> +    int ret = 0;
> +
> +    /* State initialisation. */
> +    assert(!stream->running);

This is being discussed apropos of Ian C's review.

> +    memset(dc, 0, sizeof(*dc));

Should be FILLZERO.

> +    /* Start reading the stream header. */
> +    ret = setup_read(stream, "stream header",
> +                     &stream->hdr, sizeof(stream->hdr),
> +                     stream_header_done);
> +    if (ret)
> +        goto err;

`ret' should be `rc'.

This needs to be changed throughout.

> +    stream->running = true;
> +    stream->phase = SRS_PHASE_NORMAL;
> +    LIBXL_STAILQ_INIT(&stream->record_queue);
> +    stream->recursion_guard = 0;

These initialisations-to-empty need to be moved to before any part of
this function which might fail.

As it is, if setup_read fails I think you end up reading from the
uninitialised stream->record_queue, in stream_done.  (In your actual
code this does not matter because the caller did a FILLZERO but you
should not be relying on that.)

> +void libxl__stream_read_abort(libxl__egc *egc,
> +                              libxl__stream_read_state *stream, int rc)
> +{
> +    stream_failed(egc, stream, rc);

This has the same signature as stream_failed.  Perhaps stream_failed
could be eliminated and replaced with this ?  Or maybe...

> +static void stream_success(libxl__egc *egc, libxl__stream_read_state *stream)
> +{
> +    stream->rc = 0;
> +    stream_done(egc, stream);

This function has one call site.  As an alternative to abolishing
stream_failed, maybe stream_failed should be renamed to
`stream_complete' and then the call site for stream_success could
simply say `stream_complete(egc, stream, 0)' ?


> +static void stream_failed(libxl__egc *egc,
> +                          libxl__stream_read_state *stream, int rc)
> +{
> +    assert(rc);
> +    stream->rc = rc;
> +
> +    if (stream->running) {
> +        stream_done(egc, stream);

Is the check for running necessary here ?

Not checking, and instead relying on all the individual parts to be
properly initialised, makes the correctness clearer.


> +static void stream_header_done(libxl__egc *egc,
> +                               libxl__datacopier_state *dc,
> +                               int rc, int onwrite, int errnoval)

I think it would be clearer if this function was immediately after
libxl_stream_read_start.  Probably moving libxl_stream_read_start is
the right answer.

Also, putting stream_continue after stream_header_done might help.

I appreciate that these functions form a loop so can't be strictly in
execution order, but it is much easier to read if the first iteration
is in the right order.  As CODING_STYLE says:

  The code for asynchronous operations should be laid out in
  chronological order.  That is, where there is a chain of callback
  functions, each subsequent function should be, textually, the next
  function in the file.  This will normally involve predeclaring the
  callback functions.  Synchronous helper functions should be separated
  out into a section preceding the main callback chain.

AFAICT there are a few other out-of-order functions.


> +    int ret = 0;
> +
> +    if (rc || onwrite || errnoval) {
> +        ret = ERROR_FAIL;
> +        LOG(ERROR, "rc %d, onwrite %d, errnoval %d", rc, onwrite, errnoval);
> +        goto err;
> +    }
> +
> +    hdr->ident   = be64toh(hdr->ident);
> +    hdr->version = be32toh(hdr->version);
> +    hdr->options = be32toh(hdr->options);
> +
> +    if (hdr->ident != RESTORE_STREAM_IDENT) {
> +        ret = ERROR_FAIL;
> +        LOG(ERROR,
> +            "Invalid ident: expected 0x%016"PRIx64", got 0x%016"PRIx64,
> +            RESTORE_STREAM_IDENT, hdr->ident);
> +        goto err;
> +    }
> +    if (hdr->version != RESTORE_STREAM_VERSION) {
> +        ret = ERROR_FAIL;
> +        LOG(ERROR, "Unexpected Version: expected %u, got %u",
> +            RESTORE_STREAM_VERSION, hdr->version);
> +        goto err;
> +    }
> +    if (hdr->options & RESTORE_OPT_BIG_ENDIAN) {
> +        ret = ERROR_FAIL;
> +        LOG(ERROR, "Unable to handle big endian streams");
> +        goto err;
> +    }
> +
> +    LOG(DEBUG, "Stream v%u%s", hdr->version,
> +        hdr->options & RESTORE_OPT_LEGACY ? " (from legacy)" : "");
> +
> +    stream_continue(egc, stream);
> +    return;
> +
> + err:
> +    assert(ret);
> +    stream_failed(egc, stream, ret);
> +}
> +
> +static void setup_read_record(libxl__egc *egc,
> +                              libxl__stream_read_state *stream)
> +{
> +    STATE_AO_GC(stream->ao);
> +    libxl__sr_record_buf *rec = NULL;
> +    int ret;
> +
> +    assert(stream->incoming_record == NULL);
> +
> +    stream->incoming_record = rec = libxl__zalloc(NOGC, sizeof(*rec));
> +    ret = setup_read(stream, "record header",
> +                     &rec->hdr, sizeof(rec->hdr),
> +                     record_header_done);
> +    if (ret)
> +        goto err;
> +
> +    return;
> +
> + err:
> +    assert(ret);
> +    stream_failed(egc, stream, ret);
> +}
> +
> +static void record_header_done(libxl__egc *egc,
> +                               libxl__datacopier_state *dc,
> +                               int rc, int onwrite, int errnoval)
> +{
> +    libxl__stream_read_state *stream = CONTAINER_OF(dc, *stream, dc);
> +    libxl__sr_record_buf *rec = stream->incoming_record;
> +    STATE_AO_GC(dc->ao);
> +    int ret = 0;
> +
> +    if (rc || onwrite || errnoval) {
> +        ret = ERROR_FAIL;
> +        LOG(ERROR, "rc %d, onwrite %d, errnoval %d", rc, onwrite, errnoval);
> +        goto err;
> +    }
> +
> +    /* No body? All done. */
> +    if (rec->hdr.length == 0) {
> +        record_body_done(egc, dc, 0, 0, 0);
> +        return;
> +    }
> +
> +    size_t bytes_to_read = ROUNDUP(rec->hdr.length, REC_ALIGN_ORDER);
> +    rec->body = libxl__malloc(NOGC, bytes_to_read);
> +
> +    ret = setup_read(stream, "record body",
> +                     rec->body, bytes_to_read,
> +                     record_body_done);
> +    if (ret)
> +        goto err;
> +
> +    return;
> +
> + err:
> +    assert(ret);
> +    stream_failed(egc, stream, ret);
> +}
> +
> +static void record_body_done(libxl__egc *egc,
> +                             libxl__datacopier_state *dc,
> +                             int rc, int onwrite, int errnoval)
> +{
> +    libxl__stream_read_state *stream = CONTAINER_OF(dc, *stream, dc);
> +    libxl__sr_record_buf *rec = stream->incoming_record;
> +    STATE_AO_GC(dc->ao);
> +    int ret = 0;
> +
> +    if (rc || onwrite || errnoval) {
> +        ret = ERROR_FAIL;
> +        LOG(ERROR, "rc %d, onwrite %d, errnoval %d", rc, onwrite, errnoval);
> +        goto err;
> +    }
> +
> +    LIBXL_STAILQ_INSERT_TAIL(&stream->record_queue, rec, entry);
> +    stream->incoming_record = NULL;
> +
> +    stream_continue(egc, stream);
> +    return;
> +
> + err:
> +    assert(ret);
> +    stream_failed(egc, stream, ret);
> +}
> +
> +/*
> + * Returns a boolean indicating whether a further action should be set up by
> + * the caller.  This is needed to prevent mutual recursion with
> + * stream_continue().
> + */

I don't understand why this mutual recursion would be a problem.

AFAICT stream_continue called from process_record ought only to call
process_record if record_queue had more than two items on it.

But if record_queue has more than two items on it, stream_continue
does the wrong thing already.

So err, why is stream_continue not a loop; or, why is record_queue a
queue ?

If the state machine had better doc comments, stating the invariants,
I would perhaps not be asking these foolish questions...

> +static void write_emulator_blob(libxl__egc *egc,
> +                                libxl__stream_read_state *stream,
> +                                libxl__sr_record_buf *rec)
> +{
...
> +
> +    if ( rec->hdr.length < sizeof(*emu_hdr) ) {

Coding style (whitespace)


> +    sprintf(path, XC_DEVICE_MODEL_RESTORE_FILE".%u", dcs->guest_domid);
> +
> +    libxl__carefd_begin();
> +    writefd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

Does this need to be a carefd ?  This is going to be a small amount of
data, and the worst that happens is some other process inherits an fd
onto a file which might otherwise be deleted sooner.

If it does, then please use libxl__carefd_opened.  (Best would be to
libxl__carefd_opened to save and restore errno and then you can call
it unconditionally after the open.)


Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.