[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] libxl: Documentation about the domain configuration on disk



Wei Liu writes ("Re: [PATCH] libxl: Documentation about the domain 
configuration on disk"):
> On Thu, Dec 06, 2018 at 02:57:33PM +0000, Anthony PERARD wrote:
> > Anyway, that comment block isn't very helpful because it basically says
> > that we can't depriv QEMU, I mean do hotplug with a deprived QEMU. It
> > assumes that we can keep a lock on the userdata while updating the
> > guest, but we can't keep the lock while talking with QEMU (or more
> > generaly: we can't keep the lock while doing any async operation).
> > 
> > But there is one useful piece of information:
> >     Here we maintain one invariant: every device in xenstore must have
> >     an entry in JSON file.
> > (xenstore is describe as "primary reference" just before that sentence).
> 
> Yes. That.
...
> When removing a CD, you only need to update the primary source -- QEMU
> in this case, you can leave libxl-json untouched. It is allowed to have
> stale entries in libxl-json. This is implied in "We may not even need
> this ..." further above.

If you leave a state entry in libxl-json then you would reject
attempts to insert a new cd, because the libxl-json would tell you one
is already present.

I think the lock difficulty Anthony identifies is real.  We need to
either develop a new set of locking rules, or make all acquisitions of
the libxl-json lock slow.


The invariant that we want to maintain is:

  * Nothing may exist in the primary config without
    a corresponding entry in libxl-json.

The rules that implement that are:

  * No-one may edit the libxl-json without holding the lock.

  * You may not cause a thing to be added to the primary config
    unless you have held the libxl-json lock continuously
    since ensuring that the libxl-json config describes it.

  * Conversely you may not cause a thing to be removed from
    the libxl-json unless you have held the libxl-json lock
    continueously since ensuring the thing is absent
    from the primary config.

And unfortunately much code acquiring the libxl-json lock expects it
to be fast.


How about the following scheme.  We split the libxl-json lock into
two.  I'm going to call them the fast lock and the slow lock.

  * The fast lock is the existing libxl-json lock.

  * The slow lock is outside the libxl-json lock in the lock
    hierarchy.  It is also outside the libxl_ctx lock.  It is
    to be acquired by an ao event callback.

  * No-one may read or edit the libxl-json without holding the fast
    lock across their read operation, or their read/modify/write
    cycle.

  * However, there are special rules for thing removal/addition, for
    things added/removed via qmp.  Call these `qmp things'.  It is
    permissible to add or remove a qmp thing across two separate
    acquisitions of the fast lock, one to read the old state of the
    thing, and one to read/modify/write to update (only) the new state
    of the thing.  This is subject to the thing add/removal rule, from
    before, which becomes:

  * You may not cause a thing to be added to the primary config
    unless you have held the relevant thing lock continuously
    since ensuring that the libxl-json config describes it.

  * Conversely you may not cause a thing to be removed from the
    libxl-json unless you have held the relevant thing lock
    continuously since ensuring the thing is absent from the primary
    config.

  * The `relevant thing lock' is the slow lock for qmp things, and the
    fast lock for other things.

  * Acquiring the fast lock fails for a destroyed domain, as at
    present.

I think this maintains the invariant.

I haven't figured out domain destruction.  Ideally domain destruction
could happen without taking the slow lock.

I think this is probably possible if we make sure that qemu is always
killed before the libxl-json is removed.  The result is that if a qmp
thing operation races with domain destruction, and in its 1st read
gets an existing libxl-json from before destruction, the qmp thing
will, when it acquires the fast lock again, necessarily the libxl-json
will not exist, and the qmp operation will bomb out.

But I don't exactly know how this relates to domain creation.  In
general I haven't thought about races between domain creation and
domain destruction.  We don't want a situation like this:

  1 domain destruction tries to kill qemu (but it doesn't exist yet)
  2 domain creation creates libxl-json and qemu
  3 domain creation complets
  4 qmp add reads libxl-json, sees thing absent
  5 qmp add sends qmp command
  6 qmp add gets qmp response and updates libxl-json
  7 domain destruction deletes the libxl-json
  8 domain destruction crashes before it destroys domain in Xen

Maybe qemu's existence is `primary non-qmp state' and in fact domain
destruction is not allowed to destroy it without holding the
libxl-json lock.  But I bet that rule is not honoured right now.

Comments, anyone ?


> Anyway, I'm not overly opposed to adding some easy to grep pointers, but
> CODING_STYLE looks wrong to me.  Maybe README.dev?

I'm sorry I didn't make that comment easy to find.  I think maybe
adding cross-reference comments (perhaps several, in the places you
looked) would be a good idea.

The problem with  a file like HACKING or CODING_STYLE  is that it will
become a large list of things  unconnected to what they are about, and
therefore once again hard to find.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.