[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] libxl refactoring, call for discussion

Hi Ian,

I'll try to make clear domain restart thing first. Currently there are only device entries in domain config for frontends with backend paths in them, e.g.:

vdevice = [ 'backend=BackendDomain,devid=0,device=/dev/device' ]

it is parsed only on frontend domain start, and libxl creates both frontend and backend xenstore branches from that entry. Problem is, on backend domain restart libxl does not re-read this config, because it belongs to the frontend domain which is clearly not restarted, and as a consequence xenstore backend branches are cleared (on backend shutdown) but not filled again, because backend domain does not have any entries for this device.

What I propose is to add information needed to fill xenstore backend branches to backend domains and remove it from frontend domains, so it would possibly look like this:

frontend:vdevice = [ 'backend=BackendDomain,devid=0' ]

backend:vdevice = ['frontend=FrontendDomain,devid=0,device=/dev/device']

Now we have all the information needed for both domains to restart intependently filling their own xenstore branches. On FrontendDomain start libxl fills only frontend branch for vdevice, and on backendDomain startÂlibxlÂfills only backend branch.

This can be easily done in backwards-compatible way: if frontend entry is filled in the old format, it is used to fill both branches, but if we explicitly state that it's only frontend or only backend entry, then it is used to fill only the corresponding branch.

For the main part, that's it: currently the main thing we need is a DomD restart, and such a change would be enough for that purpose with no need for dirty hacks.

But from that point we can make one step further: to remove split driver connection information from user domains. It would possibly look like this:

frontend:vdevice = [ 'devid=0' ]

backend:vdevice = ['devid=0,devname=Main,device=/dev/device']

backend:vdevice = ['devid=0,devname=Fallback,device=/dev/device']

connection:vdevice = ['devid=0,Âfrontend=FrontendDomain,Âbackend=BackendDomain, devname=Main']
connection:vdevice = ['devid=0,Âfrontend=FrontendDomain,Âbackend=SupervisorDomain, devname=Fallback']

Motivation for this example is the following: if we have, e.g., PV GPU which is used in user domain and we have restrictions on graphics availability, we can monitor if main device backend hanged and make a runtime switch to a fallback device (software rendering) till main backend would be restated. Connections are static, they generally do not change and do not actually belong to any user domain, so there is no need clearing and refilling them on every domain restart.

This can be easily done in backwards-compatible way as well: if entries already have connection information, we stick to the old model; if they don't, we look for the connection branch to fill missing fields and to setup fallback paths.

There still is an issue with hoplug devices being non-persistent (xl vdevice-attach fills both frontend and backend branches, but restart of any of user domains looses this information). At the moment we fix this rather dirty with additional 'hotplugged' xenstore branch bot belonging to any domain so it's not erased on shutdown but is being read on domain start if exists. In context of reworking libxl, I believe this one can be done as well, though I still don't have a solution I would like.

Suikov Pavlo
P +x.xxx.xxx.xxxxÂÂM +38.066.667.1296Â S psujkov


On Wed, Jan 27, 2016 at 4:26 PM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote:
On Mon, 2016-01-25 at 15:38 +0200, Pavlo Suikov wrote:
> Hi everyone,
> I want to bring domain restart question for a discussion. It originates
> from DomD restart, but the solution I am about to offer can be quite
> generic.
> Problem is, domain specification currently holds only frontend info,
> which is used to generate both frontend and backend entries for a device;
> that means that backend xenstore data is handled not by a domain that
> owns backends, but by a domain that has frontends linked to these
> backends. As a result, reboot of any domain with backends without reboot
> of the corresponding frontend domains is impossible.
> This is wrong on many levels, but the main thing is: some domains don't
> know something they should (do they have any backends) and some know
> something they should not (there are backends for their frontends in
> different domains).
> I propose following change: make domain "require" and "provide"
> interfaces visible (think CORBA), hold connections between the two in the
> priveleged domain (where toolstack is, think controller from the MVC
> idiom). With this change domains (except for Dom0 which is a special
> case) can be rebooted in any order whatsoever, and frontend/backend link
> can be adjusted as a static config or during runtime (e.g. if hardware
> rendering backend hangs, switch to software rendering to avoid glitches).
> However, it requires change in the libxl internal device representation
> (device should not be a frontend/backend pair any more) and config format
> change, which breaks backwards compatibility.
> That is, I want domain configuration hold records on both frontends (what
> this domain require) and backends (what this domain provides) and libxl
> to create corresponding xenstore branches separately. Moreover, I'd like
> to have frontend/backend connection information be held in a different
> config belonging to Dom0, so that on any domain reboot (or any
> exceptional situation like watchdog failure) supervisor (Dom0) can use
> this information to initiate a reconnect.
> And, as we talk about libxl refactoring, I'd like to state one point
> more: code duplication. Libxl support for a split-driver model consists
> of an declarative IDL device specification, xenstore read, xenstore
> write, config read, config write, xl args read, JSON read/write and
> device chain of responsibility with async device creation. The only thing
> IDL is used for is type and JSON read/write code generation, everything
> else is an error-prone hand-written duplicated code.
> Why won't we generate as much as we can? That means generation of
> xenstore read, xenstore write, config read, config write and xl args read
> - these all directly depend on device IDL specification. If we already
> have external code generation tool, why not use it to full extent instead
> of writing all this serialization/deserialization code manually (and in
> different styles - e.g. block device is the only one that uses lexx, and
> xl_cmdimpl.c parse_config_data implementations differs from device to
> device quite a lot)?

It seems that much of this is conflating several separate things:

 Â1. The contents of xenstore representing a front/back pair, and the
   protocol which they imply.
 Â2. The code in libxl which populates that state
 Â3. The data structures used inside libxl to describe such devices
 Â4. The code in xl which parses xl's own config file format (which is not
   common to all toolstacks)

Currently only #3 is covered by the IDL.

It's possible (likely even) that some or all of #1 and #2 might be
describable either in an extension to the IDL or in some other domain
specific language or that more of the setup in these cases could be made
common than is the case in the code today, and I would expect such patches
to be welcomed (although it might be wise to discuss a specific plan

Note that for historical reasons their are some subtle differences between
the handling required for some devices.

#4 isn't really subject to the libxl IDL, being xl code relating to xl's
own configuration file. It's possible that xl could benefit from some more
code automation. Wei is looking at cleaning some of the various adhoc
parsing of key-value lists for example (and has posted some patches in the
last week or so). There's probably more cleanup which could be done in this
area and again I would expect patches to be welcome.

It seems like aside from proposing to autogenerate more code you may also
proposing some sort of change to the actual xenstore protocol used by pv
front and backends (the "require"/"provide" concept from CORBA and the MVC
idiom bits). It's not clear to me exactly what the proposal is but just let
me say that (unfortunately) any changes to these arrangements need to be
made in a backward compatible way, we can't simply rework everything and
break all existing guests (or dom0's). Maybe I've misunderstood what you
meant to suggest though.

Note that the xl config file syntax is subject to evolution and can be
extended, or even changed, although we would of course prefer to minimise
breakage where possible (i.e. extending is preferred).


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.