[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors

On 2017-09-21 18:12:54 +0100, Ian Jackson wrote:
> Venu Busireddy writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to 
> handle unrecoverable AER errors"):
> > On 2017-08-08 15:33:01 +0100, Wei Liu wrote:
> > > I think a bigger question is whether you agree with Ian's comments
> > > regarding API design and whether you have more questions?
> > 
> > Ian suggested that I document the use of the API (about the event loop),
> > and I believe I addressed it. I don't have any more questions. Just
> > waiting for Ian's "Ack", or more comments.
> I'm afraid that I still have reservations about the design questions.
> Evidently I didn't make my questions clear enough.
> The most important question that seems unanswered to me is this:
>   Why is this only sometimes the right thing to do ?  On what basis
>   might a user choose ?
> To which you answered:
>   This is not an "only sometimes" thing. User doesn't choose it. We always
>   want to watch for AER errors.
> But this leads to more fundamental questions.
> If this behaviour is always required, why do we have an API call to
> request it ?  It sounds like not calling this new function of yours is
> always a mistake.  Ie this function (which has an obscure name) is

Yes, this behavior is always required. I will remove the API calls,
and in their place, create wrapper functions that can be called from
different places in libxenlight as needed.

libxl_reg_aer_events_handler() will be created as a wrapper function
that hides the internal details of calling libxl__ev_xswatch_register().

libxl_unreg_aer_events_handler() will be created as a wrapper function
that hides the internal details of calling libxl__ev_xswatch_deregister().

> like "IAC DONT RANDOMY-LOSE" (see RFC748, from 1st April 1978)
> except that you are making DO RANDOMLY-LOSE the default (in violation
> of the RFC, should anyone talk to the server over telnet...)
> If you are inventing a new kind of monitoring process that must be run
> for all domains, that is a thing that libxl does not have right now.
> At least, it doesn't have it in this form.  (xl has the reboot
> monitor, and this is done differently in libvirt.)

I am not inventing a new kind of monitoring process. My understanding
is that each domain already has its own monitoring process.

If the toolstack used is xl, I am proposing to use the 'xl' command as
the monitoring process (whether daemonized or not).

If the toolstack used is libvirt/virsh, I am proposing to use the daemon
'libvirtd' as the monitoring process. That process always stays around.

> It was indeed a design principle of libxl that it should (at least,
> wherever possible) be possible to run a domain _without_ a monitoring
> process imposed by libxl.
> So: why is what this API call requests, not done automatically by
> pciback or by Xen ?

When you say "Xen", I am assuming you meant "libxenlight." If so, the
answer is Yes. I am proposing to register/unregister the event handler
in libxenlight, so that all toolstacks can make use of the monitoring.

> And: if you are inventing a new monitoring process that must be run
> for every domain, you should call this out much more explicitly as a
> fundamental design change.
> We will then have to think about more questions: should this process
> be run automatically by libxl, without special application request
> (like the way that libxl runs qemu) ?

I am not inventing a new monitoring process, and hence, this isn't

> If not, how do we ensure that exactly one of these processes is
> running for each guest ?

As I mentioned in the earlier paragraphs above, there will only be one
process per domain. Never more than one process per domain. When using
the xl toolstack, that one process per domain is the 'xl' command. When
using the libvirt/virsh toolstack, then there is only one process (the
'libvirtd' daemon) that is monitoring all the domains. Therefore, there
will never be more than process per domain.

> If your new design involves new behaviour in callers of libxl, do you
> intend to send patches for libvirt to enable it ?

The new design does not involve any new behavior in the callers of
libxl. Registration of the event handler happens transparently, without
any explicit request from the libxl callers.

> Looking at the code:
> You handle errors by logging and continuing.  Why is that correct ?

In the unlikely case of failure to register the event handler, we felt
that it is better to continue with the creation of the guest without
monitoring (which is what happens today). But if you would like to see the
creation of the guest fail, that is an easy change to accommodate. Would
you like to fail the creation of the guest?

> If we are to keep the current API for the client, it needs to have
> better doc comments.

I will be removing the API, and hence, no doc comments.

> Is the xenstore watch implementation vulnerable to unexpected paths
> appearing in watch events ?

The event handler looks for specific paths with specific formats to exist
in the xenstore. Unless an user with administrative privilege adds entries
to xenstore, such paths should not come into existence in xenstore.

> Why is the API not a never-completing ao ?  Or, why is it not an
> evreg ?

I will be removing the API, and hence, this isn't applicable.

> But the fundamental design questions need answering first.

In v5 of the patch, the design incorporates all of the above aspects. The
API is removed. Existing processes are used to monitor for the events. In
case of libvirtd where one process monitors all the domains, I am
maintaining a list of handlers within that one single 'ctx'. The new
design also takes care of the monitoring irrespective of which toolstack
is used to create the guest.



Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.