
Re: [Xen-devel] Containing unrecoverable AER errors...



On 2017-06-20 12:56:34 +0100, Wei Liu wrote:
> On Wed, Jun 07, 2017 at 02:24:32PM -0500, Venu Busireddy wrote:
> > 
> > Hi,
> > 
> > I am working on creating a patch to aid in containing the unrecoverable
> > AER errors generated by PCI devices assigned to guests in passthrough
> > mode.
> > 
> > The overall approach is as follows:
> > 
> > 1. Change the BIOS settings such that the AER error handling is delegated
> >    to the host.
> > 
> > 2. Change the xen_pciback driver to store the name (SBDF) of the erring
> >    device in xenstore.
> > 
> > 3. At the time of creating the guest, set up a watcher for such writes to
> >    the xenstore.
> > 
> > 4. When the watcher is kicked off due to errors, *shut down* the guest and
> >    mark the erring device unassignable until administrative intervention.
> > 
> > I got all of this working, but I was advised that shutting down the
> > guest is not the correct approach, because the guest may or may not
> > respond to the shutdown. The suggestion was to destroy the guest.
> > 
> > I ran into a problem with that. libxl_domain_destroy() is not
> > callable from within libxl. I tried to create a new wrapper to call
> > libxl__domain_destroy(), but the callback function never gets called!
> > Not surprisingly, because the description in libxl/libxl_internal.h
> > about asynchronous operations does prohibit this!
> > 
> > What is the best way to kill/destroy a guest from within libxl? Could you
> > please advise? I am including the patches below for reference (please
> > ignore the few debug statements). The problem part is the function
> > aer_backend_watch_callback() in tools/libxl/libxl_pci.c.
> > 
> [...]
> > +
> > +/* Handler of events for device driver domains */
> > +int libxl_reg_aer_events_handler(libxl_ctx *ctx, uint32_t domid)
> > +{
> > +    int rc;
> > +    char *be_path;
> > +    GC_INIT(ctx);
> > +
> 
> You can probably create an AO here, stash it somewhere, and then use it
> in your callback to destroy the domain.
> 
> See also: libxl_device_events_handler

Thanks, Wei! This suggestion worked great. I implemented it and have
sent the patches for review.

Venu
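
For readers of the archive, the pattern suggested above (create the
asynchronous operation when the handler is registered, stash it, and use
it later from the watch callback) can be sketched roughly as below. This
is a self-contained model only, not the patch that was sent and not libxl
code: every type and function name prefixed with sketch_ is hypothetical,
and the destroy step is a stand-in that just records what would happen.

```c
/* Model of "stash the AO at registration, use it in the watch callback".
 * Nothing here is libxl API; all sketch_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a long-lived asynchronous operation (an "AO"). */
typedef struct {
    uint32_t domid;   /* guest this operation belongs to */
    bool complete;    /* set once the operation has finished */
} sketch_ao;

/* Per-guest watch state; the AO pointer is stashed here. */
typedef struct {
    sketch_ao *ao;
    char erring_sbdf[32];    /* SBDF of the erring device, "" if none */
    bool domain_destroyed;
} sketch_aer_watch;

/* Registration: set up the AO first and keep it inside the watch state,
 * so that the callback has a valid operation context to work with. */
static void sketch_register_aer_watch(sketch_aer_watch *w, sketch_ao *ao,
                                      uint32_t domid)
{
    ao->domid = domid;
    ao->complete = false;
    w->ao = ao;
    w->erring_sbdf[0] = '\0';
    w->domain_destroyed = false;
}

/* Stand-in for the real destroy path: records the outcome and marks
 * the stashed AO as complete. */
static void sketch_destroy_domain(sketch_aer_watch *w)
{
    printf("destroying domain %u after AER error on %s\n",
           (unsigned)w->ao->domid, w->erring_sbdf);
    w->domain_destroyed = true;
    w->ao->complete = true;
}

/* Watch callback: fired when an erring device's SBDF is written.
 * It relies on the AO stashed at registration time. */
static void sketch_aer_watch_callback(sketch_aer_watch *w, const char *sbdf)
{
    assert(w->ao != NULL);       /* registration must have happened */
    if (w->ao->complete)
        return;                  /* guest already destroyed; ignore */
    snprintf(w->erring_sbdf, sizeof(w->erring_sbdf), "%s", sbdf);
    sketch_destroy_domain(w);
}
```

In this model the caller registers the watch once per guest and invokes
sketch_aer_watch_callback() whenever an error is reported; a second
report for an already destroyed guest is ignored.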


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
