
[Xen-devel] [patch 0/7][PCIE-AER]Enable PCIE-AER support for XEN

The following 7 patches add PCIE AER (Advanced Error Reporting) support to XEN.

Patches 1~4 are backported from the Linux kernel and enable DOM0 kernel support for PCIE AER.

These patches give DOM0 the ability to handle PCIE errors. When a device sends 
a PCIE error message to the root port, the root port raises an interrupt. The 
interrupt handler reads the root error status register and schedules a DPC to 
deal with the error according to its type (correctable/non-fatal/fatal).
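
As a rough illustration of the "based on error type" decision (this is not code
from the patches), the sketch below classifies a root error status value into
the three severities. The mask values and all names here are assumptions made
for the example; the bit positions follow the Root Error Status register layout
as we understand it and should be checked against the spec before any real use.

```c
#include <stdint.h>

/* Illustrative masks for the Root Error Status register (assumed layout). */
#define ROOT_COR_RCV   0x01u  /* a correctable error message was received */
#define ROOT_UNCOR_RCV 0x04u  /* a fatal or non-fatal message was received */
#define ROOT_FATAL_RCV 0x40u  /* at least one received message was fatal */

enum aer_severity { AER_NONE, AER_CORRECTABLE, AER_NONFATAL, AER_FATAL };

/* Report the most severe error type recorded in the status value. */
static enum aer_severity classify_root_status(uint32_t status)
{
    if (status & ROOT_UNCOR_RCV)
        return (status & ROOT_FATAL_RCV) ? AER_FATAL : AER_NONFATAL;
    if (status & ROOT_COR_RCV)
        return AER_CORRECTABLE;
    return AER_NONE;
}
```

The sketch reports only the most severe type when several bits are set; a real
handler would typically process correctable and uncorrectable errors separately.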

For correctable errors, it clears the error status register of the device.
For uncorrectable errors (fatal, non-fatal), it calls the callback functions 
of the endpoint's driver. For a bridge, it broadcasts the error to the 
downstream ports. For dom0, this means the pciback driver will be called accordingly.
(A fatal error needs some additional work, such as resetting the PCIE link.)
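
The policy above can be modelled in a few lines. This is a minimal sketch with
hypothetical types and names (struct model_dev, report_uncorrectable, and so on);
it does not touch real config space and is not the kernel's implementation. It
only shows the shape of the logic: clear the status for correctable errors, call
the endpoint driver's callback for uncorrectable ones, and walk downstream when
the device is a bridge.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 4

/* Hypothetical in-memory model of a device; not a kernel structure. */
struct model_dev {
    uint32_t cor_status;                          /* modelled correctable error status */
    int is_bridge;                                /* non-zero for a bridge */
    int (*error_detected)(struct model_dev *dev); /* driver callback, may be NULL */
    int notified;                                 /* set by the sample callback below */
    struct model_dev *children[MAX_CHILDREN];     /* downstream devices of a bridge */
    size_t nr_children;
};

/* Correctable error: clearing the status is all that is needed. */
static void handle_correctable(struct model_dev *dev)
{
    dev->cor_status = 0; /* real hardware would clear by writing the set bits back */
}

/* Uncorrectable error: a bridge broadcasts downstream, an endpoint gets its
 * driver callback invoked. Returns the number of drivers that were notified. */
static int report_uncorrectable(struct model_dev *dev)
{
    int count = 0;

    if (dev->is_bridge) {
        for (size_t i = 0; i < dev->nr_children; i++)
            count += report_uncorrectable(dev->children[i]);
        return count;
    }
    if (dev->error_detected != NULL) {
        dev->error_detected(dev);
        count = 1;
    }
    return count;
}

/* Sample driver callback used only for illustration. */
static int sample_error_detected(struct model_dev *dev)
{
    dev->notified = 1;
    return 0;
}
```

In this model, pciback would play the role of the driver whose callback is
invoked for a device assigned to a guest.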

Patches 5~7: AER error handler implementation in pciback and pcifront. This is the 
main work we have done.

As mentioned above, the pciback PCI error handler is scheduled by the root port 
AER service. 
Pciback then asks pcifront to help call the end-device driver's error handling 
support, completing the related PCI error handling.
The following workflow/policy illustration might be helpful:
1) Assign an AER-capable network device to a PV driver domain.
2) Install a network device driver that supports PCI error handling in the PV guest.
3) If no device driver is installed in the PV guest, or the driver does not register 
pci error handler, the guest will be killed directly (the devices will be 
An HVM guest is currently killed directly.
4) Trigger AER with a test driver; an interrupt is generated and caught by 
the root port.
5) The AER service driver below the root port in DOM0 helps to do the recovery 
For each recovery step (error_detected, mmio_enabled, slot_reset, 
error_resume), the AER core cooperates with each device below the root port that 
registers pci_error_handlers. For details, please see the related docs in kernel (patch1 
6) pciback_error_handler is then called by the AER core for each of the above four
steps. Pciback sends a service request to pcifront for each step. Pcifront
then tries to call the corresponding device driver if device driver has the 
If each recovery step succeeds, the PCIE error should have been successfully 
recovered. Otherwise, the impacted guest will be killed and the pcie device will be 
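
The workflow above can be sketched as a small model of the pciback/pcifront
exchange. All names here (backend_recover, frontend_handle, struct guest_driver)
are hypothetical and only illustrate the policy; this is not the code in the
patches. The sketch assumes that every step returns a result and that a missing
per-step handler counts as a failure, which simplifies the real
pci_error_handlers semantics.

```c
#include <stddef.h>

/* The four recovery steps named above, in the order they are attempted. */
enum recovery_step {
    STEP_ERROR_DETECTED,
    STEP_MMIO_ENABLED,
    STEP_SLOT_RESET,
    STEP_RESUME,
    STEP_COUNT
};

enum step_result { STEP_OK, STEP_FAILED };
enum recovery_outcome { RECOVERED, GUEST_KILLED };

/* Hypothetical guest driver: one optional handler per recovery step. */
struct guest_driver {
    enum step_result (*handlers[STEP_COUNT])(void);
};

/* pcifront side: call the guest driver's handler for one step, if any. */
static enum step_result frontend_handle(const struct guest_driver *drv,
                                        enum recovery_step step)
{
    if (drv == NULL || drv->handlers[step] == NULL)
        return STEP_FAILED; /* assumption: no driver or no handler means failure */
    return drv->handlers[step]();
}

/* pciback side: forward each step to the frontend and stop at the first failure. */
static enum recovery_outcome backend_recover(const struct guest_driver *drv)
{
    for (int step = 0; step < STEP_COUNT; step++) {
        if (frontend_handle(drv, (enum recovery_step)step) != STEP_OK)
            return GUEST_KILLED;
    }
    return RECOVERED;
}

/* Sample handlers used only for illustration. */
static enum step_result always_ok(void)   { return STEP_OK; }
static enum step_result always_fail(void) { return STEP_FAILED; }
```

A guest with no driver, or a driver whose slot_reset handler fails, ends in
GUEST_KILLED in this model, while a driver that succeeds at every step ends in
RECOVERED.
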
Test environment:
We have tested the patches on an IPF Hitachi machine, where we can trigger an 
Unsupported non-fatal AER error by reading/writing a non-existent function on a 
PCI device that supports AER. (The whole path, i.e. the end device, the bridges 
and the root port, must support AER too.)
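
The requirement that the whole path supports AER can be expressed as a simple
walk from the end device up to the root port. This is a sketch over a
hypothetical struct path_dev with an assumed has_aer_cap flag; on a real system
the flag would come from looking up the AER extended capability of each device.

```c
#include <stddef.h>

/* Hypothetical model of one device on the path to the root port. */
struct path_dev {
    int has_aer_cap;               /* assumed flag: device exposes the AER capability */
    const struct path_dev *parent; /* upstream bridge or root port, NULL above the root port */
};

/* Return 1 only if the device and every upstream device up to the root port
 * support AER; return 0 otherwise (including for a NULL device). */
static int path_supports_aer(const struct path_dev *dev)
{
    if (dev == NULL)
        return 0;
    for (; dev != NULL; dev = dev->parent) {
        if (!dev->has_aer_cap)
            return 0;
    }
    return 1;
}
```
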
We also tested it on x86 and made sure it does not break the current code path.
If you have any questions, just let me know.
Thanks a lot for your help!



