
Re: [Xen-devel] [PATCH RFC] VT-d: honor firmware-first mode in XSA-59 workaround code



Andrew Cooper wrote on 2014-05-22:
> On 22/05/14 11:33, Jan Beulich wrote:
>>>>> On 22.05.14 at 12:19, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> No.  We have not observed an issue from XSA-59.
>>> 
>>> The version of XenServer we had the issue with didn't contain any of the
>>> XSA-59 fixes at the point that the problem was observed.
>> Then what was yesterday's alert about then? I.e. do we have any
>> indication that the workaround as is may cause problems, and that
>> hence the (relatively involved) patch here is needed at all? And,
>> how are you intending to test this patch if you haven't even seen an
>> issue?
>> 
>> Jan
>> 
> 
> As part of finding the root cause of our issue, we identified that
> just as Dom0 must not play with AER in firmware first mode, Xen must not
> play either.

I see that upstream Linux has had a patch to handle this case since 2009:

commit 0584396157ad2d008e2cc76b4ed6254151183a25
Author: Matt Domsch <Matt_Domsch@xxxxxxxx>
Date:   Mon Nov 2 11:51:24 2009 -0600

    PCI: PCIe AER: honor ACPI HEST FIRMWARE FIRST mode

    Feedback from Hidetoshi Seto and Kenji Kaneshige incorporated.  This
    correctly handles PCI-X bridges, PCIe root ports and endpoints, and
    prints debug messages when invalid/reserved types are found in the
    HEST.  PCI devices not in domain/segment 0 are not represented in
    HEST, thus will be ignored.

    Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
    to every PCIe root port for which BIOS reports it should, via ACPI
    _OSC.

    However, _OSC alone is insufficient for newer BIOSes.  Part of ACPI
    4.0 is the new APEI (ACPI Platform Error Interfaces) which is a way
    for OS and BIOS to handshake over which errors for which components
    each will handle.  One table in ACPI 4.0 is the Hardware Error Source
    Table (HEST), where BIOS can define that errors for certain PCIe
    devices (or all devices), should be handled by BIOS ("Firmware First
    mode"), rather than be handled by the OS.

    Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST, so
    that it may manage such errors, log them to the System Event Log, and
    possibly take other actions.  The aer driver should honor this, and
    not attach itself to devices noted as such.

    Furthermore, Kenji Kaneshige reminded us to disallow changing the AER
    registers when respecting Firmware First mode.  Platform firmware is
    expected to manage these, and if changes to them are allowed, it could
    break that firmware's behavior.

    The HEST parsing code may be replaced in the future by a more
    feature-rich implementation.  This patch provides the minimum needed
    to prevent breakage until that implementation is available.
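
To make the check described in that commit message concrete, here is a minimal sketch of the decision it implies. This is not the Linux or Xen code: struct hest_aer_entry, struct pci_dev_id and aer_is_firmware_first() are hypothetical names, and the entry layout is a simplified stand-in for the real HEST structures. Only the flag bits and the three AER source type numbers are taken from the ACPI definition of HEST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits of a HEST error source entry, as defined by ACPI. */
#define HEST_FIRMWARE_FIRST (1u << 0)
#define HEST_GLOBAL         (1u << 1)

/* HEST source types that describe PCIe AER structures. */
enum hest_aer_type {
    HEST_AER_ROOT_PORT = 6,
    HEST_AER_ENDPOINT  = 7,
    HEST_AER_BRIDGE    = 8
};

/* Hypothetical, simplified view of one HEST AER entry (not the ACPI layout). */
struct hest_aer_entry {
    uint16_t type;   /* one of enum hest_aer_type */
    uint8_t  flags;  /* HEST_FIRMWARE_FIRST and/or HEST_GLOBAL */
    uint8_t  bus, dev, fn;
};

/* Hypothetical device descriptor used only for this sketch. */
struct pci_dev_id {
    uint16_t segment;
    uint8_t  bus, dev, fn;
    uint16_t hest_type;  /* HEST AER type matching the device's port type */
};

/*
 * Return true if firmware claims AER handling for the device, in which
 * case the caller should leave the device's AER registers alone.
 */
bool aer_is_firmware_first(const struct hest_aer_entry *entries, size_t count,
                           const struct pci_dev_id *pdev)
{
    assert(entries != NULL || count == 0);

    /* Per the commit message, only segment 0 is represented in HEST. */
    if (pdev->segment != 0)
        return false;

    for (size_t i = 0; i < count; i++) {
        const struct hest_aer_entry *e = &entries[i];

        if (e->type != pdev->hest_type)
            continue;

        /* A GLOBAL entry covers every device of its type. */
        bool match = (e->flags & HEST_GLOBAL) ||
                     (e->bus == pdev->bus && e->dev == pdev->dev &&
                      e->fn == pdev->fn);
        if (match)
            return (e->flags & HEST_FIRMWARE_FIRST) != 0;
    }

    return false;  /* no matching entry: the OS/hypervisor owns AER */
}
```

A caller such as the XSA-59 workaround would, under this assumption, skip its AER mask writes for any device where the function returns true.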
> 
> I believe that we have XSA-59 affected hardware with both
> firmware-first and non-firmware-first HEST tables, so we should be

Why is non-firmware-first hardware also affected? It seems that only
firmware-first hardware is buggy.

> able to behaviourally test the patch.
> 
> ~Andrew


Best regards,
Yang


