
Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register



On Mon, Jun 08, 2015 at 10:03:18AM +0100, Jan Beulich wrote:
> >>> On 08.06.15 at 10:09, <malcolm.crossley@xxxxxxxxxx> wrote:
> > On 08/06/15 08:42, Jan Beulich wrote:
> >> Not really. All we concluded so far is that _maybe_ the bridge, upon
> >> seeing the UR, generates a Master Abort, rendering the whole thing
> >> fatal. Otoh the respective root port also has
> >> - Received Master Abort set in its Secondary Status register (but
> >>   that's also already the case in the log that we have before the UR
> >>   occurs, i.e. that doesn't mean all that much),
> >> - Received System Error set in its Secondary Status register (and
> >>   after the UR the sibling endpoint [UR originating from 83:00.0,
> >>   sibling being 83:00.1] also shows Signaled System Error set).
> >> 
> > 
> > Disabling the Memory decode in the command register could also result in a
> > completion timeout on the root port issuing a transaction towards the PCI
> > device in question. PCIE completion timeouts can be escalated to Fatal AER
> > errors which trigger system firmware to inject NMIs into the host.
> 
> And how does all that play with PC compatibility (where writes into
> nowhere get dropped, and reads from nowhere get all ones
> returned)? Remember - we're talking about CPU side accesses
> here.
> 
> > Here is an example AER setup for a PCIE root port. You can see UnsupReq
> > errors are masked and so do not trigger errors. CmpltTO (completion
> > timeout) errors are not masked and the errors are treated as Fatal because
> > the corresponding bit in the Uncorrectable Severity register is set.
> > 
> > Capabilities: [148 v1] Advanced Error Reporting
> > UESta:      DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > UEMsk:      DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol+
> > UESvrt:     DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > CESta:      RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> > CEMsk:      RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
> > AERCap:     First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
> > 
> > A root port completion timeout will also result in the master abort bit 
> > being set.
> > 
> > Typically system firmware clears the error in the AER registers after it's
> > processed it. So the operating system may not be able to determine what
> > error triggered the NMI in the first place.
> 
> Right, but in the case at hand we have an ITP log available, which
> increases the hope that we see a reasonably complete picture.
> 
> >>> So can we chalk this up to hardware bugs on a specific box?
> >> 
> >> I have to admit that I'm still very uncertain whether to consider all
> >> this correct behavior, a firmware flaw, or a hardware bug.
> > I believe the correct behaviour is happening but a PCIE completion timeout
> > is occurring instead of an unsupported request.
> 
> Might it be that with the supposedly correct device returning UR
> the root port reissues the request to the sibling device, which then
> fails it in a more dramatic way (albeit the sibling's Uncorrectable
> Error Status Register also has only Unsupported Request Error
> Status set)?
> 
> Jan

Isn't the sibling a function on the same device?
And is the request causing the UR a memory read?
If so, doesn't this use address routing?
What does it mean that the request is "to the sibling device" then?
Does the sibling device have a BAR overlapping the address?
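
To make the questions above concrete, here is a rough sketch of how I would
expect address routing to pick a target for a memory request. It is only a
toy model: the Function class, the route() helper and the BAR addresses are
made up for illustration and are not taken from the log. The one real value
is the Memory Space Enable bit (0x2) in the PCI command register.

```python
from dataclasses import dataclass, field

# Memory Space Enable bit in the PCI command register.
PCI_COMMAND_MEMORY = 0x2


@dataclass
class Function:
    """Toy model of one PCI function (hypothetical, for illustration only)."""
    name: str                                  # e.g. "83:00.0"
    command: int                               # command register value
    bars: list = field(default_factory=list)   # (base, size) of memory BARs

    def claims(self, addr):
        """True if this function would decode a memory request to addr."""
        if not self.command & PCI_COMMAND_MEMORY:
            return False  # memory decode disabled: no BAR claims anything
        return any(base <= addr < base + size for base, size in self.bars)


def route(functions, addr):
    """Return the name of the function claiming addr, or "UR" if none does."""
    for fn in functions:
        if fn.claims(addr):
            return fn.name
    return "UR"


# Hypothetical BAR layout: the two functions do not overlap.
f0 = Function("83:00.0", 0x0, [(0xFB000000, 0x100000)])  # decode disabled
f1 = Function("83:00.1", PCI_COMMAND_MEMORY, [(0xFB100000, 0x100000)])

print(route([f0, f1], 0xFB000010))  # nobody claims it
print(route([f0, f1], 0xFB100010))  # claimed by the sibling's own BAR
```

In this model a request only reaches the sibling if the sibling has a BAR
covering the address, which is why I am asking about overlap.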

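For reference, the mask and severity lines in the AER dump quoted above can
be read mechanically. The sketch below is my own illustration, not any
tool's API: parse_flags() and classify() are hypothetical helpers, and the
two strings are copied from the quoted UEMsk and UESvrt lines.

```python
# Flag strings copied from the quoted lspci output.
UEMSK = ("DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- "
         "MalfTLP- ECRC- UnsupReq+ ACSViol+")
UESVRT = ("DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ "
          "MalfTLP+ ECRC- UnsupReq- ACSViol-")


def parse_flags(line):
    """Turn 'DLP- CmpltTO+ ...' into {'DLP': False, 'CmpltTO': True, ...}."""
    return {tok[:-1]: tok.endswith("+") for tok in line.split()}


def classify(error, mask, severity):
    """Classify an uncorrectable error as masked, fatal or non-fatal."""
    if mask[error]:
        return "masked"  # a set mask bit suppresses reporting
    return "fatal" if severity[error] else "non-fatal"


mask = parse_flags(UEMSK)
severity = parse_flags(UESVRT)

for err in ("UnsupReq", "CmpltTO"):
    print(err, classify(err, mask, severity))
```

With the quoted settings this agrees with the description in the quote:
UnsupReq comes out as masked and CmpltTO as fatal.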
-- 
MST
