
Re: [Xen-devel] Broadwell TLB Erratum



>>> On 28.10.16 at 11:55, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 27/10/16 19:26, osstest service owner wrote:
>> flight 101698 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/101698/ 
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-xtf-amd64-amd64-5       44 xtf/test-hvm64-xsa-186   fail REGR. vs. 101673
> 
> --- Xen Test Framework ---
> Environment: HVM 64bit (Long mode 4 levels)
> XSA-186 PoC
> ******************************
> PANIC: Unhandled exception at 0008:fffffffffffffffa
> Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
> ******************************
> 
> This is an issue I have seen before, and I think it is a TLB erratum in
> Broadwell processors. Within XenServer, it has now been observed on one
> SDP and two different Broadwell servers from different vendors.
> 
> The first CPU I saw it on was
> 
> CPU Vendor: Intel, Family 6 (0x6), Model 71 (0x47), Stepping 1 (raw 00040671)
> 
> Nobbling-1, which this test ran on, is
> 
> CPU Vendor: Intel, Family 6 (0x6), Model 79 (0x4f), Stepping 1 (raw 000406f1)
> 
> 
> The code in question sets up the mapping, memcpy()'s an instruction stub
> into place, then calls the stub.
> 
> This pagefault is from the call, after the memcpy() has succeeded,
> therefore proving the mapping is present in the dTLB.
> 
> The issue reproduces ~1 in 200 times, but can reliably be found in a
> minute or two. Inserting an invlpg instruction immediately before the
> call appears to resolve the issue (i.e. the tests run for ~1 hour
> without observing the issue).
> 
> Architecturally, however, this invlpg should have no effect.  I think
> there is some race condition propagating TLB records to the L1 iTLB if
> it is already present in the L1 dTLB.
> 
> When I first discovered this, I checked the NDA Specification
> Update for the processor, and didn't find any published errata which
> matched the symptoms.

So until you/we hear back from Intel (which as we all know can take
a while), could you insert an INVLPG in the test, to eliminate these
(supposedly spurious) failures?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
