Re: [Xen-devel] Broadwell TLB Erratum
>>> On 28.10.16 at 11:55, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 27/10/16 19:26, osstest service owner wrote:
>> flight 101698 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/101698/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-xtf-amd64-amd64-5 44 xtf/test-hvm64-xsa-186 fail REGR. vs. 101673
>
> --- Xen Test Framework ---
> Environment: HVM 64bit (Long mode 4 levels)
> XSA-186 PoC
> ******************************
> PANIC: Unhandled exception at 0008:fffffffffffffffa
> Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
> ******************************
>
> This is an issue I have seen before, and I think it is a TLB erratum in
> Broadwell processors.  Within XenServer, it has now been observed on one
> SDP and two different Broadwell servers from different vendors.
>
> The first CPU I saw it on was
>
> CPU Vendor: Intel, Family 6 (0x6), Model 71 (0x47), Stepping 1 (raw 00040671)
>
> Nobbling-1, which this test ran on, is
>
> CPU Vendor: Intel, Family 6 (0x6), Model 79 (0x4f), Stepping 1 (raw 000406f1)
>
> The code in question sets up the mapping, memcpy()'s an instruction stub
> into place, then calls the stub.
>
> This pagefault is from the call, after the memcpy() has succeeded,
> therefore proving the mapping is present in the dTLB.
>
> The issue reproduces ~1 in 200 times, but can reliably be found in a
> minute or two.  Inserting an invlpg instruction immediately before the
> call appears to resolve the issue (i.e. the tests run for ~1 hour
> without observing the issue).
>
> Architecturally, however, this invlpg should have no effect.  I think
> there is some race condition propagating TLB records to the L1 iTLB if
> they are already present in the L1 dTLB.
>
> When I first discovered this, I checked the NDA Specification
> Update for the processor, and didn't find any published errata which
> matched the symptoms.

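As a cross-check on the two CPU identifications quoted above, the raw
values can be decoded by hand.  The following is only a minimal sketch,
assuming the standard CPUID leaf 1 EAX signature layout; the names
cpu_sig and decode_sig are made up for illustration and are not taken
from Xen or XTF.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature fields. */
struct cpu_sig {
    unsigned int family;
    unsigned int model;
    unsigned int stepping;
};

/* Decode a raw CPUID leaf 1 EAX value into family/model/stepping. */
struct cpu_sig decode_sig(uint32_t raw)
{
    struct cpu_sig s;
    unsigned int base_family = (raw >> 8) & 0xf;
    unsigned int base_model  = (raw >> 4) & 0xf;
    unsigned int ext_family  = (raw >> 20) & 0xff;
    unsigned int ext_model   = (raw >> 16) & 0xf;

    s.stepping = raw & 0xf;

    /* The extended family is only added when the base family is 0xf. */
    s.family = base_family;
    if (base_family == 0xf)
        s.family += ext_family;

    /* The extended model supplies the high nibble for families 6 and 0xf. */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        s.model |= ext_model << 4;

    return s;
}
```

With this, decode_sig(0x00040671) gives family 6, model 0x47, stepping 1,
and decode_sig(0x000406f1) gives family 6, model 0x4f, stepping 1, which
agrees with the values quoted.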
So until you/we hear back from Intel (which, as we all know, can take a
while), could you insert an INVLPG in the test, to eliminate these
(supposedly spurious) failures?

Jan
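
To make the request concrete, here is a rough sketch of the sequence as
described (stub copied into an already mapped page, optional flush, then
the call).  This is not the actual XSA-186 test code: copy_flush_call,
demo, flush_fn and stub_fn are hypothetical names, and the 6-byte stub is
an arbitrary "mov $42, %eax; ret" chosen for illustration.  INVLPG is
privileged, so the user-space demo below can only exercise the ordering
with no flush hook; inside the test kernel one would pass invlpg instead.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef void (*flush_fn)(const void *va);  /* hypothetical hook type */
typedef int (*stub_fn)(void);              /* hypothetical stub signature */

/*
 * Flush the TLB entry covering va.  INVLPG is privileged, so this may
 * only be called from ring 0 (e.g. inside the test kernel).
 */
void invlpg(const void *va)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__ ("invlpg (%0)" :: "r" (va) : "memory");
#else
    (void)va;
#endif
}

/* Copy a stub into an already mapped, executable page and call it. */
int copy_flush_call(void *page, const void *stub, size_t len, flush_fn flush)
{
    memcpy(page, stub, len);   /* data-side write through the mapping */
    if (flush)
        flush(page);           /* the workaround; architecturally redundant */
    return ((stub_fn)page)();  /* instruction fetch: the access that faulted */
}

/*
 * User-space stand-in for the test.  Returns the stub's result, or -1 if
 * the stub cannot be run in this environment.
 */
int demo(flush_fn flush)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(MAP_ANONYMOUS)
    /* mov $42, %eax; ret */
    static const unsigned char stub[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int ret;

    if (page == MAP_FAILED)
        return -1;

    ret = copy_flush_call(page, stub, sizeof(stub), flush);
    munmap(page, 4096);
    return ret;
#else
    (void)flush;
    return -1;
#endif
}
```

In user space, demo(NULL) just runs the copy-then-call sequence without
the flush.  In ring 0 the equivalent call would be
copy_flush_call(page, stub, len, invlpg), which is the variant that
reportedly ran for about an hour without hitting the fault.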