Xen project Mailing List

Re: Regressed XSA-286, was [xen-unstable test] 161917: regressions - FAIL

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Mon, 17 May 2021 10:43:40 +0200

Cc: osstest service owner <osstest-admin@xxxxxxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Ian Jackson <iwj@xxxxxxxxxxxxxx>

Delivery-date: Mon, 17 May 2021 08:43:53 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 13.05.2021 22:15, Andrew Cooper wrote: > On 13/05/2021 04:56, osstest service owner wrote: >> Tests which are failing intermittently (not blocking): >> test-xtf-amd64-amd64-3 92 xtf/test-pv32pae-xsa-286 fail in 161909 pass in >> 161917 > > While noticing the ARM issue above, I also spotted this one by chance. > There are two issues. > > First, I have reverted bed7e6cad30 and edcfce55917. The XTF test is > correct, and they really do reintroduce XSA-286. It is a miracle of > timing that we don't need an XSA/CVE against Xen 4.15. I have to admit that from the description in the revert (on top of what you say here) it does not really become clear to me what is wrong with _either_ of these changes: "The TLB flushing is for Xen's correctness, not the guest's." XSA-286 was solely about guest correctness, which was broken by Xen's behavior. Hence we're still only talking about guest observable behavior here. "The text in c/s bed7e6cad30 is technically correct, from the guests point of view, but clearly false as far as XSA-286 is concerned." As a result I also don't understand this, nor the actual reason why you did revert both, rather than just ... "That said, it is edcfce55917 which introduced the regression, which demonstrates that the reasoning is flawed." ... this. Furthermore you merely state an observation here, without going into any detail as to what's wrong with the reasoning, and hence why it is the change that's wrong and the test that's correct (and no issue elsewhere). Don't get me wrong - I'm not excluding you're right, but you fail to explain things properly. I can't see how avoiding a flush for a page table which isn't hooked up anywhere (and which hence isn't accessible via lookups through the linear page tables) can have caused a problem (except perhaps uncover an issue, e.g. a missing flush, elsewhere). Nor can I see how the XTF test would trigger the flush avoidance, as it doesn't play with free floating page tables. Plus this change affects 64-bit guests as much as 32-bit ones, yet no (apparent) regression could be seen there. Similarly for the other change: Since only guest perspective matters, the flush ought to be fine to defer until the guest actually reloads CR3; until then using either the stale or updated linear page tables is acceptable, and guests need to not rely on either, just like would be the case on bare metal (and there it's even stronger: an OS can rely upon the prior page tables to continue to be used, as the PDPTEs get reloaded _only_ during CR3 loads; mimicking this for PV would be not exactly trivial, I think). And I notice that the XTF test exercises an L3 entry update without a subsequent CR3 write, which is wrong for PAE. (I therefore suspect it is bed7e6cad30 which has caused the test failure, not edcfce55917 as you have said in the description of the revert.) > Given that I was unhappy with the changes in the first place, I don't > particularly want to see an attempt to resurrect them. I did not find > the claim that they were a perf improvement in the first place very > convincing, and the XTF test demonstrates that the reasoning about their > safety was incorrect. Interesting: Where did you voice your unhappiness? All I can find on that entire series' thread is a reply of yours on a post-commit- message remark regarding a comment you had introduced with the 286 fix. All other discussion there was between Roger and me. Additionally I don't see why you treated this as an emergency and reverted without posting a patch and getting an ack. > Second, the unexplained OSSTest behaviour. > > When I repro'd this on pinot1, test-pv32pae-xsa-286 failing was totally > deterministic and repeatable (I tried 100 times because the test is a > fraction of a second). > > From the log trawling which Ian already did, the first recorded failure > was flight 160912 on April 11th. All failures (12, but this number is a > few flights old now) were on pinot*. > > What would be interesting to see is whether there have been any passes > on pinot since 160912. > > I can't see any reason why the test would be reliable for me, but > unreliable for OSSTest, so I'm wondering whether it is actually > reliable, and something is wrong with the stickiness heuristic. Isn't (un)reliability of this test, besides the sensitivity to IRQs and context switches, tied to hardware behavior, in particular TLB capacity and replacement policy? Aiui the test has xtf_success("Success: Probably not vulnerable to XSA-286\n"); for the combination of all of these reasons. Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.