Xen project Mailing List

Re: [Xen-devel] xc_hvm_inject_trap() races

To: Jan Beulich <jbeulich@xxxxxxxx>, "rcojocaru@xxxxxxxxxxxxxxx" <rcojocaru@xxxxxxxxxxxxxxx>

From: Andrei Vlad LUTAS <vlutas@xxxxxxxxxxxxxxx>

Date: Tue, 1 Nov 2016 16:13:46 +0000

Cc: "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "tamas@xxxxxxxxxxxxx" <tamas@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 01 Nov 2016 16:13:59 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Hello,

First of all, to answer your original question: the injection decision is made when the introspection logic needs to inspect a page that is not present in physical memory. We don't really care whether the current instruction triggers multiple faults or not (and here I'm not sure what you mean by that: multiple exceptions or multiple EPT violations, but the answer is the same either way), and removing the page restrictions after the #PF injection is introspection-specific logic: the address for which we inject the #PF doesn't have to be related in any way to the current instruction.

Suppose we didn't remove the restrictions and instead relied on re-generating the event. That is not acceptable: first, because the instruction would normally be emulated anyway before re-entering the guest, and second, because that is not normal CPU behavior (and it would break internal introspection logic). Once an instruction has triggered an exit, it should be emulated or single-stepped.

Best regards,
Andrei.

-----Original Message-----
From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Jan Beulich
Sent: 1 November, 2016 17:54
To: rcojocaru@xxxxxxxxxxxxxxx
Cc: andrew.cooper3@xxxxxxxxxx; tamas@xxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] xc_hvm_inject_trap() races

>>> Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> 11/01/16 11:53 AM >>>
>On 11/01/2016 12:30 PM, Jan Beulich wrote:
>>>>> Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> 11/01/16 10:04 AM >>>
>>> We've stumbled across the following scenario: we're injecting a #PF
>>> to try to bring a swapped page back, but Xen already has a pending
>>> interrupt, and the two collide.
>>>
>>> I've logged what happens in hvm_do_resume() at the point of
>>> injection, and stumbled across this:
>>>
>>> (XEN) [ 252.878389] vector: 14, type: 3, error_code: 0,
>>> VM_ENTRY_INTR_INFO: 0x00000000800000e1
>>>
>>> VM_ENTRY_INTR_INFO does have INTR_INFO_VALID_MASK set here.
>>
>> So a first question I have is this: What are the criteria that made
>> your application decide it needs to inject a trap? Obviously there
>> must have been some kind of event in the guest that triggered this.
>> Hence the question is whether this same event wouldn't re-trigger at
>> the end of the hardware interrupt (or could be made re-trigger reasonably
>> easily).
>> Because in the end what you're trying to do here is something that's
>> architecturally impossible: There can't be a (non-nested) exception
>> once an external interrupt has been accepted (i.e. any subsequent
>> exception can only be related to the delivery of that interrupt
>> vector, not to the code which was running when the interrupt was signaled).
>
>Unfortunately there are two main reasons why relying on the conditions
>for injecting the page fault repeating is problematic:
>
>1. We'd need to be able to differentiate between a failed run (where
>injection doesn't work) and a successful run, with no real possibility
>to know the difference for sure. So we don't know how to adapt the
>application's internal state based on some events: is the event the
>"final" one, or just a duplicate? xc_hvm_inject_trap() does not tell us
>(as indeed it cannot know) if the injection succeeded, and there's no
>other way to know.
>
>2. More importantly (although working around 1. is far from trivial),
>the event may not be repeatable. As an example, #PF injection can occur
>as part of handling an EPT event, but during handling the event the
>introspection engine can decide to lift the restrictions on said page
>after injecting the #PF. So the application relied on the #PF being
>delivered, and with the restrictions lifted from the page there will be
>no further EPT events for that page, therefore the main condition for
>triggering the #PF is lost forever.

Isn't this a problem you also have under other circumstances, e.g. multiple faults occurring for a single instruction? Which gets us to the fact that you didn't answer at all the initial question I did raise.

Apart from that I'm also not really understanding the model you describe: What good does injecting #PF alongside lifting restrictions do? I'd normally expect just one of the two to occur to deal with whatever caused the original event.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
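
For readers following the log line quoted in the thread, here is a minimal sketch of how the VM_ENTRY_INTR_INFO value could be decoded. It assumes the VM-entry interruption-information layout from the Intel SDM (vector in bits 7:0, type in bits 10:8, valid in bit 31). The macro values are defined locally for illustration, and `decode_intr_info()` and `intr_type_name()` are hypothetical helpers, not Xen functions.

```c
#include <stdint.h>

/* Assumed layout of the VM-entry interruption-information field
 * (Intel SDM). Defined locally for illustration only. */
#define INTR_INFO_VECTOR_MASK 0x000000ffu /* bits 7:0  */
#define INTR_INFO_TYPE_MASK   0x00000700u /* bits 10:8 */
#define INTR_INFO_TYPE_SHIFT  8
#define INTR_INFO_VALID_MASK  0x80000000u /* bit 31    */

/* Hypothetical container for the decoded fields. */
struct intr_info {
    unsigned int vector; /* interrupt or exception vector */
    unsigned int type;   /* event type, 0..7 */
    int valid;           /* 1 if an event is queued for VM entry */
};

/* Hypothetical helper: split a raw 32-bit value into its fields. */
static struct intr_info decode_intr_info(uint32_t raw)
{
    struct intr_info info;

    info.vector = raw & INTR_INFO_VECTOR_MASK;
    info.type = (raw & INTR_INFO_TYPE_MASK) >> INTR_INFO_TYPE_SHIFT;
    info.valid = (raw & INTR_INFO_VALID_MASK) != 0;
    return info;
}

/* Hypothetical helper: name the event type per the SDM encoding. */
static const char *intr_type_name(unsigned int type)
{
    switch (type) {
    case 0: return "external interrupt";
    case 2: return "NMI";
    case 3: return "hardware exception";
    case 4: return "software interrupt";
    case 5: return "privileged software exception";
    case 6: return "software exception";
    case 7: return "other event";
    default: return "reserved";
    }
}
```

Under these assumptions, `decode_intr_info(0x800000e1u)` yields a valid event of type 0 (external interrupt) with vector 0xe1, whereas the event being injected in the log is vector 14 with type 3 (a hardware exception, i.e. the #PF). That matches the collision described above: an external interrupt is already queued for delivery when the #PF injection is attempted.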
