Xen project Mailing List

Re: [Xen-devel] xc_hvm_inject_trap() races

From: "Jan Beulich" <jbeulich@xxxxxxxx>

Date: Tue, 01 Nov 2016 09:53:41 -0600

Cc: andrew.cooper3@xxxxxxxxxx, tamas@xxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Tue, 01 Nov 2016 15:54:10 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> 11/01/16 11:53 AM >>>
>On 11/01/2016 12:30 PM, Jan Beulich wrote:
>>>>> Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> 11/01/16 10:04 AM >>>
>>> We've stumbled across the following scenario: we're injecting a #PF to
>>> try to bring a swapped page back, but Xen already has a pending
>>> interrupt, and the two collide.
>>>
>>> I've logged what happens in hvm_do_resume() at the point of injection,
>>> and stumbled across this:
>>>
>>> (XEN) [ 252.878389] vector: 14, type: 3, error_code: 0,
>>> VM_ENTRY_INTR_INFO: 0x00000000800000e1
>>>
>>> VM_ENTRY_INTR_INFO does have INTR_INFO_VALID_MASK set here.
>>
>> So a first question I have is this: What are the criteria that made your
>> application decide it needs to inject a trap? Obviously there must have
>> been some kind of event in the guest that triggered this. Hence the
>> question is whether this same event wouldn't re-trigger at the end of the
>> hardware interrupt (or could be made re-trigger reasonably easily).
>> Because in the end what you're trying to do here is something that's
>> architecturally impossible: There can't be a (non-nested) exception once
>> an external interrupt has been accepted (i.e. any subsequent exception
>> can only be related to the delivery of that interrupt vector, not to the code
>> which was running when the interrupt was signaled).
>
>Unfortunately there are two main reasons why relying on the conditions
>for injecting the page fault repeating is problematic:
>
>1. We'd need to be able to differentiate between a failed run (where
>injection doesn't work) and a successful run, with no real possibility to
>know the difference for sure. So we don't know how to adapt the
>application's internal state based on some events: is the event the
>"final" one, or just a duplicate? xc_hvm_inject_trap() does not tell us
>(as indeed it cannot know) if the injection succeeded, and there's no
>other way to know.
>
>2. More importantly (although working around 1. is far from trivial),
>the event may not be repeatable. As an example, #PF injection can occur
>as part of handling an EPT event, but during handling the event the
>introspection engine can decide to lift the restrictions on said page
>after injecting the #PF. So the application relied on the #PF being
>delivered, and with the restrictions lifted from the page there will be
>no further EPT events for that page, therefore the main condition for
>triggering the #PF is lost forever.

Isn't this a problem you also have under other circumstances, e.g.
multiple faults occurring for a single instruction? Which gets us to the
fact that you didn't answer at all the initial question I did raise.

Apart from that I'm also not really understanding the model you describe:
What good does injecting #PF do alongside lifting restrictions? I'd
normally expect just one of the two to occur to deal with whatever caused
the original event.

Jan
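
For readers who want to decode the logged value quoted above, here is a minimal C sketch. It assumes the VM-entry interruption-information layout documented in the Intel SDM (bits 7:0 vector, bits 10:8 type, bit 31 valid) and looks only at the low 32 bits of the logged value. The macro, struct and function names are illustrative and are not Xen's own definitions. Under those assumptions, 0x800000e1 decodes to a valid, pending external interrupt on vector 0xe1, while the "vector: 14, type: 3" part of the log appears to describe the #PF (a hardware exception) that was being injected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; layout assumed from the Intel SDM, not copied from Xen. */
#define INFO_VECTOR_MASK 0x000000ffu  /* bits 7:0  */
#define INFO_TYPE_MASK   0x00000700u  /* bits 10:8 */
#define INFO_TYPE_SHIFT  8
#define INFO_VALID_MASK  0x80000000u  /* bit 31    */

struct intr_info {
    bool valid;       /* an event is already queued for the next VM entry */
    unsigned type;    /* 0 = external interrupt, 3 = hardware exception, ... */
    unsigned vector;  /* interrupt or exception vector */
};

/* Split the raw 32-bit field into its components. */
static struct intr_info decode_intr_info(uint32_t raw)
{
    struct intr_info info;

    info.valid  = (raw & INFO_VALID_MASK) != 0;
    info.type   = (raw & INFO_TYPE_MASK) >> INFO_TYPE_SHIFT;
    info.vector = raw & INFO_VECTOR_MASK;
    return info;
}

/* Human-readable name for the 3-bit type field. */
static const char *intr_type_name(unsigned type)
{
    static const char *const names[8] = {
        "external interrupt", "reserved", "NMI", "hardware exception",
        "software interrupt", "privileged software exception",
        "software exception", "other event",
    };

    assert(type < 8);
    return names[type];
}

/*
 * Hypothetical helper: a new injection would collide with whatever is
 * already queued if the valid bit is set.
 */
static bool injection_would_collide(uint32_t pending_raw)
{
    return decode_intr_info(pending_raw).valid;
}
```

Calling decode_intr_info(0x800000e1u) yields valid = true, type = 0 and vector = 0xe1, which is the collision described in the quoted report: an external interrupt is already queued when the #PF injection is attempted.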
