Re: [Xen-devel] xc_hvm_inject_trap() races



Hello,

First of all, to answer your original question: the injection decision is made
when the introspection logic needs to inspect a page that is not present in
physical memory. We don't really care whether the current instruction triggers
multiple faults or not (I'm not sure whether you mean multiple exceptions or
multiple EPT violations, but the answer is the same either way). Removing the
page restrictions after the #PF injection is introspection-specific logic: the
address for which we inject the #PF doesn't have to be related in any way to
the current instruction.

Keeping the restrictions in place and relying on the event being re-generated
is not acceptable, for two reasons. First, the instruction would normally be
emulated anyway before re-entering the guest. Second, it is not normal CPU
behavior (and it would break internal introspection logic): once an
instruction has triggered an exit, it should be emulated or single-stepped.

Best regards,
Andrei.
 
-----Original Message-----
From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Jan Beulich
Sent: 1 November, 2016 17:54
To: rcojocaru@xxxxxxxxxxxxxxx
Cc: andrew.cooper3@xxxxxxxxxx; tamas@xxxxxxxxxxxxx; 
xen-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] xc_hvm_inject_trap() races

>>> Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> 11/01/16 11:53 AM >>>
>On 11/01/2016 12:30 PM, Jan Beulich wrote:
>>>>> Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> 11/01/16 10:04 AM >>>
>>> We've stumbled across the following scenario: we're injecting a #PF 
>>> to try to bring a swapped page back, but Xen already has a pending 
>>> interrupt, and the two collide.
>>>
>>> I've logged what happens in hvm_do_resume() at the point of 
>>> injection, and stumbled across this:
>>>
>>> (XEN) [  252.878389] vector: 14, type: 3, error_code: 0,
>>> VM_ENTRY_INTR_INFO: 0x00000000800000e1
>>>
>>> VM_ENTRY_INTR_INFO does have INTR_INFO_VALID_MASK set here.
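For readers following the logged value, here is a minimal sketch of how it
decodes, assuming the VM-entry interruption-information layout from the Intel
SDM (vector in bits 7:0, event type in bits 10:8, deliver-error-code in bit 11,
valid in bit 31). The function and constant names are illustrative only and
are not Xen code.

```python
# Illustrative decoder for the VM-entry interruption-information field.
# Field layout assumed from the Intel SDM; names here are hypothetical.
VALID_MASK = 0x80000000         # bit 31 (what Xen calls INTR_INFO_VALID_MASK)
DELIVER_CODE_MASK = 0x00000800  # bit 11
TYPE_SHIFT = 8                  # event type lives in bits 10:8
TYPE_MASK = 0x7
VECTOR_MASK = 0xFF              # vector lives in bits 7:0

EVENT_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
    7: "other event",
}


def decode_intr_info(value):
    """Split an interruption-information value into its fields."""
    return {
        "valid": bool(value & VALID_MASK),
        "vector": value & VECTOR_MASK,
        "type": EVENT_TYPES.get((value >> TYPE_SHIFT) & TYPE_MASK, "reserved"),
        "deliver_error_code": bool(value & DELIVER_CODE_MASK),
    }


# Value taken from the log line above (only the low 32 bits are significant).
pending = decode_intr_info(0x00000000800000E1)
print(pending)
```

Under that layout the pending event is a valid external interrupt with vector
0xe1, while the event being injected (vector 14, type 3) is a hardware
exception, which is the collision described above.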
>> 
>> So a first question I have is this: What are the criteria that made 
>> your application decide it needs to inject a trap? Obviously there 
>> must have been some kind of event in the guest that triggered this. 
>> Hence the question is whether this same event wouldn't re-trigger at 
>> the end of the hardware interrupt (or could be made re-trigger reasonably 
>> easily).
>> Because in the end what you're trying to do here is something that's 
>> architecturally impossible: There can't be a (non-nested) exception 
>> once an external interrupt has been accepted (i.e. any subsequent 
>> exception can only be related to the delivery of that interrupt 
>> vector, not to the code which was running when the interrupt was signaled).
>
>Unfortunately there are two main reasons why relying on the conditions 
>for injecting the page fault repeating is problematic:
>
>1. We'd need to be able to differentiate between a failed run (where 
>injection doesn't work) and a successful run, with no real possibility 
>to know the difference for sure. So we don't know how to adapt the 
>application's internal state based on some events: is the event the 
>"final" one, or just a duplicate? xc_hvm_inject_trap() does not tell us 
>(as indeed it cannot know) if the injection succeeded, and there's no 
>other way to know.
>
>2. More importantly (although working around 1. is far from trivial), 
>the event may not be repeatable. As an example, #PF injection can occur 
>as part of handling an EPT event, but during handling the event the 
>introspection engine can decide to lift the restrictions on said page 
>after injecting the #PF. So the application relied on the #PF being 
>delivered, and with the restrictions lifted from the page there will be 
>no further EPT events for that page, therefore the main condition for 
>triggering the #PF is lost forever.

Isn't this a problem you also have under other circumstances, e.g. multiple
faults occurring for a single instruction? Which gets us to the fact that you
didn't answer the initial question I raised at all. Apart from that, I'm also
not really understanding the model you describe: what good does injecting #PF
do alongside lifting restrictions? I'd normally expect just one of the two to
occur to deal with whatever caused the original event.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
