
Re: [Xen-devel] xc_hvm_inject_trap() races

To: "rcojocaru@xxxxxxxxxxxxxxx" <rcojocaru@xxxxxxxxxxxxxxx>, "Andrei Vlad LUTAS" <vlutas@xxxxxxxxxxxxxxx>
From: "Jan Beulich" <JBeulich@xxxxxxxx>
Date: Wed, 02 Nov 2016 02:49:48 -0600
Cc: "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "tamas@xxxxxxxxxxxxx" <tamas@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 02 Nov 2016 08:50:17 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 01.11.16 at 23:17, <vlutas@xxxxxxxxxxxxxxx> wrote:
> From: Jan Beulich [mailto:jbeulich@xxxxxxxx]
> Sent: 1 November, 2016 18:40
>>>> Andrei Vlad LUTAS <vlutas@xxxxxxxxxxxxxxx> 11/01/16 5:13 PM >>>
>>>First of all, to answer your original question: the injection decision
>>>is made when the introspection logic needs to inspect a page that is
>>>not present in the physical memory. We don't really care if the current
>>>instruction triggers multiple faults or not (and here I'm not sure what
>>>you mean by that - multiple exceptions, or multiple EPT violations -
>>>but the answer is still the same), and removing the page restrictions
>>>after the #PF injection is introspection specific logic - the address
>>>for which we inject the #PF doesn't have to be related in any way to the
>>>current instruction.
> 
>>Ah, that's this non-architectural behavior again.
> 
> I don't think the HVI #PF injection internals or how the #PF is handled by
> the OS are relevant here. We are using an existing API that seems not to work
> quite correctly under certain circumstances, and we were curious whether any of
> you can shed some light on this and maybe point us in the right direction
> for cooking up a fix.
> 
>>What if the OS doesn't fully carry out the page-in, relying on the #PF to
>>retrigger once the insn for which it got reported has been restarted?
> 
> Can you be more specific?

Well, perhaps with the answer you gave further down that's not that
relevant anymore, but consider a #PF handler which handles just the
topmost not-present page table level each time it gets invoked. I.e.
for a not-present L4 entry it would take 4 re-invocations of the same
original instruction to resolve all 4 levels.
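
To illustrate the scenario just described, here is a minimal sketch in C.
It is a toy model, not Xen or guest OS code: struct walk, first_missing(),
lazy_pf_handler() and faults_until_success() are all hypothetical names,
and a page walk is reduced to four "present" flags.

```c
#include <assert.h>
#include <stdbool.h>

#define LEVELS 4

/* Toy model (assumption): present[0] is the L4 entry, present[3] the L1 entry. */
struct walk {
    bool present[LEVELS];
};

/* Index of the topmost not-present level, or -1 if the walk succeeds. */
static int first_missing(const struct walk *w)
{
    for (int i = 0; i < LEVELS; i++)
        if (!w->present[i])
            return i;
    return -1;
}

/* A "lazy" #PF handler: fixes only the topmost not-present level. */
static void lazy_pf_handler(struct walk *w)
{
    int i = first_missing(w);

    assert(i >= 0); /* only called when the walk actually faulted */
    w->present[i] = true;
}

/* Restart the instruction until it stops faulting; return the #PF count. */
static int faults_until_success(struct walk *w)
{
    int faults = 0;

    while (first_missing(w) >= 0) {
        lazy_pf_handler(w);
        faults++;
    }
    return faults;
}
```

Starting from a walk with all four levels not present,
faults_until_success() returns 4, i.e. one fault per page table level.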

>>Or what if the page gets paged out again before the insn actually gets to
>>execute (e.g. because a re-schedule happened inside the guest on the way out
>>of the #PF handler)? All of this suggests that you really can't lift any
>>restrictions _before_ seeing what you need to see.
> 
> We don't really care when and how the #PF is handled. We don't care if the 
> page is paged out at some random point. What we do know is that at a certain 
> point in the future, the page will be swapped in; how do we know when? The OS 
> will write the guest page tables, at which point we can inspect the physical 
> page itself (so you can see here why we don't care about the page being 
> swapped out sometime in the future). So we really _can_ lift any restriction 
> we want at that point.

Hmm, I'm having difficulty seeing the supposedly broken flow of
events here: Earlier it was said that #PF injection would be a result
of EPT event processing. Here you say that the lifting of the
restrictions would be a result of seeing the guest modify its page
tables (which would in turn be a result of the #PF actually having
arrived in the guest). So if (with this, and as you say above) you
don't care when the #PF gets handled, where's the original problem?

>>>Assuming that we wouldn't remove the restrictions and we would rely on
>>>re-generating the event - that is not acceptable: first of all because
>>>the instruction would normally be emulated anyway before re-entering
>>>the guest,
> 
>>How would that be a problem?
> 
> I thought it was obvious without further clarification: how can we expect 
> the exact same event to be generated, if the instruction that triggered it in 
> the first place was emulated or single stepped?

Neither emulation nor single stepping should result in architectural
events (exceptions) being missed (or else there's a bug somewhere).
A non-architectural #PF like the one you're using of course can't
(currently) be guaranteed to arrive at any particular point in time.

The fact that {vmx,svm}_inject_trap() combine the new exception
with an already injected one (and blindly discard events other than
hw exceptions), otoh, indeed looks like something that wants to be
controllable by the caller: When the event comes from the outside (the
hypercall), it would clearly seem better to simply tell the caller that
no injection happened and the event needs to be kept pending. The main
question then is how to make certain the injection gets retried at the
right point in time (read: once the other interrupt handler IRETs
back to the original context).
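
As a rough sketch of that idea, in C: this is a toy model and not the
actual {vmx,svm}_inject_trap() code. struct vcpu_model, model_inject(),
model_deliver() and the INJECT_* result codes are hypothetical, and the
merge rule is a placeholder that simply collapses to a double fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VEC_DF 8   /* double fault */
#define VEC_PF 14  /* page fault */

enum inject_result {
    INJECT_DONE,   /* event queued for delivery */
    INJECT_MERGED, /* combined with the already pending event */
    INJECT_BUSY    /* nothing changed; caller keeps the event and retries */
};

/* Hypothetical, heavily simplified per-vCPU injection state. */
struct vcpu_model {
    bool pending;   /* an event is already queued */
    uint8_t vector; /* vector of the queued event */
};

static enum inject_result model_inject(struct vcpu_model *v, uint8_t vector,
                                       bool from_hypercall)
{
    if (!v->pending) {
        v->pending = true;
        v->vector = vector;
        return INJECT_DONE;
    }

    /* External callers get told that no injection happened. */
    if (from_hypercall)
        return INJECT_BUSY;

    /* Placeholder merge rule (assumption): collapse to a double fault. */
    v->vector = VEC_DF;
    return INJECT_MERGED;
}

/* Models the guest taking the queued event. */
static void model_deliver(struct vcpu_model *v)
{
    assert(v->pending);
    v->pending = false;
}
```

With an event already queued, model_inject(&v, VEC_PF, true) returns
INJECT_BUSY and leaves the queued vector untouched; the same call made
after model_deliver(&v) returns INJECT_DONE. The sketch deliberately
leaves open when the retry should happen, which is the main question
raised above.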

Jan

