[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v7] x86/emulate: Send vm_event from emulate



On 19.07.2019 14:34, Alexandru Stefan ISAILA wrote:
> On 18.07.2019 15:58, Jan Beulich wrote:
>> On 03.07.2019 12:56, Alexandru Stefan ISAILA wrote:
>>> A/D bit writes (on page walks) can be considered benign by an introspection
>>> agent, so receiving vm_events for them is a pessimization. We try here to
>>> optimize by fitering these events out.
>>
>> But you add the sending of more events - how does "filter out" match
>> the actual implementation?
> 
> The events are send only if there is a mem access violation therefore we
> are filtering and only sending the events that are interesting to
> introspection.

Where is it that you prevent any event from being sent? As said,
reading the patch I only see new sending sites to get added.

>>> Currently, we are fully emulating the instruction at RIP when the hardware 
>>> sees
>>> an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however,
>>> incorrect, because the instruction at RIP might legitimately cause an
>>> EPT fault of its own while accessing a _different_ page from the original 
>>> one,
>>> where A/D were set.
>>> The solution is to perform the whole emulation,
>>
>> Above you said fully emulating such an insn is incorrect. To me the
>> two statements contradict one another.
>>
>>> while ignoring EPT restrictions
>>> for the walk part, and taking them into account for the "actual" emulating 
>>> of
>>> the instruction at RIP.
>>
>> So the "ignore" part here is because the walk doesn't currently send
>> any events? That's an omission after all, which ultimately wants to
>> get fixed. This in turn makes me wonder whether there couldn't be
>> cases where a monitor actually wants to see these violations, too.
>> After all one may be able to abuse to page walker to set bits in
>> places you actually care to protect from undue modification.
> 
> There is no need for events from page walk. Further work will have to be
> done, when page-walk will send events, so that we can toggle that new
> feature on/off.

Please can you move over to thinking in more general terms,
not just what you need for your application. In this case
"There is no need" != "We don't have a need for". And I think
the VM event _interface_ should be arranged in a way that it
already accounts for eventually correct behavior of the page
walk paths.

>>> After the emulation stops we'll call hvm_vm_event_do_resume() again after 
>>> the
>>> introspection agent treats the event and resumes the guest. There, the
>>> instruction at RIP will be fully emulated (with the EPT ignored) if the
>>> introspection application allows it, and the guest will continue to run past
>>> the instruction.
>>>
>>> We use hvmemul_map_linear_addr() to intercept r/w access and
>>> __hvm_copy() to intercept exec access.
>>
>> Btw I continue to be unhappy about this asymmetry. Furthermore in
>> the former case you only handle write and rmw accesses, but not
>> reads afaics. I assume you don't care about reads, but this should
>> then be made explicit. Furthermore EPT allows read protection, and
>> there are p2m_access_w, p2m_access_wx, and p2m_access_x, so I guess
>> ignoring reads can at best be an option picked by the monitor, not
>> something to be left out of the interface altogether.
> 
> That is correct, we are not interested in read events but there is
> another problem, we are checking access and pfec to fill the event flag
> and pfec only has a write flag(PFEC_write_access), in __hvmemul_read()
> pfec only gets PFEC_page_present and there is no way to differentiate
> write from read.

By the PFEC model, anything that's not a write or insn fetch is a
read. The main anomaly is elsewhere: The write flag is also going
to be set for RMW operations.

>>> hvm_emulate_send_vm_event() can return false if there was no violation,
>>> if there was an error from monitor_traps() or p2m_get_mem_access().
>>
>> As said before - I don't think errors and lack of a violation can
>> sensibly be treated the same way. Is the implication perhaps that
>> emulation then will fail later anyway? If so, is such an
>> assumption taking into consideration possible races?
> 
> The only place that I can see a problem is the error from
> monitor_traps(). That can be checked and accompanied by a warning msg.

How would a warning message help?

> Or if you can give me a different idea to go forward with this issue I
> will be glad to review it.

I'm afraid you'll have to first of all give me an idea what the
correct action is in case of such an error. And once you've done
so, I'm pretty sure you'll recognize yourself whether the current
code you have is appropriate (and I'll then know whether I want
to insist on you changing the code).

>>> @@ -629,6 +697,14 @@ static void *hvmemul_map_linear_addr(
>>>     
>>>                 ASSERT(p2mt == p2m_ram_logdirty || !p2m_is_readonly(p2mt));
>>>             }
>>> +
>>> +        if ( curr->arch.vm_event &&
>>> +            curr->arch.vm_event->send_event &&
>>> +            hvm_emulate_send_vm_event(addr, gfn, pfec) )
>>> +        {
>>> +            err = ERR_PTR(~X86EMUL_RETRY);
>>> +            goto out;
>>> +        }
>>
>> Did you notice that there's an immediate exit from the loop only
>> in case the linear -> physical translation fails? This is
>> relevant for page fault delivery correctness for accesses
>> crossing page boundaries. I think you want to use
>> update_map_err() and drop the "goto out". I can't really make up
> 
> By update_map_err() are you saying to have the err var assigned and then
> drop "goto out"? If so how do I keep the err from my access violation
> without exiting from the loop?

Counter question: Why do you _need_ to keep "your" value of err?
If, just as an example, there's going to be a #PF on the other
half of the access, then "your" access violation is of no interest
at all.

>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -3224,6 +3224,14 @@ static enum hvm_translation_result __hvm_copy(
>>>                 return HVMTRANS_bad_gfn_to_mfn;
>>>             }
>>>     
>>> +        if ( unlikely(v->arch.vm_event) &&
>>> +            v->arch.vm_event->send_event &&
>>> +            hvm_emulate_send_vm_event(addr, gfn, pfec) )
>>
>> Indentation looks wrong again.
>>
>>> +        {
>>> +            put_page(page);
>>> +            return HVMTRANS_gfn_paged_out;
>>
>> Why "paged out"? If this is an intentional abuse, then you want
>> to say so in a comment and justify the abuse here or in the
>> description.
> 
> This is done to clean the paged in before the return.

What is "the paged in" here?

>>> --- a/xen/arch/x86/hvm/vm_event.c
>>> +++ b/xen/arch/x86/hvm/vm_event.c
>>> @@ -86,6 +86,7 @@ void hvm_vm_event_do_resume(struct vcpu *v)
>>>                       VM_EVENT_FLAG_SET_EMUL_INSN_DATA )
>>>                 kind = EMUL_KIND_SET_CONTEXT_INSN;
>>>     
>>> +        v->arch.vm_event->send_event = false;
>>>             hvm_emulate_one_vm_event(kind, TRAP_invalid_op,
>>>                                      X86_EVENT_NO_EC);
>>
>> Is this insertion meaning to use "true" instead, or is the
>> revision log entry wrong? Or does "set" there not necessarily
>> mean "set it to true", but just "set it to a deterministic
>> value" (in which case "initialize" would have been an
>> unambiguous alternative wording)?
> 
> This means to use "true" and send vm_event if there is a need to do so
> in the emulation.

??? (There's very clearly "false" being written above.)

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.