|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Issue policing writes from Xen to PV domain memory
>> The above sequence of events would not cause an EPT violation (I
>> realize this can happen only in the guest context) or pagefault (The
>> page is present and marked writable in the guest). All that would
>> happen is an event to be sent to the listener from __hvm_copy() and no
>cascading faults will occur.
>
>I have to admit that I'm unable to spot where such an event gets sent.
I had preceded this by mentioning "Now take the similar scenario in the
hypothetical HVM + EPT case where we are policing Xen writes to guest memory."
The code would look something like this if it was implemented. Please note this
was just a quick hacky write up :-)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index da220bf..8ce99d5 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2750,6 +2750,65 @@ static enum hvm_copy_result __hvm_copy(
return HVMCOPY_unhandleable;
}
+ if ( unlikely(mem_event_check_ring(&curr->domain->mem_event->access)) )
+ {
+ struct p2m_domain *p2m = p2m_get_hostp2m(curr->domain);
+ mfn_t gmfn;
+ p2m_type_t _p2mt;
+ p2m_access_t p2ma;
+ bool_t violation = 0;
+ mem_event_request_t *req_ptr = NULL;
+
+ gmfn = p2m->get_entry(p2m, gfn, &_p2mt, &p2ma, 0, NULL);;
+
+ /* If the access is against the permissions, then send to
mem_event */
+ switch (p2ma)
+ {
+ case p2m_access_n:
+ case p2m_access_n2rwx:
+ case p2m_access_x:
+ violation = 1;
+ break;
+
+ case p2m_access_w:
+ case p2m_access_wx:
+ if ( flags & HVMCOPY_from_guest )
+ violation = 1;
+ break;
+
+ case p2m_access_r:
+ case p2m_access_rx:
+ case p2m_access_rx2rw:
+ if ( flags & HVMCOPY_to_guest )
+ violation = 1;
+ break;
+
+ case p2m_access_rw:
+ case p2m_access_rwx:
+ break;
+ }
+
+ if ( violation )
+ {
+ paddr_t gpa = (gfn << PAGE_SHIFT) +
+ (flags & HVMCOPY_virt ?
+ (addr & ((1 << PAGE_SHIFT) - 1)) : 0);
+
+ if ( !p2m_mem_access_check(gpa, (flags & HVMCOPY_virt ? 1 : 0),
+ addr, 0, 1, 0, &req_ptr) )
+ {
+ if ( unlikely(req_ptr != NULL) )
+ {
+ mem_access_send_req(curr->domain, req_ptr);
+ xfree(req_ptr);
+ }
+ put_page(page);
+ return HVMCOPY_unhandleable;
+ }
+ }
+ }
+
p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK);
if ( flags & HVMCOPY_to_guest )
>What is quite clear is that you don't want multiple nested runstate area
>updates to occur in the first place, i.e. you need to properly deal with the
>_first_ page fault occurring instead of waiting until multiple such events pile
>up. And I'm afraid doing so will require some (perhaps gross) hackery...
OK, I will take a stab at this.
>> Looking at what the solution for the ring being full in the PV case
>> whether we are policing Xen writes or not, calling wait() will not
>> work due to the scenario I had mentioned a while back and is shown above
>in the stack trace.
>> I am repeating that flow here
>> mem_event_claim_slot() ->
>> mem_event_wait_slot() ->
>> wait_event(mem_event_wait_try_grab(med, &rc) != -
>EBUSY)
>>
>> wait_event() macro looks like this:
>> do {
>> if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )
>> break;
>> for ( ; ; ) {
>> prepare_to_wait(&med->wq);
>> if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )
>> break;
>> wait();
>> }
>> finish_wait(&med->wq);
>> } while (0)
>>
>> In the case where the ring is full, wait() gets called and the cpu
>> gets scheduled away. But since it is in middle of a pagefault, when it
>> runs again it ends up in handle_exception_saved and the same pagefault is
>tried again.
>> But since finish_wait() never ends up being called wqv->esp never
>> becomes 0 and hence the assert fires on the next go around. So I think
>> we should be calling
>> process_pending_softirqs() instead of wait() for PV domains.
>
>That would effectively be a spin wait then, which is surely not the right
>thing.
>But I don't follow your explanation above anyway - when coming back from
>wait(), the state is the same as the original one, so the page fault handling
>continues, it's not being retried.
Let me step through this. The pagefault for runstate occurs and wait_event()
gets called and the ring is full.
wait_event():
do {
if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) ==> This will
return EBUSY
break;
for ( ; ; ) {
prepare_to_wait(&med->wq);
if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) ==> This will
return EBUSY
break;
wait();
==> Write to runstate area occurs...
}
finish_wait(&med->wq);
} while (0)
...now this write to runstate will again cause a pagefault and we will end up
in wait_event() again. The previous attempt would have called prepare_to_wait()
but finish_wait() was not called. finish_wait()->__finish_wait() is where
wqv->esp gets reset to NULL. So now:
wait_event():
do {
if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) ==> This will
return EBUSY
break;
for ( ; ; ) {
prepare_to_wait(&med->wq);
==> Assertion 'wqv->esp == 0' fails
if ( mem_event_wait_try_grab(med, &rc) != -EBUSY )
break;
wait();
}
finish_wait(&med->wq);
} while (0)
Does this make sense?
Thanks,
Aravindh
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |