[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Issue policing writes from Xen to PV domain memory
>> The above sequence of events would not cause an EPT violation (I >> realize this can happen only in the guest context) or pagefault (The >> page is present and marked writable in the guest). All that would >> happen is an event to be sent to the listener from __hvm_copy() and no >cascading faults will occur. > >I have to admit that I'm unable to spot where such an event gets sent. I had preceded this by mentioning "Now take the similar scenario in the hypothetical HVM + EPT case where we are policing Xen writes to guest memory." The code would look something like this if it was implemented. Please note this was just a quick hacky write up :-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index da220bf..8ce99d5 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -2750,6 +2750,65 @@ static enum hvm_copy_result __hvm_copy( return HVMCOPY_unhandleable; } + if ( unlikely(mem_event_check_ring(&curr->domain->mem_event->access)) ) + { + struct p2m_domain *p2m = p2m_get_hostp2m(curr->domain); + mfn_t gmfn; + p2m_type_t _p2mt; + p2m_access_t p2ma; + bool_t violation = 0; + mem_event_request_t *req_ptr = NULL; + + gmfn = p2m->get_entry(p2m, gfn, &_p2mt, &p2ma, 0, NULL);; + + /* If the access is against the permissions, then send to mem_event */ + switch (p2ma) + { + case p2m_access_n: + case p2m_access_n2rwx: + case p2m_access_x: + violation = 1; + break; + + case p2m_access_w: + case p2m_access_wx: + if ( flags & HVMCOPY_from_guest ) + violation = 1; + break; + + case p2m_access_r: + case p2m_access_rx: + case p2m_access_rx2rw: + if ( flags & HVMCOPY_to_guest ) + violation = 1; + break; + + case p2m_access_rw: + case p2m_access_rwx: + break; + } + + if ( violation ) + { + paddr_t gpa = (gfn << PAGE_SHIFT) + + (flags & HVMCOPY_virt ? + (addr & ((1 << PAGE_SHIFT) - 1)) : 0); + + if ( !p2m_mem_access_check(gpa, (flags & HVMCOPY_virt ? 1 : 0), + addr, 0, 1, 0, &req_ptr) ) + { + if ( unlikely(req_ptr != NULL) ) + { + mem_access_send_req(curr->domain, req_ptr); + xfree(req_ptr); + } + put_page(page); + return HVMCOPY_unhandleable; + } + } + } + p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK); if ( flags & HVMCOPY_to_guest ) >What is quite clear is that you don't want multiple nested runstate area >updates to occur in the first place, i.e. you need to properly deal with the >_first_ page fault occurring instead of waiting until multiple such events pile >up. And I'm afraid doing so will require some (perhaps gross) hackery... OK, I will take a stab at this. >> Looking at what the solution for the ring being full in the PV case >> whether we are policing Xen writes or not, calling wait() will not >> work due to the scenario I had mentioned a while back and is shown above >in the stack trace. >> I am repeating that flow here >> mem_event_claim_slot() -> >> mem_event_wait_slot() -> >> wait_event(mem_event_wait_try_grab(med, &rc) != - >EBUSY) >> >> wait_event() macro looks like this: >> do { >> if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) >> break; >> for ( ; ; ) { >> prepare_to_wait(&med->wq); >> if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) >> break; >> wait(); >> } >> finish_wait(&med->wq); >> } while (0) >> >> In the case where the ring is full, wait() gets called and the cpu >> gets scheduled away. But since it is in middle of a pagefault, when it >> runs again it ends up in handle_exception_saved and the same pagefault is >tried again. >> But since finish_wait() never ends up being called wqv->esp never >> becomes 0 and hence the assert fires on the next go around. So I think >> we should be calling >> process_pending_softirqs() instead of wait() for PV domains. > >That would effectively be a spin wait then, which is surely not the right >thing. >But I don't follow your explanation above anyway - when coming back from >wait(), the state is the same as the original one, so the page fault handling >continues, it's not being retried. Let me step through this. The pagefault for runstate occurs and wait_event() gets called and the ring is full. wait_event(): do { if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) ==> This will return EBUSY break; for ( ; ; ) { prepare_to_wait(&med->wq); if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) ==> This will return EBUSY break; wait(); ==> Write to runstate area occurs... } finish_wait(&med->wq); } while (0) ...now this write to runstate will again cause a pagefault and we will end up in wait_event() again. The previous attempt would have called prepare_to_wait() but finish_wait() was not called. finish_wait()->__finish_wait() is where wqv->esp gets reset to NULL. So now: wait_event(): do { if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) ==> This will return EBUSY break; for ( ; ; ) { prepare_to_wait(&med->wq); ==> Assertion 'wqv->esp == 0' fails if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) break; wait(); } finish_wait(&med->wq); } while (0) Does this make sense? Thanks, Aravindh _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |