Re: [Xen-devel] [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server
>>> On 11.04.16 at 13:14, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
> On 4/9/2016 6:28 AM, Jan Beulich wrote:
>>>>> On 31.03.16 at 12:53, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
>>> @@ -168,13 +226,72 @@ static int hvmemul_do_io(
>>>          break;
>>>      case X86EMUL_UNHANDLEABLE:
>>>      {
>>> -        struct hvm_ioreq_server *s =
>>> -            hvm_select_ioreq_server(curr->domain, &p);
>>> +        struct hvm_ioreq_server *s;
>>> +        p2m_type_t p2mt;
>>> +
>>> +        if ( is_mmio )
>>> +        {
>>> +            unsigned long gmfn = paddr_to_pfn(addr);
>>> +
>>> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
>>> +
>>> +            switch ( p2mt )
>>> +            {
>>> +                case p2m_ioreq_server:
>>> +                {
>>> +                    unsigned long flags;
>>> +
>>> +                    p2m_get_ioreq_server(currd, &flags, &s);
>>
>> As the function apparently returns no value right now, please avoid
>> the indirection on both values you're after - one of the two
>> (presumably s) can be the function's return value.
>
> Well, current implementation of p2m_get_ioreq_server() has spin_lock/
> spin_unlock surrounding the reading of flags and the s, but I believe
> we can also use the s as return value.

The use of a lock inside the function has nothing to do with how
it returns values to the caller.

>>>          /* If there is no suitable backing DM, just ignore accesses */
>>>          if ( !s )
>>>          {
>>> -            rc = hvm_process_io_intercept(&null_handler, &p);
>>> +            switch ( p2mt )
>>> +            {
>>> +            case p2m_ioreq_server:
>>> +            /*
>>> +             * Race conditions may exist when access to a gfn with
>>> +             * p2m_ioreq_server is intercepted by hypervisor, during
>>> +             * which time p2m type of this gfn is recalculated back
>>> +             * to p2m_ram_rw. mem_handler is used to handle this
>>> +             * corner case.
>>> +             */
>>
>> Now if there is such a race condition, the race could also be with a
>> page changing first to ram_rw and then immediately further to e.g.
>> ram_ro. See the earlier comment about assuming the page to be
>> writable.
>>
>
> Thanks, Jan. After rechecking the code, I suppose the race condition
> will not happen.
> In hvmemul_do_io(), get_gfn_query_unlocked() is used
> to peek the p2mt for the gfn, but get_gfn_type_access() is called inside
> hvm_hap_nested_page_fault(), and this will guarantee no p2m change shall
> occur during the emulation.
> Is this understanding correct?

Ah, yes, I think so. So the comment is misleading.

>>> +static int hvm_map_mem_type_to_ioreq_server(struct domain *d,
>>> +                                            ioservid_t id,
>>> +                                            hvmmem_type_t type,
>>> +                                            uint32_t flags)
>>> +{
>>> +    struct hvm_ioreq_server *s;
>>> +    int rc;
>>> +
>>> +    /* For now, only HVMMEM_ioreq_server is supported */
>>> +    if ( type != HVMMEM_ioreq_server )
>>> +        return -EINVAL;
>>> +
>>> +    if ( flags & ~(HVMOP_IOREQ_MEM_ACCESS_READ |
>>> +                   HVMOP_IOREQ_MEM_ACCESS_WRITE) )
>>> +        return -EINVAL;
>>> +
>>> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
>>> +
>>> +    rc = -ENOENT;
>>> +    list_for_each_entry ( s,
>>> +                          &d->arch.hvm_domain.ioreq_server.list,
>>> +                          list_entry )
>>> +    {
>>> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
>>> +            continue;
>>> +
>>> +        if ( s->id == id )
>>> +        {
>>> +            rc = p2m_set_ioreq_server(d, flags, s);
>>> +            if ( rc == 0 )
>>> +                gdprintk(XENLOG_DEBUG, "%u %s type HVMMEM_ioreq_server.\n",
>>> +                         s->id, (flags != 0) ? "mapped to" : "unmapped
>>> from");
>>
>> Why gdprintk()? I don't think the current domain is of much
>> interest here. What would be of interest is the subject domain.
>>
>
> s->id is not the domain_id, but id of the ioreq server.

That's understood. But gdprintk() itself logs the current domain,
which isn't as useful as the subject one.
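
To illustrate the logging point, here is a minimal, self-contained sketch of the difference between tagging a message with the calling domain and tagging it with the domain being operated on. It is a model only: `struct domain`, `current_domain`, `log_current()` and `log_subject()` are hypothetical stand-ins written for this note, and the message prefix format is an assumption, not the output of Xen's actual gdprintk().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for Xen's struct domain; only the id matters here. */
struct domain {
    unsigned int domain_id;
};

/* Hypothetical model of "current": the domain that issued the hypercall. */
static const struct domain *current_domain;

/*
 * Model of a gdprintk()-style helper: the prefix always names the
 * *current* domain, regardless of which domain the message is about.
 */
static int log_current(char *buf, size_t len, unsigned int server_id,
                       unsigned int flags)
{
    assert(buf != NULL && current_domain != NULL);
    return snprintf(buf, len, "d%u: %u %s type HVMMEM_ioreq_server.",
                    current_domain->domain_id, server_id,
                    flags ? "mapped to" : "unmapped from");
}

/* Alternative: the caller passes the subject domain, which gets logged. */
static int log_subject(char *buf, size_t len, const struct domain *d,
                       unsigned int server_id, unsigned int flags)
{
    assert(buf != NULL && d != NULL);
    return snprintf(buf, len,
                    "d%u: ioreq server %u %s type HVMMEM_ioreq_server.",
                    d->domain_id, server_id,
                    flags ? "mapped to" : "unmapped from");
}
```

When a device model running in one domain maps an ioreq server for a different guest, the first helper reports the device model's domain while the second reports the guest, which is the domain whose p2m is actually affected.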
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -132,6 +132,19 @@ static void ept_p2m_type_to_flags(struct p2m_domain
>>> *p2m, ept_entry_t *entry,
>>>              entry->r = entry->w = entry->x = 1;
>>>              entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>              break;
>>> +        case p2m_ioreq_server:
>>> +            entry->r = !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_READ_ACCESS);
>>> +            /*
>>> +             * write access right is disabled when entry->r is 0, but whether
>>> +             * write accesses are emulated by hypervisor or forwarded to an
>>> +             * ioreq server depends on the setting of p2m->ioreq.flags.
>>> +             */
>>> +            entry->w = (entry->r &&
>>> +                        !(p2m->ioreq.flags &
>>> P2M_IOREQ_HANDLE_WRITE_ACCESS));
>>> +            entry->x = entry->r;
>>
>> Why would we want to allow instruction execution from such pages?
>> And with all three bits now possibly being clear, aren't we risking the
>> entries to be mis-treated as not-present ones?
>>
>
> Hah. You got me. Thanks! :)
> Now I realized it would be difficult if we want to emulate the read
> operations for HVM. According to the Intel manual, entry->r is to be
> cleared, so should entry->w if we do not want ept misconfig. And
> with both read and write permissions being forbidden, entry->x can be
> set only on processors with EXECUTE_ONLY capability.
> To avoid any entry being mis-treated as not-present, we have several
> solutions:
> a> do not support the read emulation for now - we have no such usage
> case;
> b> add the check of p2m_t against p2m_ioreq_server in is_epte_present -
> a bit weird to me.
> Which one do you prefer? or any other suggestions?

That question would also need to be asked to others who had
suggested supporting both. I'd be fine with a, but I also don't view
b as too awkward.

>>> +    /*
>>> +     * Each time we map/unmap an ioreq server to/from p2m_ioreq_server,
>>> +     * we mark the p2m table to be recalculated, so that gfns which were
>>> +     * previously marked with p2m_ioreq_server can be resynced.
>>> +     */
>>> +    p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>
>> What does "resynced" here mean? I.e. I can see why this is wanted
>> when unmapping a server, but when mapping a server there shouldn't
>> be any such pages in the first place.
>>
>
> There shouldn't be. But if there is (misbehavior from the device model
> side), it can be recalculated back to p2m_ram_rw (which is not quite
> necessary as the unmapping case).

DM misbehavior should not result in such a problem - the hypervisor
should refuse any bad requests.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
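
The concern about p2m_ioreq_server entries ending up with all three permission bits clear can be checked with a small model. The sketch below recomputes r/w/x the way the quoted p2m-ept.c hunk does and then applies two rules from the Intel SDM: an EPT entry whose bits 2:0 are all zero is not present, and a writable-but-not-readable entry (or an execute-only entry on hardware without that capability) is a misconfiguration. The numeric flag values and all helper names are assumptions made for this illustration; only the flag names' meaning is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EPT permission bits 2:0 (read, write, execute). */
#define EPTE_R 0x1u
#define EPTE_W 0x2u
#define EPTE_X 0x4u

/* Assumed values; the patch only names P2M_IOREQ_HANDLE_{READ,WRITE}_ACCESS. */
#define HANDLE_READ_ACCESS  0x1u
#define HANDLE_WRITE_ACCESS 0x2u

/* An entry with none of R/W/X set is treated as not present. */
static bool epte_present(uint64_t epte)
{
    return (epte & (EPTE_R | EPTE_W | EPTE_X)) != 0;
}

/* Simplified misconfiguration check covering only the permission bits. */
static bool epte_misconfigured(uint64_t epte, bool exec_only_supported)
{
    unsigned int perm = epte & (EPTE_R | EPTE_W | EPTE_X);

    if ( (perm & EPTE_W) && !(perm & EPTE_R) )
        return true;                 /* write without read */
    if ( perm == EPTE_X && !exec_only_supported )
        return true;                 /* execute-only, unsupported */
    return false;
}

/* Permission bits as computed by the quoted hunk for p2m_ioreq_server. */
static uint64_t ioreq_server_perms(unsigned int flags)
{
    bool r, w, x;

    assert((flags & ~(HANDLE_READ_ACCESS | HANDLE_WRITE_ACCESS)) == 0);

    r = !(flags & HANDLE_READ_ACCESS);
    w = r && !(flags & HANDLE_WRITE_ACCESS);
    x = r;

    return (r ? EPTE_R : 0) | (w ? EPTE_W : 0) | (x ? EPTE_X : 0);
}
```

With only write handling requested the entry stays readable and executable, so it remains present. As soon as read handling is requested, every bit is cleared and a plain low-three-bits presence check can no longer tell the entry apart from an empty one, which is why the thread weighs dropping read emulation (option a) against teaching the presence check about the p2m type (option b).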