[Xen-devel] Re: [PATCH 13/13] Nested Virtualization: hap-on-hap
At 15:01 +0100 on 09 Sep (1284044463), Christoph Egger wrote:
> > This comment is confusing to me.  The nested guest gpa isn't a virtual
> > address, and certainly not the same as an L1-guest VA.  Can you reword
> > it, maybe just to say what the call to nestedhvm_hap_nested_page_fault()
> > is going to do?
>
> New wording is:
>
>     /* The vcpu is in guest mode and the l1 guest
>      * uses hap. That means 'gpa' is in l2 guest
>      * physical address space.
>      * Fix the nested p2m or inject a nested page fault
>      * into the l1 guest if not fixable. The algorithm is
>      * the same as for shadow paging.
>      */
>
> Is this fine with you?

Sure, that's better.

> > > -        gfn = paging_gva_to_gfn(curr, addr, &pfec);
> > > +        gfn = paging_p2m_ga_to_gfn(curr, p2m, mode, cr3, addr, &pfec);
> >
> > Do other callers of paging_gva_to_gfn need to handle nested-npt mode as
> > well?
>
> If the other callers can be reached when the vcpu is in guest mode, then
> yes.

It looks like most of them can.

> > If so, would it be better to update paging_gva_to_gfn() itself?
>
> This is definitely easier but the call to p2m_get_p2m() might become very
> expensive and should be cached by passing it through as a function
> argument.

I'd rather it was cached somewhere in the per-vcpu state if p2m_get_p2m()
is expensive, and not pass any more arguments.  Callers of
paging_gva_to_gfn() shouldn't need to know about the internals of the p2m
code -- they should just be able to pass a VA and get an (L1) gfn.

> > > +    p2m = vcpu_nestedhvm(v).nh_p2m;
> > > +    if (p2m == NULL)
> > > +        /* p2m has either been invalidated or not yet assigned. */
> > > +        return;
> > > +
> > > +    cpu_set(v->processor, p2m->p2m_dirty_cpumask);
> >
> > Is this arch-specific code? (and likewise the cpu_clear that follows)
>
> No. I don't see how.

It's only used by NPT-on-NPT.  IIRC the EPT-on-EPT code uses a soft-TLB
model so it won't need this cpumask.  But it's not a big deal.
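The per-vcpu caching Tim suggests can be sketched as follows.  This is a toy model, not the actual Xen structures: the field and function names (`cached_p2m`, `vcpu_get_p2m`, `p2m_lookup_slow`) are invented for illustration.  The idea is that the expensive lookup runs once, the result is remembered in the vcpu, and the cache is dropped whenever the nested p2m changes, so `paging_gva_to_gfn()` callers never see an extra argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real Xen types. */
struct p2m_domain { int id; };

struct vcpu {
    struct p2m_domain *cached_p2m;   /* per-vcpu cache; NULL = not cached */
    struct p2m_domain *current_p2m;  /* what the slow lookup would return */
    int expensive_lookups;           /* counts calls to the slow path */
};

/* Stand-in for the expensive p2m_get_p2m() lookup. */
static struct p2m_domain *p2m_lookup_slow(struct vcpu *v)
{
    v->expensive_lookups++;
    return v->current_p2m;
}

/* Cached accessor: paging_gva_to_gfn() would call this internally
 * instead of growing an extra p2m parameter. */
static struct p2m_domain *vcpu_get_p2m(struct vcpu *v)
{
    if (v->cached_p2m == NULL)
        v->cached_p2m = p2m_lookup_slow(v);
    return v->cached_p2m;
}

/* Called whenever the vcpu's nested p2m is switched or flushed. */
static void vcpu_invalidate_p2m_cache(struct vcpu *v)
{
    v->cached_p2m = NULL;
}
```

The invalidation hook is the important part: any path that recycles or reassigns a nested p2m must clear the cache, or the vcpu keeps translating through a stale table.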
> > This function looks like it duplicates a lot of logic from an old
> > version of the normal p2m next-level function.
>
> Yes, it is redundant. The point is that the *_write_p2m_entry() functions
> are not nested virtualization aware. I am having trouble understanding
> the monitor p2m tables.
>
> > Would it be easy to merge them?
>
> Maybe, maybe not. I can answer the question when I understand the use of
> the monitor tables and how they work.

OK.  I'll wait for the next rev then.

> > > +int
> > > +nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t L2_gpa)
> > > +{
> > > +    int rv;
> > > +    paddr_t L1_gpa, L0_gpa;
> > > +    struct domain *d = v->domain;
> > > +    struct p2m_domain *p2m, *nested_p2m;
> > > +
> > > +    p2m = p2m_get_hostp2m(d); /* L0 p2m */
> > > +    nested_p2m = p2m_get_nestedp2m(v, vcpu_nestedhvm(v).nh_vm_hostcr3);
> > > +
> > > +    /* walk the L1 P2M table, note we have to pass p2m
> > > +     * and not nested_p2m here or we fail the walk forever,
> > > +     * otherwise. */
> >
> > Can you explain that more fully?  It looks like you're walking the L0
> > table to translate an L2 gfn into an L1 gfn, which surely can't be right.
>
> The l1 guest hostcr3 is in l1 guest physical address space. To walk the
> l1 guest's page table I need to translate it into host physical address
> space by walking the l0 p2m table.
>
> > > +    rv = nestedhap_walk_L1_p2m(v, p2m, L2_gpa, &L1_gpa);

OK, I understand now, thanks.

> > > +static int
> > > +p2m_flush_locked(struct p2m_domain *p2m)
> > > +{
> > > +    struct page_info * (*alloc)(struct p2m_domain *);
> > > +    void (*free)(struct p2m_domain *, struct page_info *);
> > > +
> > > +    alloc = p2m->alloc_page;
> > > +    free = p2m->free_page;
> > > +
> > > +    if (p2m->cr3 == 0)
> > > +        /* Microoptimisation: p2m is already empty.
> > > +         * => about 0.3% speedup of overall system performance.
> > > +         */
> > > +        return 0;
> >
> > What happens if a malicious guest uses 0 as its actual CR3 value?
> When the l1 guest uses hap and the l2 guest uses shadow paging, this
> code path never runs.
> When the l1 guest uses hap and the l2 guest uses hap, this can only be
> the guest's hostcr3.
>
> So your question is what happens if a malicious guest uses 0 to point to
> its nested page table?

Yes.

> I think this guest crashes sooner or later anyway.

It might, but in the meantime we've missed a flush.  If we fail to
properly flush a nested P2M when we should, the L2 guest gets to use
stale entries to write to memory it shouldn't see.

My point is that you can't use cr3 == 0 as an indicator that this slot
isn't in use, because the guest can cause it to be true.  Maybe -1 would
be a better choice, since that's not a valid cr3 value.

> > > +    /* All p2m's are or were in use. We know the least recently
> > > +     * used one. Destroy and re-initialize it.
> > > +     */
> > > +    for (i = 0; i < MAX_NESTEDP2M; i++) {
> > > +        p2m = p2m_getlru_nestedp2m(d, NULL);
> > > +        rv = p2m_flush_locked(p2m);
> >
> > Is this enough?  If this p2m might be in host_vmcb->h_cr3 somewhere on
> > another vcpu, then you need to make sure that vcpu gets reset not to
> > use it any more.
>
> There are three possibilities:
>
> Another vcpu is in VMRUN emulation before a nestedp2m is assigned.
> In this case, it will get a new nestedp2m as it won't find its 'old'
> nestedp2m anymore.
>
> Another vcpu is in VMRUN emulation after a nestedp2m is assigned.
> It will VMEXIT with a nested page fault.

Why?

> Another vcpu is already running the l2 guest.
> It will VMEXIT with a nested page fault immediately.

Hmm.  It will exit for the TLB shootdown IPI, but I think you need to
clear vcpu_nestedhvm(v).nh_p2m on the other vcpu to make sure it doesn't
re-enter with the p2m you've just recycled.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.
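Tim's sentinel point can be sketched in a few lines.  This is an illustrative model, not the patch's actual code: the struct layout and the `flushed` counter are invented, and the `CR3_EADDR` name is an assumed spelling for the "-1" sentinel he suggests.  An all-ones value can never be a legal CR3 (CR3 holds a page-aligned physical address), so it safely marks an unused slot, whereas 0 is a value a malicious guest can really load, turning the "already empty" micro-optimisation into a skipped flush.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones is never a valid CR3, so it is a safe "slot unused"
 * marker; 0 is a value the guest itself can cause to be true. */
#define CR3_EADDR ((uint64_t)~0ULL)

struct nested_p2m { uint64_t cr3; int flushed; };

static int p2m_flush_locked(struct nested_p2m *p2m)
{
    if (p2m->cr3 == CR3_EADDR)
        return 0;            /* slot really is empty: nothing to do */
    p2m->flushed++;          /* ...real code would free the tables... */
    p2m->cr3 = CR3_EADDR;    /* mark the slot unused again */
    return 0;
}
```

With the 0 sentinel of the original patch, a guest pointing its nested table at physical address 0 would take the early-return path and keep its stale mappings; with an invalid-CR3 sentinel the flush always happens for any value the guest can produce.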
(Company #02937203, SL9 0BG)