
[Xen-devel] Re: [PATCH 13/13] Nested Virtualization: hap-on-hap



At 15:01 +0100 on 09 Sep (1284044463), Christoph Egger wrote:
> > This comment is confusing to me.  The nested guest gpa isn't a virtual
> > address, and certainly not the same as an L1-guest VA.  Can you reword
> > it, maybe just to say what the call to nestedhvm_hap_nested_page_fault()
> > is going to do?
> 
> New wording is:
> 
>   /* The vcpu is in guest mode and the l1 guest
>    * uses hap. That means 'gpa' is in l2 guest
>    * physical address space.
>    * Fix the nested p2m or inject a nested page fault
>    * into the l1 guest if not fixable. The algorithm is
>    * the same as for shadow paging.
>    */
> 
> Is this fine with you ?

Sure, that's better. 
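
For anyone following along, a minimal sketch of where that comment ends up
and what the call does; the guest-mode/paging-mode helper names below are
my guesses, not taken from the patch:

    int hvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa)
    {
        if ( nestedhvm_vcpu_in_guestmode(v) && nestedhvm_paging_mode_hap(v) )
        {
            /* The vcpu is in guest mode and the l1 guest uses hap, so
             * 'gpa' is an l2 guest physical address.  Fix up the nested
             * p2m, or inject a nested page fault into the l1 guest if
             * that isn't possible. */
            return nestedhvm_hap_nested_page_fault(v, gpa);
        }

        /* ... ordinary single-level hap handling continues here ... */
        return 0;
    }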

> > > -            gfn = paging_gva_to_gfn(curr, addr, &pfec);
> > > +            gfn = paging_p2m_ga_to_gfn(curr, p2m, mode, cr3, addr, &pfec);
> >
> > Do other callers of paging_gva_to_gfn need to handle nested-npt mode as
> > well?
> 
> If the other callers can be reached when the vcpu is in guest mode, then yes.

It looks like most of them can. 

> > If so, would it be better to update paging_gva_to_gfn() itself?
> 
> This is definitely easier, but the call to p2m_get_p2m() might be very
> expensive, so the result would have to be cached by passing it through as a
> function argument.

I'd rather it was cached somewhere in the per-vcpu state if
p2m_get_p2m() is expensive, and not pass any more arguments.  Callers of
paging_gva_to_gfn() shouldn't need to know about the internals of the
p2m code - they should just be able to pass a VA and get a (L1) gfn.
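
Concretely, something like this is what I have in mind -- a sketch only,
and the wrapper name is made up:

    static inline struct p2m_domain *vcpu_current_p2m(struct vcpu *v)
    {
        /* Hand back the cached nested p2m while the vcpu is running an
         * l2 guest, otherwise the plain host p2m.  Callers of
         * paging_gva_to_gfn() never need to know which one it was. */
        if ( nestedhvm_vcpu_in_guestmode(v) && vcpu_nestedhvm(v).nh_p2m != NULL )
            return vcpu_nestedhvm(v).nh_p2m;

        return p2m_get_hostp2m(v->domain);
    }

paging_gva_to_gfn() would call that internally and hand the result down to
paging_p2m_ga_to_gfn(), so the existing call sites stay untouched.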

> > > +     p2m = vcpu_nestedhvm(v).nh_p2m;
> > > +     if (p2m == NULL)
> > > +             /* p2m has either been invalidated or not yet assigned. */
> > > +             return;
> > > +
> > > +     cpu_set(v->processor, p2m->p2m_dirty_cpumask);
> >
> > Is this arch-specific code?  (and likewise the cpu_clear that follows)
> 
> No. I don't see how.

It's only used by NPT-on-NPT.  IIRC the EPT-on-EPT code uses a soft-TLB
model so it won't need this cpumask.  But it's not a big deal.
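
For reference, my understanding of what the mask buys NPT-on-NPT -- a
sketch, with the flush-side use being an assumption on my part:

    /* On nested vmentry: record that this pcpu may cache translations
     * tagged with this nested p2m. */
    cpu_set(v->processor, p2m->p2m_dirty_cpumask);

    /* ... l2 guest runs ... */

    /* On nested vmexit: this pcpu no longer needs a shootdown. */
    cpu_clear(v->processor, p2m->p2m_dirty_cpumask);

    /* When the nested p2m is changed or recycled, only the CPUs
     * recorded in p2m_dirty_cpumask need the shootdown IPI. */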

> > This function looks like it duplicates a lot of logic from an old
> > version of the normal p2m next-level function.
> 
> Yes, it is redundant. The point is that the *_write_p2m_entry() functions
> are not nested virtualization aware. I am having trouble understanding
> the monitor p2m tables.
> 
> > Would it be easy to merge them?
> 
> Maybe, maybe not. I can answer the question when I understand the use of the
> monitor tables and how they work.

OK.  I'll wait for the next rev then. 

> > > +int
> > > +nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t L2_gpa)
> > > +{
> > > +    int rv;
> > > +    paddr_t L1_gpa, L0_gpa;
> > > +    struct domain *d = v->domain;
> > > +    struct p2m_domain *p2m, *nested_p2m;
> > > +
> > > +    p2m = p2m_get_hostp2m(d); /* L0 p2m */
> > > +    nested_p2m = p2m_get_nestedp2m(v, vcpu_nestedhvm(v).nh_vm_hostcr3);
> > > +
> > > +    /* Walk the L1 P2M table.  Note we have to pass p2m
> > > +     * and not nested_p2m here, otherwise we would fail
> > > +     * the walk forever. */
> >
> > Can you explain that more fully?  It looks like you're walking the L0
> > table to translate an L2 gfn into an L1 gfn, which surely can't be right.
> 
> The l1 guest hostcr3 is in l1 guest physical address space. To walk the
> l1 guest's page table I need to translate it into host physical address space
> by walking the l0 p2m table.
> 
> > > +    rv = nestedhap_walk_L1_p2m(v, p2m, L2_gpa, &L1_gpa);

OK, I understand now, thanks.
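
Spelling out the two translations for the archive, with identifiers from
the patch and error handling elided (the step-2 comment is my reading, not
code from the patch):

    p2m        = p2m_get_hostp2m(d);    /* L0 p2m */
    nested_p2m = p2m_get_nestedp2m(v, vcpu_nestedhvm(v).nh_vm_hostcr3);

    /* Step 1: L2 gpa -> L1 gpa.  This walks the l1 guest's nested page
     * table, whose root (the l1 hostcr3) and intermediate entries are
     * all l1 guest physical addresses, so every level has to be read
     * through the L0 p2m -- hence 'p2m', not 'nested_p2m', is passed. */
    rv = nestedhap_walk_L1_p2m(v, p2m, L2_gpa, &L1_gpa);

    /* Step 2: L1 gpa -> L0 (machine) address via the host p2m, after
     * which the combined L2 gpa -> L0 mapping is written into
     * nested_p2m so the next access doesn't fault again. */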

> > > +static int
> > > +p2m_flush_locked(struct p2m_domain *p2m)
> > > +{
> > > +    struct page_info * (*alloc)(struct p2m_domain *);
> > > +    void (*free)(struct p2m_domain *, struct page_info *);
> > > +
> > > +    alloc = p2m->alloc_page;
> > > +    free = p2m->free_page;
> > > +
> > > +    if (p2m->cr3 == 0)
> > > +        /* Microoptimisation: p2m is already empty.
> > > +         * => about 0.3% speedup of overall system performance.
> > > +         */
> > > +        return 0;
> >
> > What happens if a malicious guest uses 0 as its actual CR3 value?
> 
> When the l1 guest uses hap and the l2 guest uses shadow paging, this code
> path never runs.
> When the l1 guest uses hap and the l2 guest uses hap, this can only be the
> guest's hostcr3.
> 
> So your question is what happens if a malicious guest uses 0 to point to
> its nested page table?

Yes.

> I think this guest crashes sooner or later anyway.

It might, but in the meantime we've missed a flush.  If we fail to
properly flush a nested P2M when we should, the L2 guest gets to use
stale entries to write to memory it shouldn't see.

My point is that you can't use cr3==0 as an indicator that this slot
isn't in use because the guest can cause it to be true.  Maybe -1 would
be a better choice, since that's not a valid cr3 value.
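
Something like this is what I mean -- a sketch only, and the sentinel name
is made up:

    /* A value no guest can ever install as a (nested) cr3. */
    #define CR3_EADDR (~0UL)

    static int p2m_flush_locked(struct p2m_domain *p2m)
    {
        if ( p2m->cr3 == CR3_EADDR )
            /* Slot is genuinely unused or already flushed; a guest
             * cannot forge this value, so skipping the work is safe. */
            return 0;

        /* ... free and reinitialise the p2m pages as in the patch ... */

        p2m->cr3 = CR3_EADDR;
        return 0;
    }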

> > > +    /* All p2m's are or were in use. We know the least recently used one.
> > > +     * Destroy and re-initialize it.
> > > +     */
> > > +    for (i = 0; i < MAX_NESTEDP2M; i++) {
> > > +        p2m = p2m_getlru_nestedp2m(d, NULL);
> > > +        rv = p2m_flush_locked(p2m);
> >
> > Is this enough?  If this p2m might be in host_vmcb->h_cr3 somewhere on
> > another vcpu, then you need to make sure that vcpu gets reset not to use
> > it any more.
> 
> There are three possibilities:
> Another vcpu is in VMRUN emulation before a nestedp2m is assigned.
>    In this case, it will get a new nestedp2m as it won't find its 'old'
>    nestedp2m anymore.
> 
> Another vcpu is in VMRUN emulation after a nestedp2m is assigned.
>    It will VMEXIT with a nested page fault.

Why?

> Another vcpu is already running an l2 guest.
>    It will VMEXIT with a nested page fault immediately.

Hmm.  It will exit for the TLB shootdown IPI, but I think you need to
clear vcpu_nestedhvm(v).nh_p2m on the other vcpu to make sure it doesn't
re-enter with the p2m you've just recycled.
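
I.e. something along these lines before the recycled p2m is handed out
again (a sketch; the helper name is made up):

    static void nestedp2m_unhook_vcpus(struct domain *d, struct p2m_domain *p2m)
    {
        struct vcpu *v;

        for_each_vcpu ( d, v )
            if ( vcpu_nestedhvm(v).nh_p2m == p2m )
                /* The shootdown IPI forces the vcpu out of the l2 guest;
                 * clearing nh_p2m here makes it pick up a fresh nested
                 * p2m on the next VMRUN instead of re-entering with the
                 * one we've just recycled. */
                vcpu_nestedhvm(v).nh_p2m = NULL;
    }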

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

