
Re: [Xen-devel] [PATCH 4/6] VMX: correct PDPTE load checks



>>> On 28.08.18 at 15:12, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 19/07/18 11:49, Jan Beulich wrote:
>> Checking the low 5 bits of CR3 is not the job of vmx_load_pdptrs().
>> Instead it should #GP upon bad PDPTE values, rather than causing a VM
>> entry failure.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1361,20 +1361,18 @@ static void vmx_set_interrupt_shadow(str
>>      __vmwrite(GUEST_INTERRUPTIBILITY_INFO, intr_shadow);
>>  }
>>  
>> -static void vmx_load_pdptrs(struct vcpu *v)
>> +static bool vmx_load_pdptrs(struct vcpu *v)
>>  {
>>      unsigned long cr3 = v->arch.hvm_vcpu.guest_cr[3];
>> -    uint64_t *guest_pdptes;
>> +    uint64_t *guest_pdptes, valid_mask;
>>      struct page_info *page;
>>      p2m_type_t p2mt;
>>      char *p;
>> +    unsigned int i;
>>  
>>      /* EPT needs to load PDPTRS into VMCS for PAE. */
>>      if ( !hvm_pae_enabled(v) || (v->arch.hvm_vcpu.guest_efer & EFER_LMA) )
>> -        return;
>> -
>> -    if ( (cr3 & 0x1fUL) && !hvm_pcid_enabled(v) )
>> -        goto crash;
>> +        return true;
>>  
>>      page = get_page_from_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt,
>>                               P2M_UNSHARE);
> 
> cr3 needs truncating to 32 bits before doing this.  The upper bits of
> cr3 can remain set after transitioning away from long mode, which will
> cause this emulation to do the wrong thing.

In which case the easiest thing would be to change the variable's
type. I've done so, albeit that's somewhat orthogonal to what the patch
has meant to do so far.
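
Concretely, something like this at the top of vmx_load_pdptrs() is what
I have in mind (sketch only, not yet tested):

    /*
     * Deliberately a 32-bit type: outside of long mode only the low
     * 32 bits of CR3 are architecturally relevant, and stale upper
     * bits may remain in guest_cr[3] after the guest has left long
     * mode.
     */
    uint32_t cr3 = v->arch.hvm_vcpu.guest_cr[3];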

>> -    /*
>> -     * We do not check the PDPTRs for validity. The CPU will do this during
>> -     * vm entry, and we can handle the failure there and crash the guest.
>> -     * The only thing we could do better here is #GP instead.
>> -     */
>> +    valid_mask = ((1ULL << v->domain->arch.cpuid->extd.maxphysaddr) - 1) &
>> +                 (PAGE_MASK | _PAGE_AVAIL | _PAGE_PRESENT);
> 
> How did you come across this list?  The only valid bits are Present, PWT
> and PCD, while the upper nibble of control bits is documented as ignored
> rather than reserved.

As to the upper nibble (_PAGE_AVAIL I assume, which is only 3 bits) -
those are exactly the bits the mask includes. As the name says, it's a
collection of bits which are valid to be set here (which necessarily
includes ignored bits; we're only interested in its inverted value for
masking purposes).

As to PWT and PCD, the set above was based on the foot note to
section "Checks on Guest Page-Directory-Pointer-Table Entries"
saying

"This implies that (1) bits 11:9 in each PDPTE are ignored; and (2) if bit 0
 (present) is clear in one of the PDPTEs, bits 63:1 of that PDPTE are
 ignored."

But I now realize that I've wrongly implied this to be a complete
description.
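
So presumably the mask wants extending to cover PWT and PCD as well,
i.e. roughly (sketch only):

    valid_mask = ((1ULL << v->domain->arch.cpuid->extd.maxphysaddr) - 1) &
                 (PAGE_MASK | _PAGE_AVAIL | _PAGE_PCD | _PAGE_PWT |
                  _PAGE_PRESENT);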

>> +    for ( i = 0; i < 4; ++i )
>> +        if ( (guest_pdptes[i] & _PAGE_PRESENT) &&
>> +             (guest_pdptes[i] & ~valid_mask) )
>> +        {
>> +            if ( v == current )
>> +                hvm_inject_hw_exception(TRAP_gp_fault, 0);
> 
> The function is void for the same reason why this isn't correct.  We are
> in the hvm_update_* path rather than the set_* path, which is beyond the
> point of being able to unwind (and why I haven't yet got around to
> fixing this function).
> 
> The only way I can see of fixing this is plumbing everything into a new
> paging_set_cr3() callback, which can return X86EMUL_EXCEPTION and fail
> the hvm_set_cr3() call.

Hmm, I see. I'll see about doing this. But why would this be a paging
hook? It looks to me as if this wants to be another entry in hvm_funcs,
also taking into consideration that on AMD/NPT the PDPTRs are
documented not to work the "normal" way. If it was a paging one,
where would you call it from?
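
To make the alternative concrete: what I'm thinking of is a new hook
roughly along these lines (name and exact signature made up purely for
illustration):

    /*
     * Hypothetical hvm_funcs entry: validate (and, for VMX, read) the
     * PDPTEs for a to-be-loaded CR3 value, returning X86EMUL_EXCEPTION
     * (with #GP queued) when the architectural checks fail, so that
     * hvm_set_cr3() can fail cleanly before any state gets committed.
     */
    int (*check_cr3)(struct vcpu *v, unsigned long cr3);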

I'm also anticipating a further complication: If we split the function,
how would we guarantee that the read/check one has always been
invoked before the load one? After all we'd have to latch the values
read from memory in the former in order to use the validated values
in the latter. This is in particular because hvm_update_guest_cr3()
looks to be called from a number of paths not having come through
hvm_set_cr3() (which anyway itself invokes paging_update_cr3()).
A pretty good (or bad) example is hvmop_flush_tlb_all(), in
particular considering the comment there.
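
Just to illustrate the latching aspect: the read/check half would
presumably have to stash the validated values in a new field (name
invented) somewhere like struct hvm_vcpu, e.g.

    /*
     * Hypothetical: PDPTEs read and validated by the "check" half of a
     * split vmx_load_pdptrs(), to be written into the VMCS later by
     * the "load" half without re-reading (and re-checking) guest
     * memory.
     */
    uint64_t pdptes[4];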

Bottom line - I think the patch here is an improvement in its own right,
and further improvements should perhaps be done separately. In
the meantime I wonder whether the failure path in
vmx_update_guest_cr() could attempt some rollback (the prior CR3
value must have been fine after all, leaving aside the fact that
it may have been loaded with CR4.PAE still clear, which is a rather
fragile way of changing paging modes anyway).
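
I.e. very roughly, for the CR3 case there (sketch only, leaving aside
the non-HAP path and the !paging special case):

    if ( !paging_mode_hap(v->domain) || vmx_load_pdptrs(v) )
        __vmwrite(GUEST_CR3, v->arch.hvm_vcpu.hw_cr[3]);
    /*
     * else: leave the previously loaded (and hence previously
     * validated) value in the VMCS rather than provoking a VM entry
     * failure; whether guest_cr[3] itself can sensibly be unwound at
     * this point is exactly the open question.
     */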

Jan



 

