[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables

To: "Wu, Feng" <feng.wu@xxxxxxxxx>, "Han, Huaitong" <huaitong.han@xxxxxxxxx>, "jbeulich@xxxxxxxx" <jbeulich@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "Nakajima, Jun" <jun.nakajima@xxxxxxxxx>, "Dong, Eddie" <eddie.dong@xxxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>, "george.dunlap@xxxxxxxxxxxxx" <george.dunlap@xxxxxxxxxxxxx>, "ian.jackson@xxxxxxxxxxxxx" <ian.jackson@xxxxxxxxxxxxx>, "stefano.stabellini@xxxxxxxxxxxxx" <stefano.stabellini@xxxxxxxxxxxxx>, "ian.campbell@xxxxxxxxxx" <ian.campbell@xxxxxxxxxx>, "wei.liu2@xxxxxxxxxx" <wei.liu2@xxxxxxxxxx>, "keir@xxxxxxx" <keir@xxxxxxx>
From: George Dunlap <george.dunlap@xxxxxxxxxx>
Date: Wed, 16 Dec 2015 15:36:31 +0000
Cc: Tim Deegan <tim@xxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>
Delivery-date: Wed, 16 Dec 2015 15:36:43 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

[Adding Tim, the previous mm maintainer]

On 11/12/15 09:16, Wu, Feng wrote:
>>> +{
>>> +    void *xsave_addr;
>>> +    unsigned int pkru = 0;
>>> +    bool_t pkru_ad, pkru_wd;
>>> +
>>> +    bool_t uf = !!(pfec & PFEC_user_mode);
>>> +    bool_t wf = !!(pfec & PFEC_write_access);
>>> +    bool_t ff = !!(pfec & PFEC_insn_fetch);
>>> +    bool_t rsvdf = !!(pfec & PFEC_reserved_bit);
>>> +    bool_t pkuf  = !!(pfec & PFEC_prot_key);
>>
>> So I'm just wondering out loud here -- is there actually any situation
>> in which we would want guest_walk_tables to act differently than the
>> real hardware?
>>
>> That is, is there actually any situation where, pku is enabled, the vcpu
>> is in long mode, PFEC_write_access and/or PFEC_page_present is set, and
>> the pkey is non-zero, that we want guest_walk_tables() to only check the
>> write-protect bit for the pte, and not also check the pkru?
>>
>> Because if not, it seems like it would be much more robust to simply
>> *always* check for pkru_ad if PFEC_page_present is set, and for pkru_wd
>> if PFEC_write_access is set.
> 
> guest_walk_tables() is also used by shadow code, though we don't
> plan to support pkeys for shadow now, however, in that case, the 'pfec'
> is generated by hardware, and the pkuf bit may be 0 or 1 depending
> on the real page fault. If we unconditionally check pkeys in
> guest_walk_tables(), it is not a good ideas for someone who may
> want to implement the pkeys for shadow in future, since we only
> need to check pkeys when the pkuf is set in pfec.

So you'll have to forgive me for being a bit slow here -- I stepped away
from the mm code for a while, and a lot has changed since I agreed to
become mm maintainer earlier this year.

So here is what I see in the tree; please correct me if I missed
something important.

The function guest_walk_tables():
 1. Accepts a pfec argument which is meant to describe the *access type*
that is happening
 2. Returns a value with the "wrong" flags (i.e., flags which needed to
be set that were missing, or flags that needed to be clear that were set).
 3. It checks reserved bits in the pagetable regardless of whether
PFEC_reserved_bit is set in the caller, and returns _PAGE_INVALID_BITS
if so.
 4. If PFEC_insn_fetch is set, it will only check for nx and smep if
those features are enabled for the guest.

The main callers are the various gva_to_gfn(), which accept a *pointer*
to a pfec argument.  This pointer is actually bidirectional: its value
passed to guest_walk_tables() to determine what access types are
checked; what is returned is meant to be a pfec value which can be
passed to a guest as part of a fault (and is by several callers).  But
it may also return the Xen-internal flags, PFEC_page_{paged,shared}.
And, importantly, it modifies the pfec value based on the bits that are
returned from guest_walk_tables(): It will clear PFEC_present if
guest_walk_tables() returns _PAGE_PRESENT, and it will set
PFEC_reserved_bit if guest_walk_tables() returns _PAGE_INVALID_BIT.

The next logical level up are the
hvm_{copy,fetch}_{to,from}_guest_virt(), which also take a pfec
argument: but really the only purpose of the pfec argument is to allow
the caller to add the PFEC_user_mode flag; the other access-type flags
are set automatically by the functions themselves (e.g., "to" sets
PFEC_write_access, "fetch" sets PFEC_insn_fetch, &c).  Several callers
set PFEC_present as well, but callers set any other bits.

(hvm_fetch_from_guest_virt() seems to only set PFEC_insn_fetch if nx or
smep are enabled in the guest.  This seems inconsistent to me with the
treatment of PFEC_reserved_bit: it seems like
hvm_fetch_from_guest_virt() should always pass in PFEC_insn_fetch,
particularly as guest_walk_tables() will already gate the checks based
on whether nx or smep is enabled in the guest.  Tim, you know of any
reason for this?)

So there seems to me to be no reason to pass PFEC_prot_key into
guest_walk_tables() or gva_to_gfn().  The pfec value passed *into* those
should simply indicate the type of memory access being done: present,
write, instruction fetch, user.

With your current series, guest_walk_tables() already checks for pkeys
being enabled in the guest before checking for them in the pagetables.
For shadow mode, these will be false, and so no checks will be done.  If
anyone ever implements pkeys for shadow mode, then these will be
enabled, and the checks will be done, without any intervention on the
part of the caller.

The only thing that's missing is:
1. for gva_to_gfn() to check for _PAGE_PKEY_BITS on the return from
guest_walk_tables(), and set PFEC_prot_key if so
2. to set PFEC_insn_fetch when pkeys are enabled in
hvm_fetch_from_guest_virt(), so that guest_walk_tables() knows that it's
an instruction fetch.

>> Then in patch 8, you wouldn't need to go around all the __hvm_copy
>> functions adding in PFEC_prot; instead, you'd just need to add
>> PFEC_insn_fetch to the "fetch" (as is already done for SMEP and NX), and
>> you'd be done.
> 
> Pkeys has no business with instructions fetch, why do we need to add
> PFEC_insn_fetch?

PFEC_insn_fetch passed *into* guest_walk_tables() indicates that the
access is an instruction fetch.  Your code treats instruction fetches
differently than normal reads, yes?

At the moment PFEC_insn_fetch is only set in hvm_fetch_from_guest_virt()
if hvm_nx_enabled() or hvm_smep_enabled() are true.  Which means that if
you *don't* have nx or smep enabled, then the patch series as is will
fault on instruction fetches when it shouldn't.  (I don't remember
anyone mentioning nx or smep being enabled as a prerequisite for pkeys.)

(And it seems to me that it would be better to always set
PFEC_insn_fetch in hvm_fetch_from_guest_virt(), but that's a bit beyond
the scope of what you're trying to accomplish here.)

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

Follow-Ups:
- Re: [Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables
  - From: Han, Huaitong
- Re: [Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables
  - From: Tim Deegan

References:
- [Xen-devel] [V3 PATCH 0/9] x86/hvm: pkeys, add memory protection-key support
  - From: Huaitong Han
- [Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables
  - From: Huaitong Han
- Re: [Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables
  - From: George Dunlap
- Re: [Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables
  - From: Wu, Feng

Prev by Date: Re: [Xen-devel] [PATCH] tools: allow configure time choice of libexec subdirectory.
Next by Date: Re: [Xen-devel] [PATCH RFC XEN v1 08/14] xen: arm: Save and restore arch timer state.
Previous by thread: Re: [Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables
Next by thread: Re: [Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.