[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 8/9] arm/mem_access: Add short-descriptor based gpt



Hi Sergej,

On 20/06/17 21:33, Sergej Proskurin wrote:
This commit adds functionality to walk the guest's page tables using the
short-descriptor translation table format for both ARMv7 and ARMv8. The
implementation is based on ARM DDI 0487B-a J1-6002 and ARM DDI 0406C-b
B3-1506.

Signed-off-by: Sergej Proskurin <proskurin@xxxxxxxxxxxxx>
---
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Cc: Julien Grall <julien.grall@xxxxxxx>
---
v3: Move the implementation to ./xen/arch/arm/guest_copy.c.

    Use defines instead of hardcoded values.

    Cosmetic fixes & Added more coments.

v4: Adjusted the names of short-descriptor data-types.

    Adapt the function to the new parameter of type "struct vcpu *".

    Cosmetic fixes.
---
 xen/arch/arm/guest_walk.c        | 167 ++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-arm/guest_walk.h |  16 ++++
 2 files changed, 181 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
index c37c595157..9cc167af49 100644
--- a/xen/arch/arm/guest_walk.c
+++ b/xen/arch/arm/guest_walk.c
@@ -18,6 +18,7 @@
 #include <xen/sched.h>
 #include <xen/domain_page.h>
 #include <asm/guest_walk.h>
+#include <asm/short-desc.h>

 /*
  * The function guest_walk_sd translates a given GVA into an IPA using the
@@ -30,8 +31,170 @@ static int guest_walk_sd(const struct vcpu *v,
                          vaddr_t gva, paddr_t *ipa,
                          unsigned int *perms)
 {
-    /* Not implemented yet. */
-    return -EFAULT;
+    bool disabled = true;
+    int64_t ttbr;

TTBR is a register. It cannot be signed.

+    paddr_t mask, paddr;
+    short_desc_t pte, *table;
+    struct page_info *page;
+    register_t ttbcr = READ_SYSREG(TCR_EL1);
+    unsigned int level = 0, n = ttbcr & TTBCR_N_MASK;
+    struct domain *d = v->domain;
+
+    const paddr_t offsets[2] = {

Why are you using paddr_t here?

Looking at the code, I see very limited point of having the offsets array as you don't use a loop and also use each offset in a single place.

+        ((paddr_t)(gva >> 20) & ((1ULL << (12 - n)) - 1)),

Why the cast? After gva is a virtual address not physical one.

Also, this code is a bit cryptic to read. Can we have some documentation at least?

Furthermore, it would be clearer if you first mask then shift. As it helps to understand the code.

If you use GENMASK (or GENMASK_ULL if you don't extend GENMASK), this will make everything more obvious:

(gva & GENMASK(31 - n, 20)) >> 20

+        ((paddr_t)(gva >> 12) & ((1ULL << 8) - 1))

(gva & GENMASK(20, 12)) >> 12

+    };
+
+    mask = ((1ULL << BITS_PER_WORD) - 1) &
+           ~((1ULL << (BITS_PER_WORD - n)) - 1);

Same remark as on the previous patch for BITS_PER_WORD + you could use GENMASK.

+
+    if ( n == 0 || !(gva & mask) )
+    {
+        /* Use TTBR0 for GVA to IPA translation. */
+        ttbr = READ_SYSREG64(TTBR0_EL1);

Looking at the documentation. For short-descriptor, the register will be 32-bit. Whilst I can understand why you use READ_SYSREG64, you should at least document it. You also probably want to have ttbr uint32_t in that case.

+
+        /* If TTBCR.PD0 is set, translations using TTBR0 are disabled. */
+        disabled = ttbcr & TTBCR_PD0;
+    }
+    else
+    {
+        /* Use TTBR1 for GVA to IPA translation. */
+        ttbr = READ_SYSREG64(TTBR1_EL1);

Ditto.

+
+        /* If TTBCR.PD1 is set, translations using TTBR1 are disabled. */
+        disabled = ttbcr & TTBCR_PD1;
+
+        /*
+         * TTBR1 translation always works like n==0 TTBR0 translation (ARM DDI
+         * 0487B.a J1-6003).
+         */
+        n = 0;
+    }
+
+    if ( disabled )
+        return -EFAULT;
+
+    /*
+     * The address of the descriptor for the initial lookup has the following
+     * format: [ttbr<31:14-n>:gva<31-n:20>:00] (ARM DDI 0487B.a J1-6003). In
+     * this way, the first lookup level might comprise up to four consecutive
+     * pages. To avoid mapping all of the pages, we simply map the page that is
+     * needed by the first level translation by incorporating up to 2 MSBs of
+     * the GVA.
+     */
+    mask = (1ULL << (14 - n)) - 1;

mask = GENMASK(31, 14);

+    paddr = (ttbr & ~mask) | (offsets[level] << 2);
+
+    page = get_page_from_gfn(d, paddr_to_pfn(paddr), NULL, P2M_ALLOC);
+    if ( !page )
+        return -EFAULT;
+
+    table = __map_domain_page(page);
+
+    /*
+     * Consider that the first level address translation does not need to be
+     * page-aligned if n > 2.
+     */
+    if ( n > 2 )
+    {
+        /* Make sure that we consider the bits ttbr<12:14-n> if n > 2. */
+        mask = ((1ULL << 12) - 1) & ~((1ULL << (14 - n)) - 1);

GENMASK(12, 14 - n);

+        table = (short_desc_t *)((unsigned long)table | (unsigned long)(ttbr & 
mask));
+    }
+
+    /*
+     * As we have considered up to 2 MSBs of the GVA for mapping the first
+     * level translation table, we need to make sure that we limit the table
+     * offset that is is indexed by GVA<31-n:20> to max 10 bits to avoid
+     * exceeding the page size limit.
+     */
+    mask = ((1ULL << 10) - 1);
+    pte = table[offsets[level] & mask];

This looks quite complex for just reading a pte. I think it would be worth to leverage the vgic_guest_access_memory for that (same in LPAE). It would also add safety if the offsets the table is mistakenly computed (from the current code, I can't convince myself the offset will always be right).

+
+    unmap_domain_page(table);
+    put_page(page);
+
+    switch ( pte.walk.dt )
+    {
+    case L1DESC_INVALID:
+        return -EFAULT;
+
+    case L1DESC_PAGE_TABLE:
+        level++;
+
+        page = get_page_from_gfn(d, (pte.walk.base >> 2), NULL, P2M_ALLOC);
+        if ( !page )
+            return -EFAULT;
+
+        table = __map_domain_page(page);
+        /*
+         * The second level translation table is addressed by PTE<31:10>. Hence
+         * it does not need to be page aligned. Make sure that we also consider
+         * the bits PTE<11:10>.
+         */
+        table = (short_desc_t *)((unsigned long)table | ((pte.walk.base & 0x3) 
<< 10));
+
+        pte = table[offsets[level]];

Ditto.

+
+        unmap_domain_page(table);
+        put_page(page);
+
+        if ( pte.walk.dt == L2DESC_INVALID )
+            return -EFAULT;
+
+        if ( pte.pg.page ) /* Small page. */
+        {
+            mask = (1ULL << PAGE_SHIFT_4K) - 1;
+
+            *ipa = (pte.pg.base << PAGE_SHIFT_4K) | (gva & mask);
+
+            /* Set execute permissions associated with the small page. */
+            if ( !pte.pg.xn )
+                *perms = GV2M_EXEC;

As perms is modified in few places over the code. I would prefer if you first reset *perms at suitable value in guest_walk_tables and the use |= everywhere.

This would avoid any mistake in the future where the permission is not reported correctly.

+        }
+        else /* Large page. */
+        {
+            mask = (1ULL << PAGE_SHIFT_64K) - 1;
+
+            *ipa = (pte.lpg.base << PAGE_SHIFT_64K) | (gva & mask);
+
+            /* Set execute permissions associated with the large page. */
+            if ( !pte.lpg.xn )
+                *perms = GV2M_EXEC;

See above.

+        }
+
+        /* Set permissions so that the caller can check the flags by herself. 
*/
+        if ( !pte.pg.ro )
+            *perms |= GV2M_WRITE;
+
+        break;
+
+    case L1DESC_SECTION:
+    case L1DESC_SECTION_PXN:
+        if ( !pte.sec.supersec ) /* Section */
+        {
+            mask = (1ULL << L1DESC_SECTION_SHIFT) - 1;

It is quite strange that above you use plain PAGE_SHIFT_4K which is not really related to short-descriptor (you define it in lpae.h) and here you use short-descriptor name. Please try to stay consistent.

+
+            *ipa = (pte.sec.base << L1DESC_SECTION_SHIFT) | (gva & mask);
+        }
+        else /* Supersection */
+        {
+            mask = (1ULL << L1DESC_SUPERSECTION_SHIFT) - 1;
+
+            *ipa = gva & mask;
+            *ipa |= (paddr_t)(pte.supersec.base) << L1DESC_SUPERSECTION_SHIFT;
+            *ipa |= (paddr_t)(pte.supersec.extbase1) << 
L1DESC_SUPERSECTION_EXT_BASE1_SHIFT;
+            *ipa |= (paddr_t)(pte.supersec.extbase2) << 
L1DESC_SUPERSECTION_EXT_BASE2_SHIFT;
+        }
+
+        /* Set permissions so that the caller can check the flags by herself. 
*/
+        if ( !pte.sec.ro )
+            *perms = GV2M_WRITE;

My suggestion about perms would also avoid undefined permission if the region is read-only as none of the callers today will initialize perms.

+        if ( !pte.sec.xn )
+            *perms |= GV2M_EXEC;
+    }
+
+    return 0;
 }

 /*
diff --git a/xen/include/asm-arm/guest_walk.h b/xen/include/asm-arm/guest_walk.h
index 4ed8476e08..d0bed0c7a2 100644
--- a/xen/include/asm-arm/guest_walk.h
+++ b/xen/include/asm-arm/guest_walk.h
@@ -1,6 +1,22 @@
 #ifndef _XEN_GUEST_WALK_H
 #define _XEN_GUEST_WALK_H

+/* First level translation table descriptor types used by the AArch32
+ * short-descriptor translation table format. */
+#define L1DESC_INVALID                      (0)
+#define L1DESC_PAGE_TABLE                   (1)
+#define L1DESC_SECTION                      (2)
+#define L1DESC_SECTION_PXN                  (3)
+
+/* Defines for section and supersection shifts. */
+#define L1DESC_SECTION_SHIFT                (20)
+#define L1DESC_SUPERSECTION_SHIFT           (24)
+#define L1DESC_SUPERSECTION_EXT_BASE1_SHIFT (32)
+#define L1DESC_SUPERSECTION_EXT_BASE2_SHIFT (36)
+
+/* Second level translation table descriptor types. */
+#define L2DESC_INVALID                      (0)

I think those one belongs to short-desc.h.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.