[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.





On 4/7/2017 9:56 PM, George Dunlap wrote:
On 07/04/17 12:31, Jan Beulich wrote:
On 07.04.17 at 12:55, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
On 4/7/2017 6:26 PM, Jan Beulich wrote:
On 07.04.17 at 11:53, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
On 4/7/2017 5:40 PM, Jan Beulich wrote:
On 06.04.17 at 17:53, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
@@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
        if ( is_epte_valid(ept_entry) )
        {
            if ( (recalc || ept_entry->recalc) &&
-             p2m_is_changeable(ept_entry->sa_p2mt) )
+             p2m_check_changeable(ept_entry->sa_p2mt) )
I think the distinction between these two is rather arbitrary, and I
also think this is part of the problem above: Distinguishing log-dirty
from ram-rw requires auxiliary data to be consulted. The same
ought to apply to ioreq-server, and then there wouldn't be a need
to have two p2m_*_changeable() flavors.
Well, I think we have also discussed this quite long ago, here is the link.
https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html
That was a different discussion, namely not considering this ...

Of course the subsequent use p2m_is_logdirty_range() may then
need amending.

In the end it looks like you have the inverse problem here compared
to above: You should return ram-rw when the reset was already
initiated. At least that's how I would see the logic to match up with
the log-dirty handling (where the _effective_ rather than the last
stored type is being returned).
... at all.
Sorry I still do not get it.
Leaving ioreq-server out of the picture, what the function returns
to the caller is the type as it would be if a re-calculation would have
been done on the entry, even if that hasn't happened yet (the
function here shouldn't change any entries after all). I think we
want the same behavior here for what have been ioreq-server
entries (and which would become ram-rw after the next re-calc).
How about something like the attached? (In-lined for convenience.)

  -George

After an ioreq server has unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously with the current p2m_change_entry_type_global()
interface.

New field entry_count is introduced in struct p2m_domain, to record
the number of p2m_ioreq_server p2m page table entries. One nature of
these entries is that they only point to 4K sized page frames, because
all p2m_ioreq_server entries are originated from p2m_ram_rw ones in
p2m_change_type_one(). We do not need to worry about the counting for
2M/1G sized pages.

This patch disallows mapping of an ioreq server, when there's still
p2m_ioreq_server entry left, in case another mapping occurs right after
the current one being unmapped, releases its lock, with p2m table not
synced yet.

This patch also disallows live migration, when there's remaining
p2m_ioreq_server entry in p2m table. The core reason is our current
implementation of p2m_change_entry_type_global() lacks information
to resync p2m_ioreq_server entries correctly if global_logdirty is
on.

We still need to handle other recalculations, however; which means
that when doing a recalculation, if the current type is
p2m_ioreq_server, we check to see if p2m->ioreq.server is valid or
not.  If it is, we leave it as type p2m_ioreq_server; if not, we reset
it to p2m_ram as appropriate.

To avoid code duplication, lift recalc_type() out of p2m-pt.c and use
it for all type recalculations (both in p2m-pt.c and p2m-ept.c).

Signed-off-by: Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx>
Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx>
Reviewed-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
---
  xen/arch/x86/hvm/ioreq.c  |  8 ++++++
  xen/arch/x86/mm/hap/hap.c |  9 ++++++
  xen/arch/x86/mm/p2m-ept.c | 73
+++++++++++++++++++++++++++++------------------
  xen/arch/x86/mm/p2m-pt.c  | 58 ++++++++++++++++++++++++-------------
  xen/arch/x86/mm/p2m.c     |  9 ++++++
  xen/include/asm-x86/p2m.h | 26 ++++++++++++++++-
  6 files changed, 135 insertions(+), 48 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 5bf3b6d..07a6c26 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain
*d, ioservid_t id,

      spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);

+    if ( rc == 0 && flags == 0 )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
      return rc;
  }

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..4b591fe 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
   */
  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
  {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there are outstanding p2m_ioreq_server pages.
+     */
+    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
+        return -EBUSY;
+
      /* turn on PG_log_dirty bit in paging mode */
      paging_lock(d);
      d->arch.paging.mode |= PG_log_dirty;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index cc1eb21..9f41067 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -533,6 +533,7 @@ static int resolve_misconfig(struct p2m_domain *p2m,
unsigned long gfn)
              {
                  for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
                  {
+                    p2m_type_t nt;
                      e = atomic_read_ept_entry(&epte[i]);
                      if ( e.emt == MTRR_NUM_TYPES )
                          e.emt = 0;
@@ -542,10 +543,15 @@ static int resolve_misconfig(struct p2m_domain
*p2m, unsigned long gfn)
                                                 _mfn(e.mfn), 0, &ipat,
                                                 e.sa_p2mt ==
p2m_mmio_direct);
                      e.ipat = ipat;
-                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
-                    {
-                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn +
i, gfn + i)
-                                     ? p2m_ram_logdirty : p2m_ram_rw;
+                    nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn
+ i);
+                    if ( nt != e.sa_p2mt ) {
+                         if ( e.sa_p2mt == p2m_ioreq_server )
+                         {
+                             ASSERT(p2m->ioreq.entry_count > 0);
+                             p2m->ioreq.entry_count--;
+                         }
+
+                         e.sa_p2mt = nt;
                           ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt,
e.access);
                      }
                      e.recalc = 0;
@@ -562,23 +568,24 @@ static int resolve_misconfig(struct p2m_domain
*p2m, unsigned long gfn)

                  if ( recalc && p2m_is_changeable(e.sa_p2mt) )
                  {
-                     unsigned long mask = ~0UL << (level *
EPT_TABLE_ORDER);
-
-                     switch ( p2m_is_logdirty_range(p2m, gfn & mask,
-                                                    gfn | ~mask) )
-                     {
-                     case 0:
-                          e.sa_p2mt = p2m_ram_rw;
-                          e.recalc = 0;
-                          break;
-                     case 1:
-                          e.sa_p2mt = p2m_ram_logdirty;
-                          e.recalc = 0;
-                          break;
-                     default: /* Force split. */
-                          emt = -1;
-                          break;
-                     }
+                    unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
+
+                    ASSERT(e.sa_p2mt != p2m_ioreq_server);
+                    switch ( p2m_is_logdirty_range(p2m, gfn & mask,
+                                                   gfn | ~mask) )
+                    {
+                    case 0:
+                        e.sa_p2mt = p2m_ram_rw;
+                        e.recalc = 0;
+                        break;
+                    case 1:
+                        e.sa_p2mt = p2m_ram_logdirty;
+                        e.recalc = 0;
+                        break;
+                    default: /* Force split. */
+                        emt = -1;
+                        break;
+                    }


So here I guess only the "ASSERT(e.sa_p2mt != p2m_ioreq_server);" is new, right? Thanks George, it's great to know your helper routine works also for the NPT.

Yu

                  }
                  if ( unlikely(emt < 0) )
                  {
@@ -816,6 +823,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long
gfn, mfn_t mfn,
          new_entry.suppress_ve = is_epte_valid(&old_entry) ?
                                      old_entry.suppress_ve : 1;

+    /*
+     * p2m_ioreq_server is only used for 4K pages, so the
+     * count shall only happen on ept page table entries.
+     */
+    if ( p2mt == p2m_ioreq_server )
+    {
+        ASSERT(i == 0);
+        p2m->ioreq.entry_count++;
+    }
+
+    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
+    {
+        ASSERT(p2m->ioreq.entry_count > 0 && i == 0);
+        p2m->ioreq.entry_count--;
+    }
+
      rc = atomic_write_ept_entry(ept_entry, new_entry, target);
      if ( unlikely(rc) )
          old_entry.epte = 0;
@@ -964,12 +987,8 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,

      if ( is_epte_valid(ept_entry) )
      {
-        if ( (recalc || ept_entry->recalc) &&
-             p2m_is_changeable(ept_entry->sa_p2mt) )
-            *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
-                                                      : p2m_ram_rw;
-        else
-            *t = ept_entry->sa_p2mt;
+        *t = p2m_recalc_type(recalc || ept_entry->recalc,
+                             ept_entry->sa_p2mt, p2m, gfn);
          *a = ept_entry->access;
          if ( sve )
              *sve = ept_entry->suppress_ve;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index c0055f3..e481f53 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -389,6 +389,7 @@ static int do_recalc(struct p2m_domain *p2m,
unsigned long gfn)
          {
              unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);

+            ASSERT(p2m_flags_to_type(l1e_get_flags(*pent)) !=
p2m_ioreq_server);
              if ( !needs_recalc(l1, *pent) ||

!p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(*pent))) ||
                   p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask) >= 0 )
@@ -436,17 +437,18 @@ static int do_recalc(struct p2m_domain *p2m,
unsigned long gfn)
           needs_recalc(l1, *pent) )
      {
          l1_pgentry_t e = *pent;
+        p2m_type_t ot, nt;
+        unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);

          if ( !valid_recalc(l1, e) )
              P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
                        p2m->domain->domain_id, gfn, level);
-        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
+        ot = p2m_flags_to_type(l1e_get_flags(e));
+        nt = p2m_recalc_type_range(true, ot, p2m, gfn & mask, gfn | ~mask);
+        if ( nt != ot )
          {
-            unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
-            p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask,
gfn | ~mask)
-                              ? p2m_ram_logdirty : p2m_ram_rw;
              unsigned long mfn = l1e_get_pfn(e);
-            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
+            unsigned long flags = p2m_type_to_flags(p2m, nt,
                                                      _mfn(mfn), level);

              if ( level )
@@ -460,9 +462,17 @@ static int do_recalc(struct p2m_domain *p2m,
unsigned long gfn)
                       mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
                  flags |= _PAGE_PSE;
              }
+
+            if ( ot == p2m_ioreq_server )
+            {
+                ASSERT(p2m->ioreq.entry_count > 0);
+                ASSERT(level == 0);
+                p2m->ioreq.entry_count--;
+            }
+
              e = l1e_from_pfn(mfn, flags);
              p2m_add_iommu_flags(&e, level,
-                                (p2mt == p2m_ram_rw)
+                                (nt == p2m_ram_rw)
                                  ? IOMMUF_readable|IOMMUF_writable : 0);
              ASSERT(!needs_recalc(l1, e));
          }
@@ -606,6 +616,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned
long gfn, mfn_t mfn,

      if ( page_order == PAGE_ORDER_4K )
      {
+        p2m_type_t p2mt_old;
+
          rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
                              L2_PAGETABLE_SHIFT - PAGE_SHIFT,
                              L2_PAGETABLE_ENTRIES, PGT_l1_page_table, 1);
@@ -629,6 +641,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned
long gfn, mfn_t mfn,
          if ( entry_content.l1 != 0 )
              p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);

+        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
+
+        /*
+         * p2m_ioreq_server is only used for 4K pages, so
+         * the count shall only be performed for level 1 entries.
+         */
+        if ( p2mt == p2m_ioreq_server )
+            p2m->ioreq.entry_count++;
+
+        if ( p2mt_old == p2m_ioreq_server )
+        {
+            ASSERT(p2m->ioreq.entry_count > 0);
+            p2m->ioreq.entry_count--;
+        }
+
          /* level 1 entry */
          p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
          /* NB: paging_write_p2m_entry() handles tlb flushes properly */
@@ -726,15 +753,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned
long gfn, mfn_t mfn,
      return rc;
  }

-static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
-                                     struct p2m_domain *p2m, unsigned
long gfn)
-{
-    if ( !recalc || !p2m_is_changeable(t) )
-        return t;
-    return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
-                                                : p2m_ram_rw;
-}
-
  static mfn_t
  p2m_pt_get_entry(struct p2m_domain *p2m, unsigned long gfn,
                   p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
@@ -820,8 +838,8 @@ pod_retry_l3:
              mfn = _mfn(l3e_get_pfn(*l3e) +
                         l2_table_offset(addr) * L1_PAGETABLE_ENTRIES +
                         l1_table_offset(addr));
-            *t = recalc_type(recalc || _needs_recalc(flags),
-                             p2m_flags_to_type(flags), p2m, gfn);
+            *t = p2m_recalc_type(recalc || _needs_recalc(flags),
+                                 p2m_flags_to_type(flags), p2m, gfn);
              unmap_domain_page(l3e);

              ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
@@ -859,8 +877,8 @@ pod_retry_l2:
      if ( flags & _PAGE_PSE )
      {
          mfn = _mfn(l2e_get_pfn(*l2e) + l1_table_offset(addr));
-        *t = recalc_type(recalc || _needs_recalc(flags),
-                         p2m_flags_to_type(flags), p2m, gfn);
+        *t = p2m_recalc_type(recalc || _needs_recalc(flags),
+                             p2m_flags_to_type(flags), p2m, gfn);
          unmap_domain_page(l2e);

          ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
@@ -896,7 +914,7 @@ pod_retry_l1:
          return INVALID_MFN;
      }
      mfn = _mfn(l1e_get_pfn(*l1e));
-    *t = recalc_type(recalc || _needs_recalc(flags), l1t, p2m, gfn);
+    *t = p2m_recalc_type(recalc || _needs_recalc(flags), l1t, p2m, gfn);
      unmap_domain_page(l1e);

      ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t) || p2m_is_paging(*t));
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b84add0..4169d18 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
          if ( p2m->ioreq.server != NULL )
              goto out;

+        /*
+         * It is possible that an ioreq server has just been unmapped,
+         * released the spin lock, with some p2m_ioreq_server entries
+         * in p2m table remained. We shall refuse another ioreq server
+         * mapping request in such case.
+         */
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            goto out;
+
          p2m->ioreq.server = s;
          p2m->ioreq.flags = flags;
      }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 4521620..8077d0d 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;

  /* Types that can be subject to bulk transitions. */
  #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
-                              | p2m_to_mask(p2m_ram_logdirty) )
+                              | p2m_to_mask(p2m_ram_logdirty) \
+                              | p2m_to_mask(p2m_ioreq_server) )
+
+#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))

  #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))

@@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
  #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
  #define p2m_is_discard_write(_t) (p2m_to_mask(_t) &
P2M_DISCARD_WRITE_TYPES)
  #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
+#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
  #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
  #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
  /* Grant types are *not* considered valid, because they can be
@@ -349,6 +353,7 @@ struct p2m_domain {
            * are to be emulated by an ioreq server.
            */
           unsigned int flags;
+         unsigned long entry_count;
       } ioreq;
  };

@@ -744,6 +749,25 @@ static inline p2m_type_t p2m_flags_to_type(unsigned
long flags)
      return (flags >> 12) & 0x7f;
  }

+static inline p2m_type_t p2m_recalc_type_range(bool_t recalc, p2m_type_t t,
+                                               struct p2m_domain *p2m,
+                                               unsigned long gfn_start,
+                                               unsigned long gfn_end)
+{
+    if ( !recalc || !p2m_is_changeable(t) )
+        return t;
+    if ( t == p2m_ioreq_server && p2m->ioreq.server != NULL )
+        return t;
+    return p2m_is_logdirty_range(p2m, gfn_start, gfn_end) ?
p2m_ram_logdirty
+        : p2m_ram_rw;
+}
+
+static inline p2m_type_t p2m_recalc_type(bool_t recalc, p2m_type_t t,
+                                         struct p2m_domain *p2m,
unsigned long gfn)
+{
+    return p2m_recalc_type_range(recalc, t, p2m, gfn, gfn);
+}
+
  int p2m_pt_handle_deferred_changes(uint64_t gpa);

  /*


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.