[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-changelog] [xen-4.1-testing] Backport per-device vector map patches to xen 4.1.3



# HG changeset patch
# User Wei Wang2 <wei.wang2@xxxxxxx>
# Date 1319728476 -3600
# Node ID 9a38e30e54599b958382ad19a70a48567a38f816
# Parent  81e39a4978ea14e0180ce4692b889083c279773f
Backport per-device vector map patches to xen 4.1.3

Recently we found an issue in xen 4.1. Under heavy I/O stress such as
running bonnie++, Dom0 would lost its hard disk with lots of I/O
errors. We found that some PCI-E devices was using the same vector as
SMBus on AMD platforms and George' patch set that enables per-device
vector map can fix this problem.

23752 xen: Infrastructure to allow irqs to share vector maps
23753 xen: Option to allow per-device vector maps for MSI IRQs
23754 xen: AMD IOMMU: Automatically enable per-device vector maps
23786 x86: Fix up irq vector map logic
23812 xen: Add global irq_vector_map option
23899 AMD-IOMMU: remove dead variable references

From: Wei Wang2 <wei.wang2@xxxxxxx>
Committed-by: Keir Fraser <keir@xxxxxxx>

AMD-IOMMU: remove dead variable references

These got orphaned up by recent changes.

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
Acked-by: Keir Fraser <keir@xxxxxxx>
xen-unstable changeset:   23899:a99d75671a91
xen-unstable date:        Tue Oct 04 14:11:56 2011 +0200

docs: Fix 'make docs'

Signed-off-by: Keir Fraser <keir@xxxxxxx>
xen-unstable changeset:   23819:5fe770c8a8a3
xen-unstable date:        Tue Sep 06 15:49:40 2011 +0100

xen: Add global irq_vector_map option, set if using AMD global intremap tables

As mentioned in previous changesets, AMD IOMMU interrupt
remapping tables only look at the vector, not the destination
id of an interrupt.  This means that all IRQs going through
the same interrupt remapping table need to *not* share vectors.

The irq "vector map" functionality was originally introduced
after a patch which disabled global AMD IOMMUs entirely.  That
patch has since been reverted, meaning that AMD intremap tables
can either be per-device or global.

This patch therefore introduces a global irq vector map option,
and enables it if we're using an AMD IOMMU with a global
interrupt remapping table.

This patch removes the "irq-perdev-vector-map" boolean
command-line optino and replaces it with "irq_vector_map",
which can have one of three values: none, global, or per-device.

Setting the irq_vector_map to any value will override the
default that the AMD code sets.

Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
xen-unstable changeset:   23812:32814ad7458d
xen-unstable date:        Mon Sep 05 15:00:15 2011 +0100

x86: Fix up irq vector map logic

We need to make sure that cfg->used_vector is only cleared once;
otherwise there may be a race condition that allows the same vector to
be assigned twice, defeating the whole purpose of the map.

This makes two changes:
* __clear_irq_vector() only clears the vector if the irq is not being
moved
* smp_iqr_move_cleanup_interrupt() only clears used_vector if this
is the last place it's being used (move_cleanup_count==0 after
decrement).

Also make use of asserts more consistent, to catch this kind of logic
bug in the future.

Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
xen-unstable changeset:   23786:3a05da2dc7c0
xen-unstable date:        Mon Aug 22 16:15:33 2011 +0100

xen: AMD IOMMU: Automatically enable per-device vector maps

Automatically enable per-device vector maps when using IOMMU,
unless disabled specifically by an IOMMU parameter.

Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
xen-unstable changeset:   23754:fa4e2ca9ecff
xen-unstable date:        Tue Jul 26 18:37:32 2011 +0100

xen: Option to allow per-device vector maps for MSI IRQs

Add a vector-map to pci_dev, and add an option to point MSI-related
IRQs to the vector-map of the device.

This prevents irqs from the same device from being assigned
the same vector on different pcpus.  This is required for systems
using an AMD IOMMU, since the intremap tables on AMD only look at
vector, and not destination ID.

Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
xen-unstable changeset:   23753:2e0cf9428554
xen-unstable date:        Tue Jul 26 18:37:16 2011 +0100

xen: Infrastructure to allow irqs to share vector maps

Laying the groundwork for per-device vector maps.  This generic
code allows any irq to point to a vector map; all irqs sharing the
same vector map will avoid sharing vectors.

Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
xen-unstable changeset:   23752:ef9ed3d2aa87
xen-unstable date:        Tue Jul 26 18:36:58 2011 +0100
---


diff -r 81e39a4978ea -r 9a38e30e5459 docs/src/user.tex
--- a/docs/src/user.tex Mon Oct 24 18:03:35 2011 +0100
+++ b/docs/src/user.tex Thu Oct 27 16:14:36 2011 +0100
@@ -4197,6 +4197,10 @@
 \item [ vcpu\_migration\_delay=$<$minimum\_time$>$] Set minimum time of 
   vcpu migration in microseconds (default 0). This parameter avoids agressive
   vcpu migration. For example, the linux kernel uses 0.5ms by default.
+\item [ irq\_vector\_map=xxx ] Enable irq vector non-sharing maps.  Setting 
'global' 
+  will ensure that no  IRQs will share vectors.  Setting 'per-device' will 
ensure 
+  that no IRQs from the same device will share vectors.  Setting to 'none' will
+  disable it entirely, overriding any defaults the IOMMU code may set.
 \end{description}
 
 In addition, the following options may be specified on the Xen command
diff -r 81e39a4978ea -r 9a38e30e5459 xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c    Mon Oct 24 18:03:35 2011 +0100
+++ b/xen/arch/x86/io_apic.c    Thu Oct 27 16:14:36 2011 +0100
@@ -548,6 +548,13 @@
         }
         __get_cpu_var(vector_irq)[vector] = -1;
         cfg->move_cleanup_count--;
+
+        if ( cfg->move_cleanup_count == 0 
+             &&  cfg->used_vectors )
+        {
+            ASSERT(test_bit(vector, cfg->used_vectors));
+            clear_bit(vector, cfg->used_vectors);
+        }
 unlock:
         spin_unlock(&desc->lock);
     }
diff -r 81e39a4978ea -r 9a38e30e5459 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c        Mon Oct 24 18:03:35 2011 +0100
+++ b/xen/arch/x86/irq.c        Thu Oct 27 16:14:36 2011 +0100
@@ -24,6 +24,8 @@
 #include <asm/mach-generic/mach_apic.h>
 #include <public/physdev.h>
 
+static void parse_irq_vector_map_param(char *s);
+
 /* opt_noirqbalance: If true, software IRQ balancing/affinity is disabled. */
 bool_t __read_mostly opt_noirqbalance = 0;
 boolean_param("noirqbalance", opt_noirqbalance);
@@ -32,6 +34,12 @@
 unsigned int __read_mostly nr_irqs;
 integer_param("nr_irqs", nr_irqs);
 
+/* This default may be changed by the AMD IOMMU code */
+int __read_mostly opt_irq_vector_map = OPT_IRQ_VECTOR_MAP_DEFAULT;
+custom_param("irq_vector_map", parse_irq_vector_map_param);
+
+vmask_t global_used_vector_map;
+
 u8 __read_mostly *irq_vector;
 struct irq_desc __read_mostly *irq_desc = NULL;
 
@@ -60,6 +68,26 @@
 static unsigned int __read_mostly irq_ratelimit_threshold = 10000;
 integer_param("irq_ratelimit", irq_ratelimit_threshold);
 
+static void __init parse_irq_vector_map_param(char *s)
+{
+    char *ss;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        if ( !strcmp(s, "none"))
+            opt_irq_vector_map=OPT_IRQ_VECTOR_MAP_NONE;
+        else if ( !strcmp(s, "global"))
+            opt_irq_vector_map=OPT_IRQ_VECTOR_MAP_GLOBAL;
+        else if ( !strcmp(s, "per-device"))
+            opt_irq_vector_map=OPT_IRQ_VECTOR_MAP_PERDEV;
+
+        s = ss + 1;
+    } while ( ss );
+}
+
 /* Must be called when irq disabled */
 void lock_vector_lock(void)
 {
@@ -94,6 +122,11 @@
         per_cpu(vector_irq, cpu)[vector] = irq;
     cfg->vector = vector;
     cfg->cpu_mask = online_mask;
+    if ( cfg->used_vectors )
+    {
+        ASSERT(!test_bit(vector, cfg->used_vectors));
+        set_bit(vector, cfg->used_vectors);
+    }
     irq_status[irq] = IRQ_USED;
     if (IO_APIC_IRQ(irq))
         irq_vector[irq] = vector;
@@ -159,6 +192,7 @@
     desc->depth   = 1;
     desc->msi_desc = NULL;
     desc->handler = &no_irq_type;
+    desc->chip_data->used_vectors=NULL;
     cpus_setall(desc->affinity);
     spin_unlock_irqrestore(&desc->lock, flags);
 
@@ -191,6 +225,7 @@
 
     if (likely(!cfg->move_in_progress))
         return;
+
     cpus_and(tmp_mask, cfg->old_cpu_mask, cpu_online_map);
     for_each_cpu_mask(cpu, tmp_mask) {
         for (vector = FIRST_DYNAMIC_VECTOR; vector <= LAST_DYNAMIC_VECTOR;
@@ -202,6 +237,12 @@
         }
      }
 
+    if ( cfg->used_vectors )
+    {
+        ASSERT(test_bit(vector, cfg->used_vectors));
+        clear_bit(vector, cfg->used_vectors);
+    }
+
     cfg->move_in_progress = 0;
 }
 
@@ -261,6 +302,7 @@
     cfg->vector = IRQ_VECTOR_UNASSIGNED;
     cpus_clear(cfg->cpu_mask);
     cpus_clear(cfg->old_cpu_mask);
+    cfg->used_vectors = NULL;
 }
 
 int init_irq_data(void)
@@ -330,6 +372,41 @@
     end_none
 };
 
+static vmask_t *irq_get_used_vector_mask(int irq)
+{
+    vmask_t *ret = NULL;
+
+    if ( opt_irq_vector_map == OPT_IRQ_VECTOR_MAP_GLOBAL )
+    {
+        struct irq_desc *desc = irq_to_desc(irq);
+
+        ret = &global_used_vector_map;
+
+        if ( desc->chip_data->used_vectors )
+        {
+            printk(XENLOG_INFO "%s: Strange, unassigned irq %d already has 
used_vectors!\n",
+                   __func__, irq);
+        }
+        else
+        {
+            int vector;
+            
+            vector = irq_to_vector(irq);
+            if ( vector > 0 )
+            {
+                printk(XENLOG_INFO "%s: Strange, irq %d already assigned 
vector %d!\n",
+                       __func__, irq, vector);
+                
+                ASSERT(!test_bit(vector, ret));
+
+                set_bit(vector, ret);
+            }
+        }
+    }
+
+    return ret;
+}
+
 int __assign_irq_vector(int irq, struct irq_cfg *cfg, const cpumask_t *mask)
 {
     /*
@@ -348,6 +425,7 @@
     int cpu, err;
     unsigned long flags;
     cpumask_t tmp_mask;
+    vmask_t *irq_used_vectors = NULL;
 
     old_vector = irq_to_vector(irq);
     if (old_vector) {
@@ -362,6 +440,17 @@
         return -EAGAIN;
 
     err = -ENOSPC;
+
+    /* This is the only place normal IRQs are ever marked
+     * as "in use".  If they're not in use yet, check to see
+     * if we need to assign a global vector mask. */
+    if ( irq_status[irq] == IRQ_USED )
+    {
+        irq_used_vectors = cfg->used_vectors;
+    }
+    else
+        irq_used_vectors = irq_get_used_vector_mask(irq);
+
     for_each_cpu_mask(cpu, *mask) {
         int new_cpu;
         int vector, offset;
@@ -387,6 +476,10 @@
         if (test_bit(vector, used_vectors))
             goto next;
 
+        if (irq_used_vectors
+            && test_bit(vector, irq_used_vectors) )
+            goto next;
+
         for_each_cpu_mask(new_cpu, tmp_mask)
             if (per_cpu(vector_irq, new_cpu)[vector] != -1)
                 goto next;
@@ -404,8 +497,20 @@
         cpus_copy(cfg->cpu_mask, tmp_mask);
 
         irq_status[irq] = IRQ_USED;
-            if (IO_APIC_IRQ(irq))
-                    irq_vector[irq] = vector;
+        ASSERT((cfg->used_vectors == NULL)
+               || (cfg->used_vectors == irq_used_vectors));
+        cfg->used_vectors = irq_used_vectors;
+
+        if (IO_APIC_IRQ(irq))
+            irq_vector[irq] = vector;
+
+        if ( cfg->used_vectors )
+        {
+            ASSERT(!test_bit(vector, cfg->used_vectors));
+
+            set_bit(vector, cfg->used_vectors);
+        }
+
         err = 0;
         local_irq_restore(flags);
         break;
@@ -1494,7 +1599,7 @@
 
     if ( !IS_PRIV(current->domain) &&
          !(IS_PRIV_FOR(current->domain, d) &&
-          irq_access_permitted(current->domain, pirq)))
+           irq_access_permitted(current->domain, pirq)))
         return -EPERM;
 
     if ( pirq < 0 || pirq >= d->nr_pirqs || irq < 0 || irq >= nr_irqs )
@@ -1542,8 +1647,21 @@
 
         if ( desc->handler != &no_irq_type )
             dprintk(XENLOG_G_ERR, "dom%d: irq %d in use\n",
-              d->domain_id, irq);
+                    d->domain_id, irq);
         desc->handler = &pci_msi_type;
+        if ( opt_irq_vector_map == OPT_IRQ_VECTOR_MAP_PERDEV
+             && !desc->chip_data->used_vectors )
+        {
+            desc->chip_data->used_vectors = &pdev->info.used_vectors;
+            if ( desc->chip_data->vector != IRQ_VECTOR_UNASSIGNED )
+            {
+                int vector = desc->chip_data->vector;
+                ASSERT(!test_bit(vector, desc->chip_data->used_vectors));
+
+                set_bit(vector, desc->chip_data->used_vectors);
+            }
+        }
+
         d->arch.pirq_irq[pirq] = irq;
         d->arch.irq_pirq[irq] = pirq;
         setup_msi_irq(pdev, msi_desc, irq);
@@ -1554,9 +1672,12 @@
         d->arch.pirq_irq[pirq] = irq;
         d->arch.irq_pirq[irq] = pirq;
         spin_unlock_irqrestore(&desc->lock, flags);
+
+        if ( opt_irq_vector_map == OPT_IRQ_VECTOR_MAP_PERDEV )
+            printk(XENLOG_INFO "Per-device vector maps for GSIs not 
implemented yet.\n");
     }
 
- done:
+done:
     return ret;
 }
 
diff -r 81e39a4978ea -r 9a38e30e5459 xen/drivers/passthrough/amd/pci_amd_iommu.c
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c       Mon Oct 24 18:03:35 
2011 +0100
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c       Thu Oct 27 16:14:36 
2011 +0100
@@ -166,6 +166,35 @@
         return -ENODEV;
     }
 
+    /*
+     * AMD IOMMUs don't distinguish between vectors destined for
+     * different cpus when doing interrupt remapping.  This means
+     * that interrupts going through the same intremap table
+     * can't share the same vector.
+     *
+     * If irq_vector_map isn't specified, choose a sensible default:
+     * - If we're using per-device interemap tables, per-device
+     *   vector non-sharing maps
+     * - If we're using a global interemap table, global vector
+     *   non-sharing map
+     */
+    if ( opt_irq_vector_map == OPT_IRQ_VECTOR_MAP_DEFAULT )
+    {
+        if ( amd_iommu_perdev_intremap )
+        {
+            printk("AMD-Vi: Enabling per-device vector maps\n");
+            opt_irq_vector_map = OPT_IRQ_VECTOR_MAP_PERDEV;
+        }
+        else
+        {
+            printk("AMD-Vi: Enabling global vector map\n");
+            opt_irq_vector_map = OPT_IRQ_VECTOR_MAP_GLOBAL;
+        }
+    }
+    else
+    {
+        printk("AMD-Vi: Not overriding irq_vector_map setting\n");
+    }
     return scan_pci_devices();
 }
 
diff -r 81e39a4978ea -r 9a38e30e5459 xen/include/asm-x86/irq.h
--- a/xen/include/asm-x86/irq.h Mon Oct 24 18:03:35 2011 +0100
+++ b/xen/include/asm-x86/irq.h Thu Oct 27 16:14:36 2011 +0100
@@ -23,11 +23,16 @@
 #define irq_to_desc(irq)    (&irq_desc[irq])
 #define irq_cfg(irq)        (&irq_cfg[irq])
 
+typedef struct {
+    DECLARE_BITMAP(_bits,NR_VECTORS);
+} vmask_t;
+
 struct irq_cfg {
         int  vector;
         cpumask_t cpu_mask;
         cpumask_t old_cpu_mask;
         unsigned move_cleanup_count;
+        vmask_t *used_vectors;
         u8 move_in_progress : 1;
 };
 
@@ -40,6 +45,13 @@
 
 extern bool_t opt_noirqbalance;
 
+#define OPT_IRQ_VECTOR_MAP_DEFAULT 0 /* Do the default thing  */
+#define OPT_IRQ_VECTOR_MAP_NONE    1 /* None */ 
+#define OPT_IRQ_VECTOR_MAP_GLOBAL  2 /* One global vector map (no vector 
sharing) */ 
+#define OPT_IRQ_VECTOR_MAP_PERDEV  3 /* Per-device vetor map (no vector 
sharing w/in a device) */
+
+extern int opt_irq_vector_map;
+
 /*
  * Per-cpu current frame pointer - the location of the last exception frame on
  * the stack
diff -r 81e39a4978ea -r 9a38e30e5459 xen/include/xen/pci.h
--- a/xen/include/xen/pci.h     Mon Oct 24 18:03:35 2011 +0100
+++ b/xen/include/xen/pci.h     Thu Oct 27 16:14:36 2011 +0100
@@ -11,6 +11,7 @@
 #include <xen/types.h>
 #include <xen/list.h>
 #include <xen/spinlock.h>
+#include <xen/irq.h>
 
 /*
  * The PCI interface treats multi-function devices as independent
@@ -38,6 +39,7 @@
         u8 bus;
         u8 devfn;
     } physfn;
+   vmask_t used_vectors; 
 };
 
 struct pci_dev {

_______________________________________________
Xen-changelog mailing list
Xen-changelog@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-changelog


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.