
Re: [Xen-devel] Commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622: "x86 don't change affinity with interrupt unmasked", APIC errors and assorted PCI trouble



Friday, April 17, 2015, 1:43:32 PM, you wrote:

>>>> On 14.04.15 at 14:46, <linux@xxxxxxxxxxxxxx> wrote:
>> I just had a hunch .. could it be related to the kernel apic/irq refactoring
>> series of Jiang Liu, which already caused a lot of trouble in 3.17, 3.18 and
>> 3.19 with Xen. And yes, that seems to be the case:
>> 
>> On Xen without "x86 don't change affinity with interrupt unmasked"
>> - 3.16 && 3.19 && 4.0 all work fine 
>> 
>> On Xen with "x86 don't change affinity with interrupt unmasked" 
>> - 3.16 (which is before that kernel refactoring series) works fine.
>> - 3.19, 4.0 both give dom0 kernel hangs and the:
>>         (XEN) [2015-03-26 20:35:42.205] APIC error on CPU0: 00(40)
>>         (XEN) [2015-03-26 20:35:42.372] APIC error on CPU0: 40(40)
>> 
>> (haven't tested 3.17 and 3.18 because these have assorted problems due to 
>>  that series that weren't fixed in time before stable updates ended.)
>> 
>> So Jan's patch seems to interfere with that patch series.

> That's rather odd a finding - the patch in question in fact uncovered
> a bug introduced in 2ca9fbd739 ("AMD IOMMU: allocate IRTE entries
> instead of using a static mapping") in that IO-APIC RTE reads would
> unconditionally translate the data (i.e. regardless of whether the
> entry was already in translated format). The patch below fixes this
> for me - can you please give this a try too?

> Thanks, Jan

> --- unstable.orig/xen/drivers/passthrough/amd/iommu_intr.c
> +++ unstable/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -365,15 +365,17 @@ unsigned int amd_iommu_read_ioapic_from_
>      unsigned int apic, unsigned int reg)
>  {
>      unsigned int val = __io_apic_read(apic, reg);
> +    unsigned int pin = (reg - 0x10) / 2;
> +    unsigned int offset = ioapic_sbdf[IO_APIC_ID(apic)].pin_2_idx[pin];
>  
> -    if ( !(reg & 1) )
> +    if ( !(reg & 1) && offset < INTREMAP_ENTRIES )
>      {
> -        unsigned int offset = val & (INTREMAP_ENTRIES - 1);
>          u16 bdf = ioapic_sbdf[IO_APIC_ID(apic)].bdf;
>          u16 seg = ioapic_sbdf[IO_APIC_ID(apic)].seg;
>          u16 req_id = get_intremap_requestor_id(seg, bdf);
>          const u32 *entry = get_intremap_entry(seg, req_id, offset);
>  
> +        ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
>          val &= ~(INTREMAP_ENTRIES - 1);
>          val |= get_field_from_reg_u32(*entry,
>                                        INT_REMAP_ENTRY_INTTYPE_MASK,


Hmmm, can this patch or Tim's patch make Andrew's patch ineffective?

I now have applied:
Jan's:
diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
index c1b76fb..879698e 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -365,15 +365,17 @@ unsigned int amd_iommu_read_ioapic_from_ire(
     unsigned int apic, unsigned int reg)
 {
     unsigned int val = __io_apic_read(apic, reg);
+    unsigned int pin = (reg - 0x10) / 2;
+    unsigned int offset = ioapic_sbdf[IO_APIC_ID(apic)].pin_2_idx[pin];

-    if ( !(reg & 1) )
+    if ( !(reg & 1) && offset < INTREMAP_ENTRIES )
     {
-        unsigned int offset = val & (INTREMAP_ENTRIES - 1);
         u16 bdf = ioapic_sbdf[IO_APIC_ID(apic)].bdf;
         u16 seg = ioapic_sbdf[IO_APIC_ID(apic)].seg;
         u16 req_id = get_intremap_requestor_id(seg, bdf);
         const u32 *entry = get_intremap_entry(seg, req_id, offset);

+        ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
         val &= ~(INTREMAP_ENTRIES - 1);
         val |= get_field_from_reg_u32(*entry,
                                       INT_REMAP_ENTRY_INTTYPE_MASK,


Tim's:
@@ -529,10 +531,11 @@ int amd_iommu_msi_msg_update_ire(
     } while ( PCI_SLOT(bdf) == PCI_SLOT(pdev->devfn) );

     if ( !rc )
+    {
         for ( i = 1; i < nr; ++i )
             msi_desc[i].remap_index = msi_desc->remap_index + i;
-
-    msg->data = data;
+        msg->data = data;
+    }
     return rc;
 }



Andrew's:
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 9eb8d33..3aee00c 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -56,9 +56,9 @@ int arch_iommu_populate_page_table(struct domain *d)

     while ( !rc && (page = page_list_remove_head(&d->page_list)) )
     {
-        if ( has_hvm_container_domain(d) ||
-            (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
-        {
+        if ( (mfn_to_gmfn(d, page_to_mfn(page)) != INVALID_MFN) &&
+            (has_hvm_container_domain(d) ||
+             ((page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page)) )
+        {
             BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page))));
             rc = hd->platform_ops->map_page(
                 d, mfn_to_gmfn(d, page_to_mfn(page)), page_to_mfn(page),



And I now have this one again (which Andrew's patch should prevent):
(XEN) [2015-04-17 15:00:55.954] Xen call trace:
(XEN) [2015-04-17 15:00:55.954]    [<ffff82d080155f51>] iommu_pde_from_gfn+0x38/0x430
(XEN) [2015-04-17 15:00:55.954]    [<ffff82d080156456>] amd_iommu_map_page+0x10d/0x4e6
(XEN) [2015-04-17 15:00:55.954]    [<ffff82d08015a93d>] arch_iommu_populate_page_table+0x179/0x4d8
(XEN) [2015-04-17 15:00:55.954]    [<ffff82d08014ca61>] iommu_do_pci_domctl+0x395/0x604
(XEN) [2015-04-17 15:00:55.954]    [<ffff82d08014942b>] iommu_do_domctl+0x17/0x1a
(XEN) [2015-04-17 15:00:55.954]    [<ffff82d080161f70>] arch_do_domctl+0x24ac/0x2724
(XEN) [2015-04-17 15:00:55.954]    [<ffff82d080104ae8>] do_domctl+0x1a98/0x1df0
(XEN) [2015-04-17 15:00:55.954]    [<ffff82d0802349ab>] syscall_enter+0xeb/0x145
(XEN) [2015-04-17 15:00:55.954] 
(XEN) [2015-04-17 15:00:57.023] 
(XEN) [2015-04-17 15:00:57.032] ****************************************
(XEN) [2015-04-17 15:00:57.051] Panic on CPU 5:
(XEN) [2015-04-17 15:00:57.064] Xen BUG at iommu_map.c:455
(XEN) [2015-04-17 15:00:57.079] ****************************************


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel