
[Xen-devel] XSA 154 and ISA region (640K -> 1MB) WB cache instead of UC



Hey Jan, et al.,

One of the interesting things about the XSA 154 fix ("x86: enforce consistent
cachability of MMIO mappings") is that when certain applications (mcelog)
map /dev/mem and poke around in the ISA region, we get:

[   49.399053] WARNING: CPU: 0 PID: 2471 at arch/x86/mm/pat.c:913 untrack_pfn+0x93/0xc0()
[   49.399055] Modules linked in: bnx2fc fcoe libfcoe libfc 8021q mrp garp stp 
llc bonding dm_multipath vfat fat iTCO_wdt iTCO_vendor_support pcspkr 
ipmi_devintf ipmi_si ipmi_msghandler sb_edac edac_core i2c_i801 i2c_core 
lpc_ich mfd_core shpchp ioatdma sg ext4 jbd2 mbcache sr_mod cdrom sd_mod 
usb_storage ahci libahci megaraid_sas qla2xxx scsi_transport_fc crc32c_intel 
be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 libiscsi_tcp 
qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi ixgbe dca ptp pps_core 
vxlan udp_tunnel ip6_udp_tunnel mdio dm_mirror dm_region_hash dm_log dm_mod
[   49.399131] CPU: 0 PID: 2471 Comm: mcelog Not tainted 4.1
[   49.399134] Hardware name: Oracle Corporation SUN SERVER X4-2       /ASSY,MB,X4-2, 1U      , BIOS 25030100 04/15/2015
[   49.399138]  0000000000000000 ffff880074673c28 ffffffff816c66f0 0000000000000000
[   49.399143]  0000000000000391 ffff880074673c68 ffffffff81084745 ffff880074673c78
[   49.399148]  ffff88014b625db0 0000000000000000 ffff880074673d58 00007f39290ab000
[   49.399152] Call Trace:
[   49.399166]  [<ffffffff816c66f0>] dump_stack+0x63/0x83
[   49.399175]  [<ffffffff81084745>] warn_slowpath_common+0x95/0xe0
[   49.399180]  [<ffffffff810847aa>] warn_slowpath_null+0x1a/0x20
[   49.399183]  [<ffffffff810725f3>] untrack_pfn+0x93/0xc0
[   49.399190]  [<ffffffff811b90f9>] unmap_single_vma+0xa9/0x100
[   49.399194]  [<ffffffff811b9644>] unmap_vmas+0x54/0xa0
[   49.399199]  [<ffffffff811bf0da>] exit_mmap+0x9a/0x150
[   49.399204]  [<ffffffff810825d3>] mmput+0x73/0x110
[   49.399208]  [<ffffffff81082775>] dup_mm+0x105/0x110
[   49.399213]  [<ffffffff81083b1d>] copy_process+0x11ed/0x1240
[   49.399218]  [<ffffffff81084009>] do_fork+0x79/0x280
[   49.399226]  [<ffffffff810259d3>] ? syscall_trace_enter_phase1+0x153/0x180
[   49.399231]  [<ffffffff81084226>] SyS_clone+0x16/0x20
[   49.399235]  [<ffffffff816cb3ee>] system_call_fastpath+0x12/0x71
[   49.399239] ---[ end trace a61cd3d271a53a54 ]---

The reason is that the Linux kernel assumes the range from 640KB -> 1MB can
be mapped as write-back (see is_new_memtype_allowed and
x86_platform.is_untracked_pat_range).
But with the XSA 154 fix we enforce uncached mode for it, and Linux complains.

With mmio-relax=1, Linux gets its way and is happy.

With the patch below:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 70a38c1..e5ff5a5 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -288,6 +287,12 @@ static void __init xen_banner(void)
               version >> 16, version & 0xffff, extra.extraversion,
               xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
 }
+
+static bool xen_ignore(u64 s, u64 e)
+{
+       return false;
+}
+
 /* Check if running on Xen version (major, minor) or later */
 bool
 xen_running_on_version_or_later(unsigned int major, unsigned int minor)
@@ -1563,7 +1570,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
                x86_init.resources.memory_setup = xen_memory_setup;
        x86_init.oem.arch_setup = xen_arch_setup;
        x86_init.oem.banner = xen_banner;
-
+       x86_platform.is_untracked_pat_range = xen_ignore;
        xen_init_time_ops();
 
        /*

Things work much better - as we don't treat the 640KB->1MB region specially.
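
To make the effect of that hook concrete, here is a minimal userspace sketch
(not kernel code) of the two behaviours. It assumes the bare-metal default
treats any range lying entirely inside 640KB -> 1MB as untracked; the names
platform_ops_model, isa_range_untracked, xen_ignore_model and
range_is_untracked are made up for illustration, and the range check is only
my approximation of what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bounds of the legacy ISA hole discussed above: 640KB up to 1MB. */
#define ISA_START_ADDRESS 0xa0000ULL
#define ISA_END_ADDRESS   0x100000ULL

/* Simplified, hypothetical stand-in for the x86_platform hook table. */
struct platform_ops_model {
	bool (*is_untracked_pat_range)(uint64_t start, uint64_t end);
};

/* Assumed bare-metal default: [start, end) inside the ISA hole is untracked. */
static bool isa_range_untracked(uint64_t start, uint64_t end)
{
	assert(start < end);	/* callers pass a non-empty [start, end) range */
	return start >= ISA_START_ADDRESS && (end - 1) < ISA_END_ADDRESS;
}

/* Mirrors xen_ignore() from the patch: no range is exempt from tracking. */
static bool xen_ignore_model(uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	return false;
}

/* Hypothetical helper: would the PAT code skip tracking for this range? */
static bool range_is_untracked(const struct platform_ops_model *ops,
			       uint64_t start, uint64_t end)
{
	return ops->is_untracked_pat_range(start, end);
}
```

With the default hook, a range such as 0xa0000 -> 0x100000 is reported as
untracked (and so assumed WB); with the Xen override every range goes through
the normal memtype tracking, which is why the 640KB->1MB region stops being
special.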

Anyhow what I am wondering:

 a) Should we add an edge case in the hypervisor to allow multiple mappings
   for this region? I am thinking no... but it sounds like mapping the ISA
   region as WB is safe even on baremetal?

 b) Or would it be better to let Linux do its thing and treat 640KB->1MB
   as uncached instead of writeback?

   Looking at the kernel it assumes that WB is ok for 640KB->1MB.
   The comment says:
   "/* Low ISA region is always mapped WB in page table. No need to track */"

   which is probably true on baremetal. But with Xen PV:

        /*
         * In domU, the ISA region is normal, usable memory, but we
         * reserve ISA memory anyway because too many things poke
         * about in there.
         */
        e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
                        E820_RESERVED);

   which would imply we don't have any page table mappings.

   And then the quick fix I provided above looks like the right solution?


CC-ing Boris, Daniel, Juergen, Steve, and Chuck.

