
Re: [Xen-devel] [PATCH] iommu/quirk: disable shared EPT for Sandybridge and earlier processors.

> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Thursday, December 03, 2015 7:19 PM
> On 03/12/15 08:50, Tian, Kevin wrote:
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Thursday, December 03, 2015 4:18 PM
> >>
> >>>>> On 03.12.15 at 03:40, <kevin.tian@xxxxxxxxx> wrote:
> >>> Just confirmed internally with HW team. On SNB 4KB cache is always
> >>> used regardless of 4KB/2MB/1GB mapping. There'd be another reason
> >>> for this 40% drop observation...
> >> So when they stated that the 4k TLB gets always used, did they at
> >> least provide some thoughts on what else might be causing this
> >> severe a performance impact? Without them helping we're left
> >> guessing...
> >>
> > Unfortunately no clear answer...
> http://networkbuilders.intel.com/docs/Network_Builders_RA_vBRAS_Final.pdf
> Page 42: "The IOTLB on the previous generation Intel Xeon Processor
> E5-2690 does not natively support huge pages (it emulates them using 4K
> pages)."
> And Figure 51 on Page 43
> The "emulates them using 4K pages" probably means that the IOTLB is
> flushed and filled with 512 adjacent 4k mappings.
> Citrix's measurements back up the findings in that paper, and also show
> that performance is better when using plain 4k mappings as opposed to
> emulated 2M mappings.

Thanks for the information. I'll forward it to HW team.

If the above interpretation is correct (it also matches my understanding),
then for the two options you listed earlier:

> This leaves two options
> 1) 2M mappings are entirely uncached
> 2) 2M mappings are shattered to 4K mappings and cached

> The fact there is a 40% performance reduction suggests 1 rather than 2.

it looks like 2) is suggested rather than 1). There are two further options:

2.1) 2M mappings are shattered to 512 adjacent 4k mappings which are all
2.2) Only the 4k mapping out of 2M mapping is cached for the page being 

For 2.1), since IOTLB entries are limited, it may cause unnecessary IOTLB
entry evictions and thus incur more page-walk overhead to refill them.

For 2.2), I can't think of a reason that would cause a performance drop.
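
To make the difference between 2.1) and 2.2) concrete, here is a toy model,
not a description of the real hardware. It assumes a 64-entry LRU IOTLB (both
the size and the replacement policy are guesses for illustration) and a device
that round-robins DMA over 8 buffers, each in a different 2M region. The names
ToyIotlb and count_walks are made up for this sketch.

```python
from collections import OrderedDict

PAGES_PER_2M = 512  # 2 MiB / 4 KiB


class ToyIotlb:
    """Toy LRU cache of 4k translations. Capacity and policy are assumptions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # pfn -> True, oldest entry first
        self.walks = 0                # number of misses (page walks)

    def _insert(self, pfn):
        self.entries[pfn] = True
        self.entries.move_to_end(pfn)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def access(self, pfn, fill_whole_2m):
        if pfn in self.entries:
            self.entries.move_to_end(pfn)     # hit: refresh LRU position
            return
        self.walks += 1
        if fill_whole_2m:
            # Option 2.1: fill all 512 adjacent 4k entries of the 2M region.
            base = pfn - pfn % PAGES_PER_2M
            for p in range(base, base + PAGES_PER_2M):
                self._insert(p)
        # Option 2.2 (and the final step of 2.1): cache the accessed page.
        self._insert(pfn)


def count_walks(fill_whole_2m, accesses=1000, buffers=8, capacity=64):
    """Round-robin DMA over `buffers` pages, one per distinct 2M region."""
    tlb = ToyIotlb(capacity)
    for i in range(accesses):
        tlb.access((i % buffers) * PAGES_PER_2M, fill_whole_2m)
    return tlb.walks


if __name__ == "__main__":
    print("2.1 walks:", count_walks(True))
    print("2.2 walks:", count_walks(False))
```

Under these assumptions every access misses in the 2.1) case, because each
512-entry fill pushes out the other buffers' translations, while 2.2) only
misses on the first touch of each buffer. Real numbers would depend on the
actual IOTLB size and replacement policy, which this sketch does not know.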

