Re: [Xen-devel] Question about the CAT and CMT in Xen
2015-09-01 22:31 GMT-04:00 Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>:
> On Tue, Sep 01, 2015 at 09:51:57PM -0400, Meng Xu wrote:
>> Hi Andrew and Chao,
>>
>> [Important things go first] It turns out my machine (Intel E5-2618L
>> v3) does have CAT capability!
>> Xen gives the false alarm that my machine does not have it. This
>> should be a bug, IMO. :-)
>
> Even some Haswell servers do support CAT, but it's usually
> model-specific and the feature is not enumerated in the standard way
> listed in the Intel SDM. Xen, however, currently only handles the
> standard enumeration, so your case is not detected.

Ah. I prefer not to upgrade the firmware... The last time I upgraded
the firmware, I screwed up the whole system. :-(
Is it possible to check whether CAT is supported on a machine in
another way?
Right now, I think you are checking some specific bits returned by CPUID.

>
> This could be done by adding a CPU model check in the Xen code or by
> using updated firmware, which is what I would prefer.
>
>>
>> 2015-09-01 10:42 GMT-04:00 Meng Xu <xumengpanda@xxxxxxxxx>:
>> > 2015-09-01 10:30 GMT-04:00 Andrew Cooper <andrew.cooper3@xxxxxxxxxx>:
>> >> On 01/09/15 15:20, Meng Xu wrote:
>> >>> 2015-09-01 9:04 GMT-04:00 Andrew Cooper <andrew.cooper3@xxxxxxxxxx>:
>> >>>> On 01/09/15 13:55, Meng Xu wrote:
>> >>>>> 2015-09-01 1:47 GMT-04:00 Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>:
>> >>>>>> On Mon, Aug 31, 2015 at 04:09:31PM -0400, Meng Xu wrote:
>> >>>>>>> I looked into the xen/arch/x86/psr.c and found that the function
>> >>>>>>> cat_cpu_init() just returned without initializing the variable
>> >>>>>>> "cat_socket_enable".
>> >>>>>>>
>> >>>>>>> Both !cpu_has(c, X86_FEATURE_CAT) and c->cpuid_level <
>> >>>>>>> PSR_CPUID_LEVEL_CAT are evaluated as 1 inside the function
>> >>>>>>> cat_cpu_init().
>>
>> I'm thinking this check could be wrong for the Intel E5-2618L v3. It
>> should work on Chao's machine but not on mine. There should probably
>> be a better way to check this. :-)
>>
>> ---
>>
>> I used another way to check the CAT capability, as suggested by Priya
>> (cc.ed) from Intel.
>> I did the following steps as Priya suggested:
>> 1. Download the msr-tools utility on your Linux distribution to
>> perform MSR read/write operations; if you already have it installed,
>> run modprobe msr
>> 2. rdmsr 0xc91
>> which returns 0xfffff
>> 3. wrmsr -p 1 0xc91 0xf
>> which does not return anything
>> 4. wrmsr -p 1 0xc8f 0x100000000
>> which does not return anything
>> 5. rdmsr 0xc91
>> which returns 0xf
>>
>> This shows that the CPU does have the MSRs that are used for CAT.
>>
>> I also ran the CAT tools on Linux provided by Intel, which can be
>> downloaded at
>> https://01.org/packet-processing/cache-monitoring-technology-memory-bandwidth-monitoring-cache-allocation-technology.
>> They show that I can set up different cache partitions for different
>> COS.
>> Basically, the pre-configured cache settings provided by the CAT
>> tools work perfectly on my machine. :-)
>>
>> ------
>> OK. That just checks that the registers are there and that the tools
>> do not return an error. It may still not work, right? :-)
>> Well, I also did some performance evaluation by running a small,
>> simple benchmark I wrote.
>>
>> The benchmark task sequentially accesses a 6MB array;
>>
>> I ran the benchmark on core 0 in the following scenarios:
>>
>> Scenario 1): Core 0 is allocated 8MB of cache with CAT; the latency
>> of accessing the 6MB array is around 5.5M cycles;
>> Scenario 2): Core 0 is allocated 4MB of cache with CAT; the latency
>> of accessing the 6MB array is around 16.9M cycles.
>> The slowdown in scenario 2) is 16.9M / 5.5M ~= 3x.
>>
>> ------ISSUES-------
>> I tried to run some noisy neighbors on another core to see how good
>> the LLC isolation provided by CAT is, but found a *weird result*.
>> I ran the benchmark task on core 0 and a noisy neighbor that
>> accesses a 20MB array on core 1;
>> These two cores are configured to have *different* cache areas: core 0
>> has 8MB cache, core 1 has 4MB cache;
>> These two cores are in two isolated cpusets. No other tasks run on
>> these two cores.
>> If I run the benchmark alone, the latency is around 5.5M cycles;
>> but if I run the benchmark along with the noisy neighbor, the latency
>> *decreases* to 4.9M cycles.
>>
>> I double-checked that Turbo Boost is disabled by checking the MSR
>> value with the following command:
>> rdmsr -pi 0x1a0 -f 38:38
>> 1=disabled
>> 0=enabled
>> it returns 1.
>> I also disabled the cache prefetch in the BIOS.
>>
>> Now I'm very confused. How come the latency decreases when a noisy
>> neighbor is running? It seems that the noisy neighbor may help some
>> hardware/software prefetcher to prefetch the data for the benchmark.
>> But right now, I can't think of any other prefetchers that may
>> cause this...
>> The benchmark and the noisy neighbor are independent and don't share
>> the array data.
>
> Did you reboot the machine between your two tests and are the two cores
> you used in the same socket?

I didn't reboot between these two tests. Both cores are in the same
socket (socket 0).

Thank you very much!

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
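
As a cross-check on the msr-tools steps quoted in the message, the MSR values can be decoded with a few lines of arithmetic. The following is only a minimal Python sketch, not anything taken from Xen or from Intel's tools. The helper names are hypothetical. The register layout (IA32_PQR_ASSOC at 0xc8f with the COS in bits 63:32, IA32_L3_MASK_n at 0xc90 + n) follows the Intel SDM. The 20MB LLC size is an assumption about this CPU that the thread does not state.

```python
# Sketch only: decode the MSR values used in the rdmsr/wrmsr steps.
# Helper names are hypothetical; the 20 MB LLC size is an assumption.

IA32_PQR_ASSOC = 0xC8F     # per-core register selecting the active COS
IA32_L3_MASK_BASE = 0xC90  # IA32_L3_MASK_n is located at 0xC90 + n


def l3_mask_msr(cos: int) -> int:
    """Address of the L3 capacity bitmask MSR for a class of service."""
    return IA32_L3_MASK_BASE + cos


def cos_from_pqr_assoc(value: int) -> int:
    """Extract the COS field (bits 63:32) from an IA32_PQR_ASSOC value."""
    return (value >> 32) & 0xFFFFFFFF


def cbm_ways(mask: int) -> int:
    """Count the cache ways enabled by a capacity bitmask."""
    return bin(mask).count("1")


def is_contiguous(mask: int) -> bool:
    """Check that the set bits form one contiguous run, as CAT requires."""
    if mask == 0:
        return False
    low = mask & -mask    # lowest set bit
    filled = mask + low   # the carry clears a contiguous run of ones
    return (filled & mask) == 0


def share_mb(mask: int, full_mask: int, llc_mb: float) -> float:
    """Approximate cache share of a mask, assuming equally sized ways."""
    return llc_mb * cbm_ways(mask) / cbm_ways(full_mask)


if __name__ == "__main__":
    print(hex(l3_mask_msr(1)))                # MSR written in step 3
    print(cos_from_pqr_assoc(0x100000000))    # COS selected in step 4
    print(cbm_ways(0xFFFFF), cbm_ways(0xF))   # ways before and after
    print(share_mb(0xF, 0xFFFFF, 20.0))       # share under the assumption
    print(round(16.9 / 5.5, 2))               # slowdown from the benchmark
```

Under these assumptions, step 3 narrows the bitmask of COS 1 from 20 ways to 4, and step 4 associates core 1 with COS 1, which is consistent with the 4MB partition used in scenario 2.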