
Re: [Xen-devel] Question about the CAT and CMT in Xen



2015-09-01 22:31 GMT-04:00 Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>:
> On Tue, Sep 01, 2015 at 09:51:57PM -0400, Meng Xu wrote:
>> Hi Andrew and Chao,
>>
>> [Important things go first] It turns out my machine (Intel E5-2618L
>> v3) does have CAT capability!
>> Xen incorrectly reports that my machine does not have it. This
>> looks like a bug, IMO. :-)
>
> Some Haswell servers do support CAT, but the support is usually
> model-specific and the feature is not enumerated in the standard way
> listed in the Intel SDM. Xen currently handles only the standard
> enumeration, so your case is not detected.

Ah. I would prefer not to upgrade the firmware... The last time I
upgraded the firmware, I broke the whole system. :-(

Is it possible to check whether CAT is supported on a machine in some
other way? Right now, I think you are checking specific bits returned
by CPUID.
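
For reference, the standard enumeration mentioned above can be sketched as follows. This is only an illustration of the architectural CPUID bits described in the Intel SDM, not the Xen code itself: the function names are hypothetical, and the register values are assumed to be read elsewhere (for example with a cpuid utility) and passed in as plain integers.

```python
# Sketch of the architectural CAT enumeration (Intel SDM).
# All function names here are hypothetical, for illustration only.

PSR_CPUID_LEVEL_CAT = 0x10  # CPUID leaf describing allocation features
PQE_BIT = 15                # CPUID.(EAX=7,ECX=0):EBX[15], allocation capability
L3_CAT_BIT = 1              # CPUID.(EAX=0x10,ECX=0):EBX[1], L3 CAT present


def cat_enumerated(max_leaf, leaf7_ebx, leaf10_ebx):
    """Return True only if L3 CAT is advertised through standard CPUID."""
    if max_leaf < PSR_CPUID_LEVEL_CAT:
        return False  # leaf 0x10 does not exist on this CPU
    if not (leaf7_ebx >> PQE_BIT) & 1:
        return False  # allocation capability bit is clear
    return bool((leaf10_ebx >> L3_CAT_BIT) & 1)


def l3_cat_info(leaf10_sub1_eax, leaf10_sub1_edx):
    """Decode CPUID.(EAX=0x10,ECX=1): capacity bitmask length and highest COS."""
    cbm_len = (leaf10_sub1_eax & 0x1F) + 1  # EAX[4:0] holds length minus one
    cos_max = leaf10_sub1_edx & 0xFFFF      # EDX[15:0] holds highest COS number
    return cbm_len, cos_max
```

On a part where CAT is model-specific, such as the one discussed in this thread, a check of this kind would be expected to return False even though the MSRs exist, which matches the behaviour reported above.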

>
> This could be fixed by adding a CPU model check to the Xen code or by
> using updated firmware, which is what I would prefer.
>
>>
>> 2015-09-01 10:42 GMT-04:00 Meng Xu <xumengpanda@xxxxxxxxx>:
>> > 2015-09-01 10:30 GMT-04:00 Andrew Cooper <andrew.cooper3@xxxxxxxxxx>:
>> >> On 01/09/15 15:20, Meng Xu wrote:
>> >>> 2015-09-01 9:04 GMT-04:00 Andrew Cooper <andrew.cooper3@xxxxxxxxxx>:
>> >>>> On 01/09/15 13:55, Meng Xu wrote:
>> >>>>> 2015-09-01 1:47 GMT-04:00 Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>:
>> >>>>>> On Mon, Aug 31, 2015 at 04:09:31PM -0400, Meng Xu wrote:
>> >>>>>>> I looked into the xen/arch/x86/psr.c and found that the function
>> >>>>>>> cat_cpu_init() just returned without initializing the variable
>> >>>>>>> "cat_socket_enable".
>> >>>>>>>
>> >>>>>>> Both !cpu_has(c, X86_FEATURE_CAT) and c->cpuid_level <
>> >>>>>>> PSR_CPUID_LEVEL_CAT evaluate to 1 inside the function
>> >>>>>>> cat_cpu_init().
>>
>> I think this check could be wrong for the Intel E5-2618L v3. It
>> should work on Chao's machine but not on mine. There is probably a
>> better way to check this. :-)
>>
>> ---
>>
>> I used another way to check the CAT capability, as suggested by Priya
>> (cc'ed) from Intel.
>> I followed the steps Priya suggested:
>> 1. Install the msr-tools utility on your Linux distribution to perform
>> MSR read/write operations; if you already have it installed, just run
>> modprobe msr
>> 2. rdmsr 0xc91
>> which returns 0xfffff
>> 3. wrmsr -p 1 0xc91 0xf
>> which does not return anything
>> 4. wrmsr -p 1 0xc8f 0x100000000
>> which does not return anything
>> 5. rdmsr 0xc91
>> which returns 0xf
>>
>> This shows that the CPU does have the MSRs that are used for CAT.
>>
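
The MSR numbers in the steps above line up with the architectural CAT register layout in the Intel SDM: 0xc8f is IA32_PQR_ASSOC and 0xc90 + n holds the L3 capacity bitmask for class of service (COS) n. A minimal sketch of that arithmetic follows, assuming this CPU's model-specific implementation uses the same layout; the helper names are hypothetical.

```python
# Sketch of the architectural CAT MSR layout (Intel SDM).
# Helper names are hypothetical, for illustration only.

IA32_PQR_ASSOC = 0xC8F     # per-CPU register selecting the active COS
IA32_L3_MASK_BASE = 0xC90  # capacity bitmask for COS 0; COS n is at base + n


def l3_mask_msr(cos):
    """MSR address holding the L3 capacity bitmask of the given COS."""
    return IA32_L3_MASK_BASE + cos


def pqr_assoc_value(cos, rmid=0):
    """Value for IA32_PQR_ASSOC: COS in bits 63:32, RMID in bits 9:0."""
    return (cos << 32) | (rmid & 0x3FF)


def ways(mask):
    """Number of cache ways enabled by a capacity bitmask."""
    return bin(mask).count("1")


print(hex(l3_mask_msr(1)))      # 0xc91
print(hex(pqr_assoc_value(1)))  # 0x100000000
print(ways(0xFFFFF), ways(0xF)) # 20 4
```

Read this way, step 3 restricts COS 1 to 4 of the 20 ways, and step 4 associates CPU 1 with COS 1.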
>> I also ran the CAT tools on Linux provided by Intel, which can be
>> downloaded at
>> https://01.org/packet-processing/cache-monitoring-technology-memory-bandwidth-monitoring-cache-allocation-technology.
>> They show that I can set up different cache partitions for different COS.
>> Basically, the pre-configured cache settings provided by the CAT tools
>> work perfectly on my machine. :-)
>>
>> ------
>> OK. That just checks that the registers are there and that the tools
>> do not return an error. It may still not work, right? :-)
>> Well, I also did some performance evaluation by running a small simple
>> benchmark I wrote.
>>
>> The benchmark task sequentially accesses a 6MB array.
>>
>> I ran the benchmark on core 0 in the following scenarios:
>>
>> Scenario 1): Core 0 is allocated 8MB of cache with CAT; the latency
>> of accessing the 6MB array is around 5.5M cycles.
>> Scenario 2): Core 0 is allocated 4MB of cache with CAT; the latency
>> of accessing the 6MB array is around 16.9M cycles.
>> The slowdown in scenario 2) is 16.9M / 5.5M ~= 3x.
>>
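
As a rough cross-check of the two scenarios above, the totals can be converted to an average cost per cache line. This is only a back-of-the-envelope sketch: it assumes 64-byte cache lines, 1MB = 2^20 bytes, and that each figure covers exactly one pass over the array, none of which is stated in the original measurements.

```python
# Back-of-the-envelope conversion of the reported totals to cycles per line.
# The line size and the one-pass assumption are NOT from the measurements.

CACHE_LINE_BYTES = 64          # assumption: typical x86 cache line size
ARRAY_BYTES = 6 * 1024 * 1024  # the 6MB array used by the benchmark


def cycles_per_line(total_cycles, array_bytes=ARRAY_BYTES,
                    line_bytes=CACHE_LINE_BYTES):
    """Average cycles spent per cache line over one pass of the array."""
    return total_cycles / (array_bytes // line_bytes)


with_8mb = cycles_per_line(5.5e6)   # scenario 1: array fits in the partition
with_4mb = cycles_per_line(16.9e6)  # scenario 2: array exceeds the partition
print(round(with_8mb), round(with_4mb), round(16.9e6 / 5.5e6, 2))
# prints: 56 172 3.07
```

Under those assumptions the per-line cost roughly triples once the 6MB array no longer fits in the 4MB partition, which is the kind of gap one would expect if most accesses then miss the LLC.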
>> ------ISSUES-------
>> I tried to run a noisy neighbor on another core to see how much
>> LLC isolation CAT can provide, but found a *weird result*.
>> I ran the benchmark task on core 0 and a noisy neighbor that accesses
>> a 20MB array on core 1.
>> These two cores are configured to have *different* cache areas: core 0
>> has 8MB of cache, core 1 has 4MB of cache.
>> These two cores are in two isolated cpusets. No other tasks run on
>> these two cores.
>> If I run the benchmark alone, the latency is around 5.5M cycles;
>> but if I run the benchmark along with the noisy neighbor, the latency
>> *decreases* to 4.9M cycles.
>>
>> I double-checked that Turbo Boost is disabled by checking the MSR
>> value with the following command:
>>     rdmsr -pi 0x1a0 -f 38:38
>>     1=disabled
>>     0=enabled
>>     It returns 1.
>> I also disabled cache prefetching in the BIOS.
>>
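
The `-f 38:38` option in the command above extracts a single bit field from the MSR value. The same extraction can be sketched as below for anyone who has the raw 64-bit value of MSR 0x1a0 (IA32_MISC_ENABLE, where bit 38 is the turbo disable bit per the Intel SDM); the helper names are hypothetical.

```python
# Sketch of the bit-field extraction that "rdmsr -f hi:lo" performs.
# Helper names are hypothetical, for illustration only.

IA32_MISC_ENABLE = 0x1A0
TURBO_DISABLE_BIT = 38  # 1 = turbo disabled, 0 = turbo enabled


def msr_field(value, hi, lo):
    """Return bits hi..lo (inclusive) of a 64-bit MSR value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def turbo_disabled(misc_enable):
    """True if the turbo disable bit is set in an IA32_MISC_ENABLE value."""
    return msr_field(misc_enable, TURBO_DISABLE_BIT, TURBO_DISABLE_BIT) == 1
```

For example, turbo_disabled(1 << 38) evaluates to True, while a value with bit 38 clear evaluates to False.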
>> Now I'm very confused. How can the latency decrease when a noisy
>> neighbor is running? It seems that the noisy neighbor may help some
>> hardware/software prefetcher to prefetch the data for the benchmark,
>> but right now I can't think of any other prefetcher that could
>> cause this...
>> The benchmark and the noisy neighbor are independent and don't share
>> the array data.
>
> Did you reboot the machine between your two tests and are the two cores
> you used in the same socket?

I didn't reboot between these two tests.

Both cores are on the same socket, socket 0.

Thank you very much!

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
