Xen project Mailing List

Hi, Dario:

Thanks for the reply.

The CPU I am using is an i7 X 980 @ 3.33 GHz.

Each core has a dedicated L1 cache (32 KB data, 32 KB instruction) and a dedicated L2 cache (256 KB unified); all 6 cores share a 12 MB L3 cache.

I pinned Dom-0 to core 0, and Dom-U to core 1.

The program I used is attached. It takes one input parameter: the data array size (in KB).

It can be divided into the following steps:

1. Initialize the data array.

2. Divide the array into cache-line-sized blocks (64 B on my machine), then randomize the first element of each cache line.

3. Read each cache line once to warm up the cache again.

4. Read each cache line a second time and record the time for this round.

5. Print the time spent in step 4 (total time and time per cache line).

The randomization is done to defeat cache prefetching. Since the accessed elements are 64 B apart, no two accessed elements fall on the same cache line.

I compile it using: g++ -O0 cache_latency_size_boxplot.cc -o cache_latency_size

The script I used to run the experiment is also attached. Basically, it tries different array sizes, running each size 1000 times.

For the throughput experiment, I used ramspeed to measure memory throughput:

http://alasir.com/software/ramspeed/

I used v2.6, the single-core version. The commands I used are ./ramspeed -b 3 (for int) and ./ramspeed -b 6 (for float).

Thanks very much!

Sisu

On Wed, Mar 12, 2014 at 3:55 AM, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:

On Tue, 2014-03-11 at 15:21 -0500, Sisu Xi wrote:
> By the way, since the same DomU image gets better results on
> another hardware machine, we first assumed there is some interference
> from Dom-0.
>

Are you able to share the source of the test program, so that we can try
to reproduce what you're seeing?

> However, when I run the same program in Dom-0, the results look very
> good, almost the same as the native case, with just a few outliers.
> Does this mean that the interference from Dom-0 does not cause trouble
> for Dom-0 itself, but can interfere with the cache program in Dom-U?
> Is this assumption valid?
>

I'm shooting a bit in the dark, but:
 - what is Dom0 doing while the DomU is running the workload?
 - to what pCPUs are you pinning Dom0's and DomU's vCPUs? Do they
   share any level of the cache hierarchy?

Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

--
Sisu Xi, PhD Candidate

http://www.cse.wustl.edu/~xis/
Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130

Re: [Xen-devel] memory performance 20% degradation in DomU -- Sisu