
Re: [Xen-devel] memory performance 20% degradation in DomU -- Sisu

Hi, Dario:

Thanks for the reply.

The CPU I am using is an i7 X 980 @ 3.33 GHz;
each core has a dedicated L1 (32K data, 32K instruction) and L2 (256K unified) cache, and all 6 cores share a 12MB L3 cache.
I pinned Dom-0 to core 0 and Dom-U to core 1.

The program I used is attached. It takes one input parameter as the data array size (in KB).

It can be divided into the following steps:
1. Initialize the data array.
2. Divide the array by the cache line size (64B on my machine), then chain the first element of each cache line in random order.
3. Read each cache line once to warm up the cache.
4. Read them a second time, recording the time for this round.
5. Print the time spent in step 4 (total time and per-cache-line time).

The randomization is done to defeat the cache prefetcher, and since the accessed elements are 64B apart, no two of them should fall on the same cache line.

I compiled it with: g++ -O0 cache_latency_size_boxplot.cc -o cache_latency_size

The script I used to run the experiment is also attached. Basically, it tries different array sizes, running each 1000 times.
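The driver script amounts to something like the sketch below. The size list and output file names are assumptions; the real script is attached as cache_latency_1_measure_size.sh:

```shell
#!/bin/sh
# Sweep a range of array sizes; run the benchmark 1000 times per size
# and collect one result file per size for later boxplotting.
for size_kb in 16 32 64 128 256 512 1024 4096 16384; do
    i=0
    while [ "$i" -lt 1000 ]; do
        ./cache_latency_size "$size_kb" >> "result_${size_kb}KB.txt"
        i=$((i + 1))
    done
done
```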

For the throughput experiment, I used ramspeed (v2.6, single-core version) to measure memory throughput.
The commands I used are ./ramspeed -b 3 (for int) and ./ramspeed -b 6 (for float).

Thanks very much!


On Wed, Mar 12, 2014 at 3:55 AM, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
On Tue, 2014-03-11 at 15:21 -0500, Sisu Xi wrote:
> by the way, since the same DomU image gets better results on
> another hardware machine, we first assumed there was some interference
> from Dom-0.
Are you able to share the source of the test program, so that we can try
to reproduce what you're seeing?

> However, when I run the same program in Dom-0, the results look very
> good, almost the same as the native case, with just a few outliers.
> Does that mean the interference from Dom-0 is not causing trouble for
> Dom-0 itself, but can interfere with the cache program in Dom-U? Is
> this assumption valid?
I'm shooting a bit in the dark, but:
 - what is Dom0 doing while the DomU is running the workload?
 - to what pCPUs are you pinning Dom0's and DomU's vCPUs? Do they
   share any level of the cache hierarchy?


<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Sisu Xi, PhD Candidate

Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130

Attachment: cache_latency_size_boxplot.cc
Description: Text Data

Attachment: cache_latency_1_measure_size.sh
Description: Bourne shell script

Xen-devel mailing list
