
Re: [Xen-devel] Question about high CPU load during iperf ethernet testing

Hi, Stefano!
Thank you for your reply!

On Tue, Sep 23, 2014 at 7:41 PM, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Mon, 22 Sep 2014, Iurii Konovalenko wrote:
>> Hello, all!
>> I am running iperf ethernet tests on DRA7XX_EVM board (OMAP5).
>> Xen version is 4.4.
>> I run only Linux (kernel 3.8) as Dom0, with no other active domains (for clean test results I decided not to start a DomU).
>> The iperf server is started on the host; the iperf client is started on the board with the command line "iperf -c -w 256k -m -f M -d -t 60".
> Just to double check: you are running the iperf test in Dom0, correct?

Yes, iperf is running in Dom0.

>> During the test I studied CPU load with the top tool in Dom0 and saw that one VCPU is fully loaded, spending about 50% in software IRQs and 50% in system.
>> Running the same test on bare Linux without Xen, I saw that CPU load is about 2-4%.
>> I decided to debug a bit, so I used "({register uint64_t _r; asm volatile("mrrc " "p15, 0, %0, %H0, c14" ";" : "=r" (_r)); _r; })" to read the timer counter before and after the operations I wanted to measure.
>> In this way I found that most of the CPU time is spent in the functions enable_irq/disable_irq_nosync and spin_lock_irqsave/spin_unlock_irqrestore (mostly in "mrs %0, cpsr @ arch_local_irq_save" / "msr cpsr_c, %0 @ local_irq_restore"). When running without Xen these should not take so much time.
> There is nothing Xen specific in the Linux ARM implementation of
> spin_lock_irqsave/spin_unlock_irqrestore and
> enable_irq/disable_irq_nosync.

That is strange, because my measurements show a lot of time is spent
there; for example, spin_unlock_irqrestore (mostly the mrs
instruction) accounts for about 20% when running in Dom0.

>> So, could anyone clarify a couple of questions for me:
>> 1. Is it normal behaviour?
> No, it is not normal.
> Assuming that you assign all the memory to Dom0 and as many vcpu as
> physical cpus on your platform then you should get the same numbers as
> native.

OK, so I might be doing something wrong.

>> 2. Does the hypervisor trap the cpsr register? I suppose that the hypervisor traps access to the cpsr register, which leads to additional overhead, but I can't find the place in the sources where it happens.
> We don't trap cpsr.

That is strange, because that was my only assumption about where the time could be spent.
So could you please advise where to look to understand the reason for
such high VCPU load?

Best regards.

Iurii Konovalenko | Senior Software Engineer

Xen-devel mailing list


