
Re: [Xen-devel] Question about high CPU load during iperf ethernet testing



On Wed, 24 Sep 2014, Iurii Konovalenko wrote:
> Hi, Stefano!
> Thank you for your reply!
> 
> On Tue, Sep 23, 2014 at 7:41 PM, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > On Mon, 22 Sep 2014, Iurii Konovalenko wrote:
> >> Hello, all!
> >>
> >> I am running iperf ethernet tests on DRA7XX_EVM board (OMAP5).
> >> Xen version is 4.4.
> >> I run only Linux (kernel 3.8) as Dom0, no other active domains (For clear 
> >> tests results I decided not to start DomU).
> >> iperf server is started on host, iperf client is started on board with 
> >> command line "iperf -c 192.168.2.10 -w 256k -m
> >> -f M -d -t 60".
> >
> > Just to double check: you are running the iperf test in Dom0, correct?
> 
> Yes, iperf is running in Dom0.
> 
> >> During test I studied CPU load with top tool on Dom0, and saw, that one 
> >> VCPU is totally loaded, spending about 50% in
> >> software IRQs, and 50% in system.
> >> Running the same test on clear Linux without Xen, I saw that CPU load is 
> >> about 2-4%.
> >>
> >> I decided to debug a bit, so I used the statement expression
> >> ({ register uint64_t _r; asm volatile("mrrc p15, 0, %0, %H0, c14" : "=r" (_r)); _r; })
> >> to read the timer counter before and after the operations I want to test.
> >>
> >> In this way I found that most of the CPU time is spent in the functions
> >> enable_irq/disable_irq_nosync and spin_lock_irqsave/spin_unlock_irqrestore
> >> (mostly in "mrs %0, cpsr @ arch_local_irq_save" and
> >> "msr cpsr_c, %0 @ local_irq_restore"). When running without Xen,
> >> these do not take nearly as much time.
> >
> > There is nothing Xen specific in the Linux ARM implementation of
> > spin_lock_irqsave/spin_unlock_irqrestore and
> > enable_irq/disable_irq_nosync.
> >
> 
> That is strange, because my measurements show a lot of time spent
> there; for example, about 20% in spin_unlock_irqrestore (mostly in the
> mrs instruction) when running in Dom0.

Unless you are doing something wrong in your measurements: if you have
really narrowed it down to one instruction, then I would try the same on
a different SoC from another vendor to see whether it is actually a
hardware issue.


> >> So, could anyone help me answer some questions:
> >> 1. Is this normal behaviour?
> >
> > No, it is not normal.
> > Assuming that you assign all the memory to Dom0 and as many vcpu as
> > physical cpus on your platform then you should get the same numbers as
> > native.
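
[For reference: giving Dom0 all memory and as many vcpus as physical
cpus, as Stefano describes, is normally done via Xen boot parameters.
The values below are purely illustrative for a 2-core board; adjust to
your platform.]

```shell
# Illustrative Xen command-line fragment for a native-like Dom0 setup:
# all memory to Dom0, one vcpu per physical cpu, vcpus pinned 1:1.
dom0_mem=2G,max:2G dom0_max_vcpus=2 dom0_vcpus_pin
```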
> 
> OK, so I might do something wrong.
> 
> >> 2. Does the hypervisor trap the cpsr register? I suppose the
> >> hypervisor traps accesses to the cpsr register, which would lead to
> >> additional overhead, but I can't find the place in the sources where
> >> that happens.
> >
> > We don't trap cpsr.
> 
> That is strange, because that was my only assumption as to where the
> time could be spent.
> So could you please advise where to look to understand the reason for
> such a high VCPU load?

I don't know. When you say that arch_local_irq_save is the one taking
all the time, do you actually have something like:

time1 = read CNTVCT;
arch_local_irq_save();
time2 = read CNTVCT;
printk(time2-time1);

in your code?



> Best regards.
> 
> Iurii Konovalenko | Senior Software Engineer
> GlobalLogic
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

