Re: [Xen-devel] Question about high CPU load during iperf ethernet testing
On Wed, 24 Sep 2014, Iurii Konovalenko wrote:
> Hi, Stefano!
> Thank you for your reply!
>
> On Tue, Sep 23, 2014 at 7:41 PM, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > On Mon, 22 Sep 2014, Iurii Konovalenko wrote:
> >> Hello, all!
> >>
> >> I am running iperf ethernet tests on a DRA7XX_EVM board (OMAP5).
> >> Xen version is 4.4.
> >> I run only Linux (kernel 3.8) as Dom0, with no other active domains
> >> (for clean test results I decided not to start a DomU).
> >> The iperf server is started on the host; the iperf client is started
> >> on the board with the command line
> >> "iperf -c 192.168.2.10 -w 256k -m -f M -d -t 60".
> >
> > Just to double check: you are running the iperf test in Dom0, correct?
>
> Yes, iperf is running in Dom0.
>
> >> During the test I studied CPU load with the top tool in Dom0 and saw
> >> that one VCPU is fully loaded, spending about 50% of its time in
> >> software IRQs and 50% in system.
> >> Running the same test on plain Linux without Xen, I saw that CPU load
> >> is about 2-4%.
> >>
> >> I decided to debug a bit, so I used "({register uint64_t _r; asm
> >> volatile("mrrc " "p15, 0, %0, %H0, c14" ";" : "=r"
> >> (_r)); _r; })" to read the timer counter before and after the
> >> operations I wanted to test.
> >>
> >> In this way I found that most of the CPU time is spent in the
> >> functions enable_irq/disable_irq_nosync and
> >> spin_lock_irqsave/spin_unlock_irqrestore (mostly in "mrs %0, cpsr @
> >> arch_local_irq_save"/"msr cpsr_c, %0 @
> >> local_irq_restore"). When running without Xen it does not take so
> >> much time.
> >
> > There is nothing Xen specific in the Linux ARM implementation of
> > spin_lock_irqsave/spin_unlock_irqrestore and
> > enable_irq/disable_irq_nosync.
>
> That is strange, because my measurements show that a lot of time is
> spent there, for example in spin_unlock_irqrestore (mostly in the mrs
> instruction): about 20%, when running in Dom0.
Unless you are doing something wrong in your measurements, if you have
really narrowed it down to one instruction, then I would try to do the
same on a different SoC from another vendor to see whether it is
actually a hardware issue.

> >> So, could anyone explain to me:
> >> 1. Is this normal behaviour?
> >
> > No, it is not normal.
> > Assuming that you assign all the memory to Dom0 and as many vcpus as
> > physical cpus on your platform, then you should get the same numbers
> > as native.
>
> OK, so I might be doing something wrong.
>
> >> 2. Does the hypervisor trap the cpsr register? I suppose that the
> >> hypervisor traps access to the cpsr register, which leads to
> >> additional overhead, but I can't find the place in the sources where
> >> it happens.
> >
> > We don't trap cpsr.
>
> It is strange, because that was my only assumption about where the time
> could be spent.
> So could you please advise where to look to understand the reason for
> such high VCPU load?

I don't know.

When you say that arch_local_irq_save is the one taking all the time, do
you actually have something like:

time1 = read CNTVCT;
arch_local_irq_save();
time2 = read CNTVCT;
printk(time2-time1);

in your code?

> Best regards.
>
> Iurii Konovalenko | Senior Software Engineer
> GlobalLogic

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel