> From: kevin.tian@xxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> CC: george.dunlap@xxxxxxxxxxxxx
> Date: Sun, 22 May 2011 10:13:59 +0800
> Subject: RE: [Xen-devel] Too much VCPUS makes domU high CPU utiliazation
>
> >From: MaoXiaoyun [mailto:tinnycloud@xxxxxxxxxxx]
> >Sent: Saturday, May 21, 2011 11:44 AM
> >
> >Although I still not figure out why VCPU fall either on even or odd PCPUS only , If I explictly set "VCPU=[4~15]" in HVM configuration, VM will use
> >all PCPUS from 4 to 15.
>
> This may implicate that NUMA is enabled on your M1 and thus Xen scheduler tries to use
> local memory to avoid remote access latency and that's why your domain A is affined to
> a fix set of cpus
>
That's it. I saw that NUMA is enabled in the BIOS config on M1, while it is disabled on M2.
> >Also I may find the reason why guest boot so slow.
> >
> >I think the reason is the Number of Guest VCPU > the Number of physical CPUs that the Guest can run on
> >In my test, my physical has 16 PCPUS and dom0 takes 4, so for every Guest, only 12 Physical CPUs are available.
>
> The scheduler in the hypervisor is designed to multiplex multiple vcpus on a single cpu,
> and thus even when dom0 has 4 vcpus it doesn't mean that only the rest 12 pcpus are
> available for use.
>
Oh, I should have explained that we currently pin the dom0 VCPUs to the first 4 PCPUs (by adding dom0_max_vcpus=4 dom0_vcpus_pin in grub).
For the guests, we set cpu="4-15" in the HVM config, so the PCPUs available to both dom0 and the guest VMs are actually limited. We do this
to ensure that dom0 gets better performance.
> >So, if Guest has 16 VCPUS, and only 12 Physical are available, when heavy load, there will be two or more VCPUS are queued
> >on one Physical CPU, and if there exists VCPU is waiting for other other VCPUS respone(such as IPI memssage), the waiting time
> >would be much longer.
> >
> >Especially, during Guest running time, if a process inside Guest takes 16 threads to run, then it is possible each VCPU owns one
> >thread, under physical, those VCPUs still queue on PCPUS, if there is some busy waiting code process, such as (spinlock),
> >it will make Guest high CPU utilization. If the the busy waiting code is not so frequently, we might see CPU utilization jump to
> >very high and drop to low now and then.
> >
> >Could it be possible?
>
> It's possible. As I replied in earlier thread, lock contention at boot time may slow down
> the process slightly or heavily. Remember that the purpose of virtualization is to
> consolidate multiple VMs on a single platform to maximum resource utilization. Some
> use cases can have N:1 (where N can be 8) consolidation ratio, and others may have
> smaller ratio. There're many reasons for a given environment to scale up, and you need
> capture enough trace information for the bottleneck. Some bottlenecks may be hard
> to tackle which will finally form into your business best practice, while some may be
> simply improved by proper configuration change. So it's really too early to say whether
> your setup is not feasible or not. You need dive into it with more details. :-)
>
I agree. I did some extra tests.
In "xm top" there is a column "VBD_RD", which shows the number of read I/O requests. I measured the time from VM start to the first read I/O.
1) Guest VCPU = 16, set cpu="4-15" in HVM, first IO shows up 85 seconds after VM start.
2) Guest VCPU = 12, set cpu="4-15" in HVM, first IO shows up 20 seconds after VM start.
3) Guest VCPU = 8, set cpu="4-7" in HVM, first IO shows up 90 seconds after VM start.
4) Guest VCPU = 8, set cpu="4-15" in HVM, first IO shows up 20 seconds after VM start.
5) Guest VCPU = 16, set cpu="0-15" in HVM, first IO shows up 23 seconds after VM start.
Previously I mentioned that we give the first 4 physical CPUs *only* to dom0. It looks like, for a guest with many VCPUs, say 16, I
should not limit it to 12 PCPUs, but should give it all available PCPUs (just like test 5).
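
As a rough cross-check of the five tests above, here is a minimal Python sketch that compares the number of guest VCPUs with the number of PCPUs the guest is allowed to run on. The names TESTS, pcpu_count and summarize are my own for illustration, and the range parser only handles the simple "first-last" form used in these tests, not the full affinity syntax.

```python
# Boot-time measurements from tests 1-5 above:
# (test number, guest VCPUs, cpu= affinity string, seconds until first VBD_RD)
TESTS = [
    (1, 16, "4-15", 85),
    (2, 12, "4-15", 20),
    (3, 8, "4-7", 90),
    (4, 8, "4-15", 20),
    (5, 16, "0-15", 23),
]


def pcpu_count(cpu_range):
    """Count the PCPUs in a simple inclusive "first-last" range such as "4-15".

    Assumption: only the single-range form used in these tests is supported.
    """
    first, last = (int(part) for part in cpu_range.split("-"))
    return last - first + 1


def summarize(tests):
    """Return (test, vcpus, pcpus, overcommitted, seconds) for each test."""
    rows = []
    for number, vcpus, cpu_range, seconds in tests:
        pcpus = pcpu_count(cpu_range)
        rows.append((number, vcpus, pcpus, vcpus > pcpus, seconds))
    return rows


if __name__ == "__main__":
    for number, vcpus, pcpus, over, seconds in summarize(TESTS):
        flag = "VCPUs > PCPUs" if over else "VCPUs <= PCPUs"
        print(f"test {number}: {vcpus:2d} VCPUs on {pcpus:2d} PCPUs "
              f"({flag}), first IO after {seconds}s")
```

On these five data points, the two slow boots (tests 1 and 3) are exactly the ones where the guest has more VCPUs than PCPUs it may run on. Five runs are not enough to prove the cause, but they are consistent with the guess above.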
> Thanks
> Kevin