
Re: [Xen-devel] [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

To: Wanpeng Li <kernellwp@xxxxxxxxx>
From: Quan Xu <quan.xu0@xxxxxxxxx>
Date: Tue, 14 Nov 2017 18:23:30 +0800
Cc: Juergen Gross <jgross@xxxxxxxx>, Yang Zhang <yang.zhang.wz@xxxxxxxxx>, Rusty Russell <rusty@xxxxxxxxxxxxxxx>, kvm <kvm@xxxxxxxxxxxxxxx>, linux-doc@xxxxxxxxxxxxxxx, the arch/x86 maintainers <x86@xxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx, Ingo Molnar <mingo@xxxxxxxxxx>, Quan Xu <quan.xu03@xxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, "open list:FILESYSTEMS \(VFS and infrastructure\)" <linux-fsdevel@xxxxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Alok Kataria <akataria@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Delivery-date: Tue, 14 Nov 2017 10:23:46 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>



On 2017/11/14 16:22, Wanpeng Li wrote:

2017-11-14 16:15 GMT+08:00 Quan Xu <quan.xu0@xxxxxxxxx>:


On 2017/11/14 15:12, Wanpeng Li wrote:

2017-11-14 15:02 GMT+08:00 Quan Xu <quan.xu0@xxxxxxxxx>:


On 2017/11/13 18:53, Juergen Gross wrote:

On 13/11/17 11:06, Quan Xu wrote:

From: Quan Xu <quan.xu0@xxxxxxxxx>

So far, pv_idle_ops.poll is the only op for pv_idle. .poll is called
in the idle path and polls for a while before we enter the real idle
state.

In virtualization, the idle path includes several heavy operations,
including timer access (LAPIC timer or TSC deadline timer), which
hurt performance, especially for latency-sensitive workloads such as
message-passing tasks. The cost comes mainly from the vmexit, which is
a hardware context switch between the virtual machine and the
hypervisor. Our solution is to poll for a while and not enter the real
idle path if a schedule event arrives during polling.

Polling may waste CPU cycles, so we adopt a smart polling mechanism to
reduce useless polling.

Signed-off-by: Yang Zhang <yang.zhang.wz@xxxxxxxxx>
Signed-off-by: Quan Xu <quan.xu0@xxxxxxxxx>
Cc: Juergen Gross <jgross@xxxxxxxx>
Cc: Alok Kataria <akataria@xxxxxxxxxx>
Cc: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: x86@xxxxxxxxxx
Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx

Hmm, is the idle entry path really so critical to performance that a new
pvops function is necessary?

Juergen, here is the data we got when running the netperf benchmark:
   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
      29031.6 bit/s -- 76.1 %CPU

   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
      35787.7 bit/s -- 129.4 %CPU

   3. w/ kvm dynamic poll:
      35735.6 bit/s -- 200.0 %CPU

Actually we can reduce the CPU utilization by sleeping for a period of
time, as is already done in the poll logic of the IO subsystem; then we
can improve the algorithm in kvm instead of introducing another
duplicate one in the kvm guest.

We really appreciate upstream's kvm dynamic poll mechanism, which is
really helpful in a lot of scenarios.

However, as the description said, in virtualization the idle path
includes several heavy operations, including timer access (LAPIC timer
or TSC deadline timer), which hurt performance, especially for
latency-sensitive workloads such as message-passing tasks. The cost
comes mainly from the vmexit, which is a hardware context switch
between the virtual machine and the hypervisor.

As for upstream's kvm dynamic poll mechanism, even if you could provide
a better algorithm, how could you bypass the timer access (LAPIC timer
or TSC deadline timer), or the hardware context switch between virtual
machine and hypervisor? I know this is a tradeoff.

Furthermore, here is the data we got when running the contextswitch
benchmark to measure latency (lower is better):

1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
   3402.9 ns/ctxsw -- 199.8 %CPU

2. w/ patch and disable kvm dynamic poll:
   1163.5 ns/ctxsw -- 205.5 %CPU

3. w/ kvm dynamic poll:
   2280.6 ns/ctxsw -- 199.5 %CPU
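
To put the two tables in relative terms, here is a minimal C sketch of
the arithmetic. The helper names percent_change() and speedup() are
hypothetical and only illustrate the comparison; they are not part of
the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): percentage change of
 * `value` relative to `baseline`. Positive means an increase.
 */
double percent_change(double baseline, double value)
{
    assert(baseline > 0.0);
    return (value - baseline) / baseline * 100.0;
}

/*
 * Hypothetical helper: how many times smaller `value` is than
 * `baseline`, e.g. for latency where lower is better.
 */
double speedup(double baseline, double value)
{
    assert(value > 0.0);
    return baseline / value;
}
```

With the numbers above, percent_change(29031.6, 35787.7) is roughly
+23% for netperf throughput, and percent_change(3402.9, 1163.5) is
roughly -66% for context-switch latency, compared with roughly -33%
for kvm dynamic poll alone (3402.9 to 2280.6).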

So these two solutions are quite similar, but not duplicates.

That is also why we add a generic idle poll before entering the real idle path.
When a reschedule event is pending, we can bypass the real idle path.
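
For readers following the thread, the poll-before-idle idea can be
sketched as a small user-space simulation. This is not the code from
the patch: struct idle_env, resched_pending(), real_idle() and
idle_with_poll() are hypothetical names, and the simulated clock stands
in for the guest's time source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated guest CPU state; all fields are illustrative assumptions. */
struct idle_env {
    uint64_t now_ns;        /* simulated clock */
    uint64_t wakeup_at_ns;  /* time at which a reschedule event is pending */
    unsigned int halts;     /* times the real (vmexit-heavy) idle path ran */
};

/* True once the simulated reschedule event has arrived. */
bool resched_pending(const struct idle_env *env)
{
    return env->now_ns >= env->wakeup_at_ns;
}

/* Stand-in for the real idle path: count it and sleep until the wakeup. */
void real_idle(struct idle_env *env)
{
    env->halts++;
    if (env->now_ns < env->wakeup_at_ns)
        env->now_ns = env->wakeup_at_ns;
}

/*
 * Poll for up to poll_ns before going idle. Returns true if polling
 * caught the reschedule event, so the real idle path was bypassed.
 */
bool idle_with_poll(struct idle_env *env, uint64_t poll_ns, uint64_t step_ns)
{
    uint64_t deadline = env->now_ns + poll_ns;

    assert(step_ns > 0);
    while (env->now_ns < deadline) {
        if (resched_pending(env))
            return true;
        env->now_ns += step_ns;  /* one busy-wait iteration */
    }
    real_idle(env);
    return false;
}
```

In this sketch a wakeup that arrives inside the poll window never
reaches real_idle(), which is the saving discussed above; a wakeup that
arrives later pays for both the polling and the real idle path, which
is the CPU cost side of the tradeoff.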

There is similar logic in the idle governor/driver, so how does this
patchset influence the decision in the idle governor/driver when
running on bare metal (power management is not exposed to the guest, so
we will not enter the idle driver in the guest)?


This is expected to take effect only when running as a virtual machine with
the proper CONFIG_* enabled. It cannot work on bare metal even with the
proper CONFIG_* enabled.

Quan
Alibaba Cloud

