
[Xen-devel] [PATCH] RFC: VGA acceleration using shadow PTE D-bit to construct LFB dirty bitmap


  • To: <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Huang, Xinmei" <xinmei.huang@xxxxxxxxx>
  • Date: Sat, 28 Jul 2007 15:28:01 +0800
  • Cc: Ian.Pratt@xxxxxxxxxxxx
  • Delivery-date: Mon, 30 Jul 2007 09:58:22 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcfQMvi4wDtolCuKTuuzkKKQL9EH1Q==
  • Thread-topic: [PATCH] RFC: VGA acceleration using shadow PTE D-bit to construct LFB dirty bitmap

With the current accelerated VGA support in qemu-dm, the guest can access the LFB directly, but qemu-dm is not aware of these accesses. The accompanying task is to determine which ranges of the LFB must be redrawn in the guest display window. qemu-dm currently maintains a private copy of the LFB and derives the LFB dirty bitmap via memcmp. This patch obtains the dirty bitmap another way: a single hypercall instructs the hypervisor to fill it in, and the hypervisor does so by checking the D-bit of the shadow PTEs mapping the LFB, as sketched below.
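
For illustration, here is a minimal sketch of the hypervisor-side scan this paragraph describes, assuming the L1 shadow entries mapping the LFB are available as a flat array; the names (lfb_update_dirty_bitmap, lfb_l1e) are mine for illustration, not the patch's.

#include <stdint.h>
#include <string.h>

#define _PAGE_DIRTY (1UL << 6)   /* x86 PTE D-bit */

/* Scan the L1 shadow entries mapping the LFB: every entry with the
 * D-bit set marks a page the guest has written since the last scan.
 * Record it in the bitmap and clear the D-bit to re-arm detection. */
static void lfb_update_dirty_bitmap(uint64_t *lfb_l1e,
                                    unsigned long nr_pages,
                                    uint8_t *bitmap)
{
    unsigned long i;

    memset(bitmap, 0, (nr_pages + 7) / 8);
    for (i = 0; i < nr_pages; i++) {
        if (lfb_l1e[i] & _PAGE_DIRTY) {
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            lfb_l1e[i] &= ~(uint64_t)_PAGE_DIRTY;
        }
    }
    /* A real implementation must also flush the TLB entries for the
     * cleared PTEs; otherwise cached translations let the guest write
     * again without setting the D-bit. */
}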

In principle, the overhead of the memcmp-based method depends on the graphics workload, more precisely on the probability distribution of the addresses written in the LFB, whereas the overhead of the hypercall-based method is relatively stable: it consists of the hypercall itself plus the walk of the L1 pagetables mapping the LFB.
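
For comparison, a sketch of the existing memcmp-based approach; the buffer names and page granularity are assumptions, not qemu-dm's actual variables.

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Compare the live LFB against a private copy, one page at a time.
 * memcmp() stops at the first differing byte, so a page written near
 * its start is cheap to detect, while a clean page always costs a
 * full-page compare -- hence the dependence on the distribution of
 * LFB write addresses noted above.  Dirty pages additionally pay a
 * memcpy() to refresh the copy. */
static void lfb_memcmp_dirty(const uint8_t *lfb, uint8_t *copy,
                             size_t lfb_size, uint8_t *dirty)
{
    size_t pg, nr_pages = lfb_size / PAGE_SIZE;

    for (pg = 0; pg < nr_pages; pg++) {
        const uint8_t *live = lfb  + pg * PAGE_SIZE;
        uint8_t       *snap = copy + pg * PAGE_SIZE;

        dirty[pg] = (memcmp(live, snap, PAGE_SIZE) != 0);
        if (dirty[pg])
            memcpy(snap, live, PAGE_SIZE);
    }
}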

A normal shadow pagetable can be reclaimed at any time, i.e. the L1 shadow for the LFB may disappear, which raises some issues. One approach is to keep all the shadow pagetables for the LFB pinned. That is complicated, since only the top-level pagetable is pinnable in the current shadow code (I tried and failed), and I am not sure pinning the LFB shadows would bring a sufficient performance benefit. This patch simply returns an all-dirty bitmap whenever the L1 shadow for the LFB changes. This avoids a complicated shadow-tracking mechanism at the cost of some unnecessary redrawing in qemu-dm; the ideal solution probably lies somewhere between these two extremes.
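
A minimal sketch of this fallback, again with hypothetical names (lfb_state, shadow_changed); in the real patch this state would live inside Xen's shadow code rather than a standalone struct.

#include <stdint.h>
#include <string.h>

#define _PAGE_DIRTY (1UL << 6)   /* x86 PTE D-bit */

/* Hypothetical per-domain LFB tracking state. */
struct lfb_state {
    uint64_t      *l1e;            /* L1 shadow entries mapping the LFB    */
    unsigned long  nr_pages;       /* number of LFB pages                  */
    int            shadow_changed; /* raised when the LFB L1 shadow alters */
};

/* Fill the dirty bitmap for qemu-dm.  If the L1 shadow covering the
 * LFB has been reclaimed or replaced since the last call, its D-bits
 * are lost, so conservatively report every page dirty. */
static void lfb_fill_dirty_bitmap(struct lfb_state *s, uint8_t *bitmap)
{
    unsigned long i;

    if (s->shadow_changed) {
        memset(bitmap, 0xff, (s->nr_pages + 7) / 8);
        s->shadow_changed = 0;
        return;
    }

    memset(bitmap, 0, (s->nr_pages + 7) / 8);
    for (i = 0; i < s->nr_pages; i++) {
        if (s->l1e[i] & _PAGE_DIRTY) {
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            s->l1e[i] &= ~(uint64_t)_PAGE_DIRTY;  /* re-arm; needs TLB flush */
        }
    }
}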

 

I ran some tests to show the benefit of this patch: one DB guest plus n idle winxp guests.

 

linux/DB guest : running sysbench (DB workload), 2 vcpus, 512M

winxp guest    : 2 vcpus, 128M (8M shadow)

 

The test results show that this patch benefits not only qemu-dm itself but also dom0 (our bottleneck) and overall system scalability.
DB throughput:
1. w/o patch -- 49% degradation with 8 winxp guests, 34% with 4 winxp guests
2. w/  patch -- <2% degradation with 8 winxp guests, <1% with 4 winxp guests
 
The following two charts (see the attached file result.bmp) show the dom0 CPU utilization scatter plots with and without the patch.


- Xinmei

Attachment: vga.acc.patch
Description: vga.acc.patch

Attachment: result.bmp
Description: result.bmp

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

