RE: [Xen-devel] Poor HVM performance with 8 vcpus
> With a specific benchmark producing a rather high load on memory management
> operations (lots of process creation/deletion and memory allocation) the 8
> vcpu performance was worse than the 4 vcpu performance. On other platforms
> (/390, MIPS, SPARC) this benchmark scaled rather well with the number of
> cpus.
>
> The result of the usage of the software performance counters of XEN seemed
> to point to the shadow lock being the reason. I modified the Hypervisor to
> gather some lock statistics (patch will be sent soon) and found that the
> shadow lock is really the bottleneck. On average 4 vcpus are waiting to
> get the lock!

At various points in the shadow pagetable code, Xen needs to be able to find
all the writeable mappings (PTEs) to a particular page. Rather than storing a
data structure to support the lookup from frame number to list of PTEs, we've
found that it is generally quicker to use a heuristic. The heuristic knows
where to look to find writeable mappings in a number of common OSes. For
example, it knows to look in the direct mapped (1:1) kernel address regions
in Linux, or the recursive linear mapping in Windows. If application of the
heuristics fails, Xen resorts to a brute force search.

Unless BS2000 just happens to use the exact same virtual memory layout as one
of the other supported OSes, the heuristic will be failing. The brute force
search is rather slow, which will result in the shadow lock being held for an
extended period, resulting in lock convoys on SMP guests.

The quick fix is to add a heuristic for BS2000. However, the list of
heuristics is getting a bit unmanageable, and they're currently dumbly tried
in order. Given the user-base size of BS2000, Keir is likely to insist the
heuristic for BS2000 is the last to be tried :)

At the very least it would be good to have a predictor which figured out
which of the several heuristics should actually be used for a given VM. A
simple "try whichever one worked last time first" should work fine.
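
As a rough illustration of the "try whichever one worked last time first"
idea, here is a minimal sketch in C. It is not the Xen shadow code: the names
(struct guest, find_writeable_mappings, try_direct_map, try_recursive_map,
USE_BRUTE_FORCE, heuristic_calls) are all hypothetical, and the heuristics
are toy stand-ins that only check a fake "layout" field instead of probing
guest page tables. It only shows the per-guest predictor and the fallback to
brute force.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these names are real Xen identifiers. */
enum guest_layout { LAYOUT_DIRECT_MAP, LAYOUT_RECURSIVE, LAYOUT_UNKNOWN };

struct guest {
    enum guest_layout layout; /* toy model of the guest's virtual memory layout */
    int last_hit;             /* heuristic that succeeded last time, -1 if none */
};

#define USE_BRUTE_FORCE (-1)

typedef bool (*heuristic_fn)(const struct guest *g, unsigned long gfn);

/* Toy heuristics: a real one would look for the frame at a known guest VA. */
static bool try_direct_map(const struct guest *g, unsigned long gfn)
{
    (void)gfn;
    return g->layout == LAYOUT_DIRECT_MAP;
}

static bool try_recursive_map(const struct guest *g, unsigned long gfn)
{
    (void)gfn;
    return g->layout == LAYOUT_RECURSIVE;
}

static const heuristic_fn heuristics[] = { try_direct_map, try_recursive_map };
#define N_HEURISTICS ((int)(sizeof(heuristics) / sizeof(heuristics[0])))

int heuristic_calls; /* instrumentation only: counts heuristic attempts */

/*
 * Returns the index of the heuristic that found the mappings, or
 * USE_BRUTE_FORCE if every heuristic failed and the caller must fall
 * back to the slow full search.
 */
int find_writeable_mappings(struct guest *g, unsigned long gfn)
{
    assert(g->last_hit >= -1 && g->last_hit < N_HEURISTICS);

    /* Predictor: start with whatever worked for this guest last time. */
    if (g->last_hit >= 0) {
        heuristic_calls++;
        if (heuristics[g->last_hit](g, gfn))
            return g->last_hit;
    }

    /* Otherwise try the remaining heuristics in their fixed order. */
    for (int i = 0; i < N_HEURISTICS; i++) {
        if (i == g->last_hit)
            continue; /* already tried above */
        heuristic_calls++;
        if (heuristics[i](g, gfn)) {
            g->last_hit = i; /* remember the winner for next time */
            return i;
        }
    }
    return USE_BRUTE_FORCE;
}
```

With this shape, a guest whose layout matches the last heuristic in the list
pays for the full walk only once; later lookups go straight to the remembered
entry. A guest that matches none of them (the BS2000 case described above)
still ends up in the brute force path every time, so the predictor helps with
ordering but does not replace a matching heuristic.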

Even smarter would be to just have heuristics for the two general classes of
mapping (1:1 and recursive), and have the code automatically figure out the
starting virtual address being used for a given guest.

All fun stuff.

Ian

> Is this a known issue?
> Is there a chance to split the shadow lock into sub-locks or to use a
> reader/writer lock instead?
> I just wanted to ask before trying to understand all of the shadow code :-)
>
>
> Juergen
>
> --
> Juergen Gross                 Principal Developer Operating Systems
> TSP ES&S SWE OS6              Telephone: +49 (0) 89 636 47950
> Fujitsu Technology Solutions  e-mail: juergen.gross@xxxxxxxxxxxxxx
> Otto-Hahn-Ring 6              Internet: ts.fujitsu.com
> D-81739 Muenchen              Company details: ts.fujitsu.com/imprint.html
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel