Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.
On 2/4/2016 7:06 PM, George Dunlap wrote:
> On Thu, Feb 4, 2016 at 9:38 AM, Yu, Zhang <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
>> On 2/4/2016 5:28 PM, Paul Durrant wrote:
>>> I assume this means that the emulator can 'unshadow' GTTs (I guess on
>>> an LRU basis) so that it can shadow new ones when the limit has been
>>> exhausted? If so, how bad is performance likely to be if we live with
>>> a lower limit and take the hit of unshadowing if the guest GTTs become
>>> heavily fragmented?
>>
>> Thank you, Paul.
>>
>> Well, I was told the emulator has approaches to delay the shadowing of
>> the GTT till future GPU commands are submitted. By now, I'm not sure
>> about the performance penalties if the limit is set too low. Although
>> we are confident 8K is a secure limit, it still seems too high to be
>> accepted. We will perform more experiments with this new approach to
>> find a balance between the lowest limit and the XenGT performance.
>
> Just to check some of my assumptions:
>
> I assume that unlike memory accesses, your GPU hardware cannot
> 'recover' from faults in the GTTs. That is, for memory, you can take a
> page fault, fix up the pagetables, and then re-execute the original
> instruction; but so far I haven't heard of any devices being able to
> seamlessly re-execute a transaction after a fault. Is my understanding
> correct?

Yes.

> If that is the case, then for every top-level value (whatever the
> equivalent of CR3 is), you need to be able to shadow the entire GTT
> tree below it, yes? You can't use a trick that the memory shadow
> pagetables can use, of unshadowing parts of the tree and reshadowing
> them.
>
> So as long as the currently-in-use GTT tree contains no more than
> $LIMIT ranges, you can unshadow and reshadow; this will be slow, but
> strictly speaking correct.
>
> What do you do if the guest driver switches to a GTT such that the
> entire tree takes up more than $LIMIT entries?

Good question.
As with memory virtualization, IIUC, besides write-protecting (wp) the
guest page tables, we can also track updates to them when cr3 is written
or when a TLB flush occurs. We could consider optimizing our GPU device
model to achieve a similar goal, e.g. by tracking updates when a root
pointer (like cr3) to the page table is written and when a set of
commands is submitted (both situations are triggered by MMIO operations).
But taking performance into consideration, we will probably still need
to wp all the page tables when they are first created. It will require a
lot of optimization work on the device model side to find a balance
between a minimal set of wp-ed gpfns and reasonable performance.

We'd like to give it a try. :)

Yu

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
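
The limit/unshadow trade-off discussed in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions, not the XenGT device model or any Xen interface: the class `WpBudget`, its methods, and the root names are hypothetical, every write-protected gpfn is counted as one range, and gpfns shared between trees are counted once per tree. It shows that idle GTT trees can be unshadowed in LRU order to make room for the active one, and that a single tree larger than the limit cannot be handled that way.

```python
from collections import OrderedDict


class WpBudget:
    """Toy model of a cap on write-protected (wp) guest page frames.

    Hypothetical illustration only; not taken from Xen or XenGT.
    """

    def __init__(self, limit):
        self.limit = limit
        # Maps a GTT root id -> set of gpfns write-protected for that tree.
        # Insertion order doubles as LRU order (oldest entry first).
        self.shadowed = OrderedDict()

    def used(self):
        """Number of wp entries currently consumed across all trees."""
        return sum(len(pages) for pages in self.shadowed.values())

    def activate(self, root, gpfns):
        """Shadow the whole tree under `root`, unshadowing idle trees if needed.

        Returns the roots that were unshadowed to make room.
        Raises ValueError if this tree alone needs more than the limit,
        which is the case raised at the end of the quoted question.
        """
        gpfns = set(gpfns)
        if len(gpfns) > self.limit:
            raise ValueError(
                "tree needs %d wp entries, limit is %d" % (len(gpfns), self.limit)
            )
        # Drop any stale entry for this root so it is re-added as most recent.
        self.shadowed.pop(root, None)
        evicted = []
        while self.used() + len(gpfns) > self.limit:
            victim, _ = self.shadowed.popitem(last=False)  # least recently used
            evicted.append(victim)
        self.shadowed[root] = gpfns
        return evicted


budget = WpBudget(limit=4)
budget.activate("ctx_a", [0x100, 0x101])
budget.activate("ctx_b", [0x200, 0x201])
# The third tree needs 3 entries, so both older trees must be unshadowed.
evicted = budget.activate("ctx_c", [0x300, 0x301, 0x302])
```

In this model a lower limit never breaks correctness while the active tree fits; it only increases how often trees are unshadowed and later reshadowed, which is the performance cost being weighed above.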