Re: [Xen-devel] [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server
On 4/19/2016 12:37 PM, Tian, Kevin wrote:
>> From: Yu, Zhang [mailto:yu.c.zhang@xxxxxxxxxxxxxxx]
>> Sent: Thursday, April 14, 2016 6:45 PM
>>
>> On 4/11/2016 7:15 PM, Yu, Zhang wrote:
>>> On 4/8/2016 7:01 PM, George Dunlap wrote:
>>>> On 08/04/16 11:10, Yu, Zhang wrote:
[snip]
>>>>> BTW, I noticed your reply has not been CCed to the mailing list,
>>>>> and I also wonder if we should raise this last question in the
>>>>> community?
>>>>
>>>> Oops -- that was a mistake on my part. :-) I appreciate the
>>>> discretion; just so you know in the future, if I'm purposely
>>>> changing the CC list (removing xen-devel and/or adding extra
>>>> people), I'll almost always say so at the top of the mail.
>>>>
>>>>>> And then of course there's the p2m_ioreq_server ->
>>>>>> p2m_ram_logdirty transition -- I assume that live migration is
>>>>>> incompatible with this functionality? Is there anything that
>>>>>> prevents a live migration from being started when there are
>>>>>> outstanding p2m_ioreq_server entries?
>>>>>
>>>>> Another good question, and the answer is unfortunately yes. :-)
>>>>> If live migration happens during the normal emulation process,
>>>>> entries marked with p2m_ioreq_server will be changed to
>>>>> p2m_log_dirty in resolve_misconfig(), and later write operations
>>>>> will change them to p2m_ram_rw; after that, accesses to these
>>>>> pages can no longer be forwarded to the device model. From this
>>>>> point of view, this functionality is incompatible with live
>>>>> migration.
>>>>>
>>>>> But for XenGT, I think this is acceptable, because if live
>>>>> migration is to be supported in the future, intervention from the
>>>>> backend device model will be necessary. At that time, we can
>>>>> guarantee from the device model side that there are no outdated
>>>>> p2m_ioreq_server entries, and hence no need to reset the p2m type
>>>>> back to p2m_ram_rw (and no need to include p2m_ioreq_server in
>>>>> P2M_CHANGEABLE_TYPES). By "outdated", I mean that after an ioreq
>>>>> server is detached from p2m_ioreq_server, or before an ioreq
>>>>> server is attached to this type, entries marked with
>>>>> p2m_ioreq_server should be regarded as outdated. Is this
>>>>> acceptable to you? Any suggestions?
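(To make that transition sequence concrete, here is a small compilable
model -- not the actual Xen code, just the net effect that the
recalculation in resolve_misconfig() plus a subsequent guest write
fault has on such a page:)

    #include <stdio.h>

    /* Toy model of the relevant p2m types. */
    typedef enum {
        p2m_ram_rw,        /* normal read/write RAM               */
        p2m_log_dirty,     /* write-protected for dirty logging   */
        p2m_ioreq_server,  /* writes forwarded to an ioreq server */
    } p2m_type_t;

    /* Once global logdirty is active, an outstanding p2m_ioreq_server
     * entry is treated as a changeable type and folded into
     * p2m_log_dirty on recalculation. */
    static p2m_type_t recalc_during_logdirty(p2m_type_t t)
    {
        return (t == p2m_ioreq_server) ? p2m_log_dirty : t;
    }

    /* A guest write to a p2m_log_dirty page marks it dirty and flips
     * it to p2m_ram_rw -- after which writes no longer reach the
     * device model. */
    static p2m_type_t guest_write(p2m_type_t t)
    {
        return (t == p2m_log_dirty) ? p2m_ram_rw : t;
    }

    int main(void)
    {
        p2m_type_t t = p2m_ioreq_server;
        t = recalc_during_logdirty(t);  /* -> p2m_log_dirty */
        t = guest_write(t);             /* -> p2m_ram_rw    */
        printf("%s\n", t == p2m_ram_rw ? "page lost by ioreq server"
                                       : "still tracked");
        return 0;
    }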
>>>> So the question is, as of this series, what happens if someone
>>>> tries to initiate a live migration while there are outstanding
>>>> p2m_ioreq_server entries? If the answer is "the ioreq server
>>>> suddenly loses all control of the memory", that's something that
>>>> needs to be changed.
>>>
>>> Sorry, for this patch series I'm afraid the above description is
>>> the answer.
>>>
>>> Besides, I find it hard to change the current code to support both
>>> the deferred resetting of p2m_ioreq_server and live migration at
>>> the same time. One reason is that a page with p2m_ioreq_server
>>> behaves differently in different situations.
>>>
>>> My assumption for XenGT is that, for live migration to work, the
>>> device model should guarantee there are no outstanding
>>> p2m_ioreq_server pages in the hypervisor (so there is no need for
>>> the deferred recalculation), and it is our device model that should
>>> be responsible for copying the write-protected guest pages later.
>>>
>>> Another solution I can think of: when unmapping the ioreq server,
>>> we walk the p2m table and reset entries with p2m_ioreq_server back
>>> directly, instead of deferring the reset. Of course, this means a
>>> performance impact. But since mapping and unmapping an ioreq server
>>> is not a frequent operation, the performance penalty may be
>>> acceptable. What do you think of this approach?
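(A rough idea of what that synchronous sweep could look like, reusing
the toy p2m_type_t from the snippet above -- the names are hypothetical
and the flat gfn-indexed array stands in for the real p2m structures; a
real version would also need locking and to flush stale EPT entries:)

    #include <stddef.h>

    #define NR_GFNS 4096              /* stand-in for max mapped gfn */

    static p2m_type_t p2m[NR_GFNS];

    static size_t sweep_on_unmap(void)
    {
        size_t nr_reset = 0;

        for ( size_t gfn = 0; gfn < NR_GFNS; gfn++ )
            if ( p2m[gfn] == p2m_ioreq_server )
            {
                p2m[gfn] = p2m_ram_rw;   /* reset now, no deferral */
                nr_reset++;
            }

        /* One full walk of the guest physmap: the performance cost
         * mentioned above, but paid only on (infrequent) unmap. */
        return nr_reset;
    }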
>> George, sorry to bother you. Any comments on the above option? :)
>>
>> Another choice might be to let live migration fail if there are
>> outstanding p2m_ioreq_server entries. But I'm not inclined to do so,
>> because:
>> 1> I'd still like to keep the live migration feature for XenGT.
>> 2> It is not easy to know whether there are outstanding
>> p2m_ioreq_server entries. I mean, since the p2m type change is not
>> only triggered by hypercall, keeping a counter of the remaining
>> p2m_ioreq_server entries would mean a lot of code changes.
>>
>> Besides, I wonder whether the requirement to reset the
>> p2m_ioreq_server entries is indispensable -- could we let the device
>> model side be responsible for this? The worst case I can imagine if
>> the device model fails to do so is that operations on a gfn might be
>> delivered to the wrong device model. I'm not clear what kind of
>> damage this would cause to the hypervisor or to other VMs. Do any
>> other maintainers have any suggestions? Thanks in advance! :)
>
> I'm not sure how the above would work. In the pre-copy phase (where
> logdirty is concerned), the device model is still actively serving
> requests from the guest, including initiating new write-protection
> requests. How can you guarantee draining of outstanding
> p2m_ioreq_server entries without actually freezing the device model
> (while freezing the device model means the guest driver might be
> blocked with random errors)?

You are right, and I'm not suggesting we clear the p2m_ioreq_server
entries when live migration happens. My suggestion is that either we
guarantee there are no outstanding p2m_ioreq_server entries right after
the ioreq server is unbound, or we do not support live migration for
now. :)
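(Back to point 2> above: the counter I was describing would be
something like the toy sketch below, with hypothetical names; the "lot
of code changes" is that every path that modifies a p2m entry's type
would have to funnel through such a helper:)

    /* Reuses the toy p2m_type_t from the first snippet. */
    static unsigned long nr_ioreq_server_entries;

    static void set_p2m_type(p2m_type_t *entry, p2m_type_t new_type)
    {
        if ( *entry == p2m_ioreq_server && new_type != p2m_ioreq_server )
            nr_ioreq_server_entries--;
        else if ( *entry != p2m_ioreq_server && new_type == p2m_ioreq_server )
            nr_ioreq_server_entries++;
        *entry = new_type;
    }

    /* Live migration could then refuse to start (or the unbind path
     * could verify) that no entries remain outstanding: */
    static int may_start_logdirty(void)
    {
        return nr_ioreq_server_entries == 0;
    }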
B.R.
Yu