
Re: [Xen-devel] long latency of domain shutdown



>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 30.04.08 16:26 >>>
>On 30/4/08 15:00, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:
>>
>> According to two forced backtraces with about a second delta, the
>> hypervisor is in the process of releasing the 1:1 mapping of the
>> guest kernel and managed, during that one second, to increment
>> i in free_l3_table() by just 1. That works out to an unbelievable
>> 13,600 clocks per L1 entry freed.
>
>That's not great. :-) At such a high cost, perhaps some tracing might
>indicate if we are taking some stupid slow path in free_domheap_page() or
>cleanup_page_cacheattr()? I very much hope that 13600 cycles cannot be
>legitimately accounted for!

I'm afraid it's really that bad. I used another machine (local to my
office), so the numbers aren't quite as bad as on the box they were
originally measured on. Collecting the cumulative clock cycles spent
in free_l1_table() and free_domheap_pages() (and their descendants,
so the former obviously includes a large part of the latter) during
the largest single run of relinquish_memory(), I get an average of
3,400 clocks spent in free_domheap_pages() (with all but very few
pages going onto the scrub list) and 8,500 clocks per page table
entry in free_l1_table() (assuming all entries are populated, so the
real number is higher).

It's the relationship between the two numbers that makes me believe
that there's really this much time spent on it.
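
For reference, the measurement itself is nothing sophisticated; it
boils down to something like the sketch below (illustrative only, not
the actual instrumentation I used - names and hook points are made
up, and the same was done around free_l1_table()):

    /* Illustrative cycle accounting sketch: accumulate TSC deltas and
     * invocation counts around the call of interest, so that per-call
     * averages can be computed afterwards. */
    static uint64_t fdp_cycles, fdp_calls;

    static inline uint64_t tsc_now(void)
    {
        uint32_t lo, hi;
        asm volatile ( "rdtsc" : "=a" (lo), "=d" (hi) );
        return ((uint64_t)hi << 32) | lo;
    }

    /* Used in place of a direct free_domheap_pages() call. */
    static void timed_free_domheap_pages(struct page_info *pg,
                                         unsigned int order)
    {
        uint64_t t = tsc_now();

        free_domheap_pages(pg, order);
        fdp_cycles += tsc_now() - t;
        fdp_calls++;
    }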

For the specific case of cleaning up after a domain, there seems to
be a pretty simple workaround, though: free_l{3,4}_table() can
simply avoid recursing into put_page_from_l{3,4}e() by checking
whether d->arch.relmem is RELMEM_dom_l{3,4}. This, as expected,
reduces the latency of preempting relinquish_memory() (for a 5G
domU) on the box I tested from about 3s to less than half a second;
if that's still considered too much, the same kind of check could
of course be added to free_l2_table().
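
To make the idea concrete, here is roughly what I have in mind for
the L3 case (a sketch only - the existing loop body is abridged and
the surrounding code differs in detail); free_l4_table() would get
the analogous RELMEM_dom_l4 check:

    static void free_l3_table(struct page_info *page)
    {
        struct domain *d = page_get_owner(page);
        unsigned long mfn = page_to_mfn(page);
        l3_pgentry_t *pl3e;
        unsigned int i;

        /* New: while the domain is relinquishing its L3 pages there
         * is no point recursing into the lower levels here -
         * relinquish_memory() will visit those pages and drop their
         * references anyway. */
        if ( d->arch.relmem == RELMEM_dom_l3 )
            return;

        pl3e = map_domain_page(mfn);

        for ( i = 0; i < L3_PAGETABLE_ENTRIES; i++ )
            if ( is_guest_l3_slot(i) )
                put_page_from_l3e(pl3e[i], mfn);

        unmap_domain_page(pl3e);
    }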

But as there's no similarly simple mechanism to deal with the DoS
potential in pinning/unpinning or installing L4 (and maybe L3) table
entries, there'll need to be a way to preempt these call trees
anyway. Since hypercalls cannot nest, storing the respective state
in the vcpu structure shouldn't be a problem; what I'm unsure
about is what side effects a partially validated page table might
introduce.
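
Just to illustrate the kind of state I mean (field names are
entirely made up; nothing like this exists today):

    /* Hypothetical per-vcpu continuation for a preempted page table
     * validation/teardown - since hypercalls don't nest, one such
     * record per vcpu is sufficient. */
    struct pt_continuation {
        unsigned long mfn;       /* page table being (in)validated */
        unsigned int  level;     /* 4, 3, 2 or 1 */
        unsigned int  next_idx;  /* first entry not yet processed */
    };

On hitting the preemption point the position would be recorded there
and the hypercall restarted via hypercall_create_continuation(); the
open question remains what a guest can legitimately do with (or
observe of) a table left half-validated in the meantime.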

While looking at this I also wondered: is there really a way for
Xen heap pages to end up being used as guest page tables (or,
similarly, as descriptor tables)? I would think that if that
happened it would be a bug (and perhaps a security issue). If it
cannot happen, then the RELMEM_* states could be simplified and
domain_relinquish_resources() shortened.

(I was traveling, so it took a while to get around to doing the
measurements.)

Jan

