Re: [Xen-devel] [PATCH v2 1/2] x86/mm: fix a potential race condition in map_pages_to_xen().
On 11/13/2017 5:31 PM, Jan Beulich wrote:
> On 10.11.17 at 15:05, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
>> On 11/10/2017 5:49 PM, Jan Beulich wrote:
>>> I'm not certain this is important enough a fix to consider for 4.10,
>>> and you seem to think it's good enough if this gets applied only
>>> after the tree has been branched, as you didn't Cc Julien. Please
>>> indicate whether you simply weren't aware, or whether there is
>>> indeed an important aspect to this that I'm overlooking.
>>
>> Well, at first I had not expected this to be accepted for 4.10. But
>> since we have met this issue in practice, when running a graphics
>> application which consumes memory intensively in dom0, I think it
>> also makes sense to fix it in a Xen release as early as possible. Do
>> you think this is a reasonable request? :-)
>
> You'd need to provide further details for us to understand the
> scenario. It obviously depends on whether you have other patches to
> Xen which actually trigger this. If the problem can be triggered from
> outside of a vanilla upstream Xen, then yes, I think I would favor
> the fixes being included.

Thanks, Jan. Let me try to give an explanation of the scenario. :-)

We saw an ASSERT failure on

  ASSERT((page->count_info & PGC_count_mask) != 0)

in is_iomem_page() <- put_page_from_l1e() <- alloc_l1_table(), when we
ran a graphics application (a memory eater, but closed source) in dom0.
This failure only occurs when dom0 is configured with 2 vCPUs.

Our debugging showed that the page->count_info in question had already
(and unexpectedly) been cleared in free_xenheap_pages(), and the call
trace should be like this:

 free_xenheap_pages()
     ^
     |
 free_xen_pagetable()
     ^
     |
 map_pages_to_xen()
     ^
     |
 update_xen_mappings()
     ^
     |
 get_page_from_l1e()
     ^
     |
 mod_l1_entry()
     ^
     |
 do_mmu_update()

We then realized that this happens when dom0 updates its page tables:
when the cache attributes of a referenced page frame are about to be
changed, the corresponding mappings in the Xen VA space are updated by
map_pages_to_xen() as well.

However, since map_pages_to_xen() has the aforementioned race, when
MMU_NORMAL_PT_UPDATE is triggered concurrently on different CPUs, it
may mistakenly free a superpage referenced by pl2e (a simplified
sketch of this check-then-act window is appended at the end of this
message). That is why our ASSERT failure only shows up when dom0 has
more than one vCPU configured.

As to the code base, we were running XenGT code, which carries only a
few non-upstreamed patches against Xen - I believe most of them are
libxl related, and none of them touches the MMU code. So I believe
this issue can also be triggered by a PV guest on a vanilla upstream
Xen.

Is the above description convincing enough? :-)

Yu

> Jan
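To make the window concrete, here is a minimal, compilable sketch of
the check-then-act pattern described above. It is not the actual Xen
code: l2e_t, the PRESENT/PSE flag macros, map_lock and the helpers
l1es_all_identical(), superpage_bits() and free_l1_table() are
simplified stand-ins for l2_pgentry_t, _PAGE_PRESENT/_PAGE_PSE, the
map_pgdir lock, the L1-table scan and free_xen_pagetable().
consolidate_racy() models the pre-fix flow, where the decision to
consolidate is made before the entry is re-validated under the lock;
consolidate_fixed() models the repaired flow, which re-reads and
re-checks the entry (still present, not yet a superpage) with the lock
held before swapping it and freeing the old L1 table.

/* race_sketch.c - stand-alone model of the map_pages_to_xen() race.
 * All types and helpers are simplified stand-ins, not Xen code.
 * Build: gcc -pthread race_sketch.c -o race_sketch */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t bits; } l2e_t;   /* stand-in for l2_pgentry_t */

#define PRESENT (1ull << 0)                /* stand-in for _PAGE_PRESENT */
#define PSE     (1ull << 7)                /* stand-in for _PAGE_PSE     */

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stubs standing in for the real page-table helpers. */
static bool l1es_all_identical(l2e_t e) { (void)e; return true; }
static uint64_t superpage_bits(l2e_t e) { return e.bits | PSE; }
static void free_l1_table(l2e_t e)
{
    printf("freeing frame %#llx\n", (unsigned long long)(e.bits >> 12));
}

/* Pre-fix pattern: the "are all L1 entries identical?" scan and the
 * decision to consolidate happen with no re-validation under the lock.
 * Between (1) and (2) another CPU can perform the same consolidation,
 * install a superpage entry and free the L1 table; (3) then frees
 * whatever frame the stale entry points at -- a live superpage, which
 * is what clears count_info and trips the ASSERT in is_iomem_page(). */
static void consolidate_racy(l2e_t *pl2e)
{
    if ( !l1es_all_identical(*pl2e) )      /* (1) unlocked check */
        return;

    pthread_mutex_lock(&map_lock);
    l2e_t ol2e = *pl2e;                    /* (2) may already be PSE now */
    pl2e->bits = superpage_bits(ol2e);
    pthread_mutex_unlock(&map_lock);

    free_l1_table(ol2e);                   /* (3) may free a live page */
}

/* Fixed pattern: re-read and re-validate the entry under the lock, and
 * only the CPU that actually performs the swap frees the old table. */
static void consolidate_fixed(l2e_t *pl2e)
{
    pthread_mutex_lock(&map_lock);
    l2e_t ol2e = *pl2e;

    if ( (ol2e.bits & PRESENT) && !(ol2e.bits & PSE) &&
         l1es_all_identical(ol2e) )
    {
        pl2e->bits = superpage_bits(ol2e); /* real code: l2e_write_atomic */
        pthread_mutex_unlock(&map_lock);
        free_l1_table(ol2e);               /* safe: we won the race */
    }
    else
        pthread_mutex_unlock(&map_lock);   /* another CPU got here first */
}

int main(void)
{
    l2e_t e = { .bits = PRESENT };         /* a present, non-PSE entry */

    consolidate_fixed(&e);                 /* consolidates; e is now PSE */
    consolidate_racy(&e);                  /* stale free of a "superpage" */
    return 0;
}

The point of the fix is that the present/PSE re-check and the swap must
form one critical section: whichever CPU installs the superpage entry
is the sole owner of the old L1 table and the only one allowed to free
it.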