[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] PATastic fun
On 25.02.2013 10:10, Stefan Bader wrote: > On 25.02.2013 04:15, Liu, Jinsong wrote: >> Konrad Rzeszutek Wilk wrote: >>> On Fri, Feb 22, 2013 at 02:54:16PM +0100, Stefan Bader wrote: >>>> Hi Konrad, >>> >>> Hey Stefan, >>>> >>>> here is another one from the hm-what? department: >>> >>> Heh - the really good-bug-hunting one. Lets also include Jinsong as >>> he has been tracking a similar one with mcelog. >>>> >>>> Colin discovered that running the attached program with the fork >>>> active (e.g. "./mmap-example -f 0x10000", the address can be that or >>>> iomem) this triggers the following weird messages: >>>> >>>> [ 6824.453724] mmap-example:3481 map pfn expected mapping type >>>> write-back for [mem 0x00010000-0x00010fff], got uncached-minus >>>> [ 6824.453776] ------------[ cut here ]------------ >>>> [ 6824.453796] WARNING: at >>>> /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 >>>> untrack_pfn+0xb8/0xd0() ... [ 6824.453920] Pid: 3481, comm: >>>> mmap-example Tainted: GF >>>> 3.8.0-6-generic #13-Ubuntu >>>> [ 6824.453926] Call Trace: >>>> [ 6824.453944] [<ffffffff8105879f>] warn_slowpath_common+0x7f/0xc0 >>>> [ 6824.453954] [<ffffffff810587fa>] warn_slowpath_null+0x1a/0x20 >>>> [ 6824.453963] [<ffffffff8104bcc8>] untrack_pfn+0xb8/0xd0 >>>> [ 6824.453975] [<ffffffff81156c1c>] unmap_single_vma+0xac/0x100 >>>> [ 6824.453985] [<ffffffff81157459>] unmap_vmas+0x49/0x90 >>>> [ 6824.453995] [<ffffffff8115f808>] exit_mmap+0x98/0x170 >>>> [ 6824.454007] [<ffffffff810559a4>] mmput+0x64/0x100 >>>> [ 6824.454017] [<ffffffff810560f5>] dup_mm+0x445/0x660 >>>> [ 6824.454027] [<ffffffff81056d9f>] >>>> copy_process.part.22+0xa5f/0x1510 [ 6824.454038] >>>> [<ffffffff81057931>] do_fork+0x91/0x350 [ 6824.454048] >>>> [<ffffffff81057c76>] sys_clone+0x16/0x20 [ 6824.454060] >>>> [<ffffffff816ccbf9>] stub_clone+0x69/0x90 [ 6824.454069] >>>> [<ffffffff816cc89d>] ? system_call_fastpath+0x1a/0x1f [ 6824.454076] >>>> ---[ end trace 4918cdd0a4c9fea4 ]--- >>>> >>>> I found that this is related to your bandaid patch >>>> >>>> commit 8eaffa67b43e99ae581622c5133e20b0f48bcef1 >>>> Author: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> >>>> Date: Fri Feb 10 09:16:27 2012 -0500 >>>> >>>> xen/pat: Disable PAT support for now. >>>> >>>> I just do not understand how this happens. From the trace it seems >>>> the fork >>>> fails when duplicating the VMAs (dup_mm calls mmput on failure). So >>>> maybe the >>>> warning is just related to this. So primarily the question is how on >>>> fork the _PAGE_PCD bit can become set? That and _PAGE_PWT are >>>> cleared from the supported >>>> mask by the patch, so somehow I would think nothing should be able >>>> to set it... >>>> But apparently not so. >>>> Not sure it is a big deal since I never saw this in normal operation >>>> and it >>>> seems to be ok when unapping before doing the fork. It is just plain >>>> odd. >>> >>> Jinsong mentioned that there is some oddity with the MTRR. Somehow the >>> ranges are swapped or not correct. Jinsong, could you shed some light >>> on what you have found so far? >>> >> >> Yes, Sander once also reported a similar weird warning when start mcelog >> daemon, as attached. >> >> Basically, it occurs when mcelog user daemon start, >> do_fork >> --> copy_process >> --> dup_mm >> --> dup_mmap >> --> copy_page_range >> --> track_pfn_copy >> --> reserve_pfn_range So that makes it clearer as this will do reserve_memtype(...) --> pat_x_mtrr_type --> mtrr_type_lookup --> __mtrr_type_lookup And that can return -1/0xff in case of mtrr not being enabled/initialized. Which is not the case (given there are no messages for it in dmesg). This is not equal to MTRR_TYPE_WRBACK and thus becomes _PAGE_CACHE_UC_MINUS. It looks like the problem starts early in reserve_memtype: if (!pat_enabled) { /* This is identical to page table setting without PAT */ if (new_type) { if (req_type == _PAGE_CACHE_WC) *new_type = _PAGE_CACHE_UC_MINUS; else *new_type = req_type & _PAGE_CACHE_MASK; } return 0; } This would be what we want, but only clearing the PWT and PCD flags from the supported flags is not changing pat_enabled (which is 1 when PAT support is compiled into the kernel). Unfortunately the variable is local and since there are not any messages about PAT in dmesg I would say pat_init() is not called either. Which might be used to disable PAT support by clearing the CPU feature flag. Right now it seems the only work-around that message appearing is to user "nopat" on the kernel command line. -Stefan >> --> line 624: flags != want_flags >> It comes from different memory types of page table (_PAGE_CACHE_WB) and mtrr >> (_PAGE_CACHE_UC_MINUS). >> >> However, why it get different memory types from page table and mtrr is still >> unclear, reproducing the bug is difficult and unstable. >> >> Thanks, > > Ok, so this seems to take the same code paths. As for the test program, it > fails > on duplicating some mmap on a fork. The test program does this all the time > (except the backtrace warning which is warn_once). > So you say, the UC- comes from the MTRR side... Hm, have to look at that. > Attachment:
signature.asc _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |