 
	
Re: [Xen-devel] [xen-unstable test] 127070: regressions - FAIL
On 03/09/18 17:21, Jan Beulich wrote:
>>>> On 03.09.18 at 14:56, <jgross@xxxxxxxx> wrote:
>> On 03/09/18 14:44, Jan Beulich wrote:
>>>>>> On 01.09.18 at 23:43, <osstest-admin@xxxxxxxxxxxxxx> wrote:
>>>> flight 127070 xen-unstable real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/127070/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>> test-amd64-i386-xl-shadow 20 guest-start/debian.repeat fail REGR. vs. 126854
>>>
>>> I wonder if this
>>>
>>> [ 30.017142] BUG: unable to handle kernel paging request at 0002ffa8
>>> [ 30.017208] IP: __radix_tree_lookup+0x12/0xb0
>>> [ 30.017235] *pdpt = 000000001eca5027 *pde = 0000000000000000
>>> [ 30.017271] Oops: 0000 [#1] SMP
>>> [ 30.017293] Modules linked in: ext4 mbcache jbd2
>>> [ 30.017352] CPU: 2 PID: 1204 Comm: systemd Not tainted 4.14.67+ #1
>>> [ 30.017383] task: df601f80 task.stack: dafd8000
>>> [ 30.017411] EIP: __radix_tree_lookup+0x12/0xb0
>>> [ 30.017445] EFLAGS: 00010282 CPU: 2
>>> [ 30.017468] EAX: 0002ffa4 EBX: b7ed2000 ECX: 00000000 EDX: 01ffffff
>>> [ 30.017503] ESI: 00000000 EDI: 00000000 EBP: dafd9de4 ESP: dafd9dd0
>>> [ 30.017534] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
>>> [ 30.017571] CR0: 80050033 CR2: 0002ffa8 CR3: 1eca4000 CR4: 00042660
>>> [ 30.017620] Call Trace:
>>> [ 30.017642] radix_tree_lookup_slot+0x11/0x30
>>> [ 30.017673] ? xen_set_pud+0xa0/0xa0
>>> [ 30.017699] find_get_entry+0x1d/0x110
>>> [ 30.017723] pagecache_get_page+0x1f/0x230
>>> [ 30.017752] lookup_swap_cache+0x35/0x110
>>> [ 30.017778] swap_readahead_detect+0x84/0x2f0
>>> [ 30.017809] do_swap_page+0x25b/0x8e0
>>> [ 30.017837] ? wp_page_copy+0x399/0x6b0
>>> [ 30.017866] ? kmap_atomic_prot+0x2b/0x180
>>> [ 30.017892] ? __raw_callee_save_xen_pte_val+0xc/0xc
>>> [ 30.017925] handle_mm_fault+0x468/0x9e0
>>> [ 30.017951] __do_page_fault+0x1ba/0x4e0
>>> [ 30.017976] ? __do_page_fault+0x4e0/0x4e0
>>> [ 30.018008] do_page_fault+0x37/0x100
>>> [ 30.018032] ? __do_page_fault+0x4e0/0x4e0
>>> [ 30.018060] common_exception+0x77/0x7e
>>> [ 30.018084] EIP: 0xb7f0d39f
>>> [ 30.018101] EFLAGS: 00010246 CPU: 2
>>> [ 30.018124] EAX: b7ed2030 EBX: b7f20000 ECX: b7bf91b8 EDX: 00000002
>>> [ 30.018158] ESI: b7f2055c EDI: b7f10e90 EBP: b7bf9260 ESP: b7bf9208
>>> [ 30.018190] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
>>> [ 30.018220] ? __do_page_fault+0x4e0/0x4e0
>>> [ 30.018242] Code: 00 8b 03 c1 e8 1a 85 c0 74 be 0f 0b 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 08 89 45 f0 89 4d ec 8b 45 f0 <8b> 40 04 89 c1 83 e1 03 83 f9 01 75 71 89 c1 bf 40 00 00 00 83
>>> [ 30.018415] EIP: __radix_tree_lookup+0x12/0xb0 SS:ESP: 0069:dafd9dd0
>>> [ 30.018445] CR2: 000000000002ffa8
>>> [ 30.018472] ---[ end trace c8ba97a241bb2040 ]---
>>>
>>> isn't a (presumably indirect) result of
>>>
>>> Sep 1 03:06:32.180094 (XEN) d28 L1TF-vulnerable L1e 8000000400000000 - Shadowing
>>>
>>> Jürgen's change to avoid split PTE writes would then only be
>>> papering over an active issue.
>>
>> No, it isn't papering over the issue, but repairing it. See
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>
>> which clearly states that this issue has been seen on bare metal, too.
>> On Xen it's just much more frequent as the timing is different.
>
> Hmm, yes - if the problem exists also on native, then while your fix is
> hiding that problem, it's not one in Xen code. Question though is how
> valuable this particular test is until the fix has trickled in on the Linux
> side.

I'm about to request the patch to be included in stable kernels.

To be more explicit about why the patch fixes the problem:

native_ptep_get_and_clear() is meant to get the old pte contents and
clear the pte atomically.
While the implementation via 32-bit operations was fine with regard to
races against other updates, it was not fine with regard to read accesses
made after the low word had been written and before the high word was
cleared: a page fault in that window would detect a non-zero value in the
high word and assume the data to be present on some swap device.

My patch removes this race, so the problem can no longer occur.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
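
For readers who want to see the window described above, here is a minimal
single-threaded sketch in plain userspace C. It is not the kernel code and
not the patch itself: pte_t, classify(), split_clear_low(),
split_clear_high() and whole_clear() are hypothetical names made up for
illustration, and the classification rule is a simplification. The high word
in the example is chosen to resemble the L1e value in the quoted Xen log
line; the low word is invented. The point is only that, between the two
32-bit writes, an observer sees an entry that is non-present yet non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 64-bit PAE PTE stored as two 32-bit words. */
typedef struct {
    uint32_t low;   /* holds the present bit (bit 0) */
    uint32_t high;  /* upper half of the entry */
} pte_t;

#define PTE_PRESENT 0x1u

enum pte_kind { PTE_NONE, PTE_MAPPED, PTE_SWAP_LIKE };

/* Simplified view of how a fault handler might interpret an entry. */
static enum pte_kind classify(pte_t p)
{
    if (p.low & PTE_PRESENT)
        return PTE_MAPPED;
    if (p.low == 0 && p.high == 0)
        return PTE_NONE;
    /* Non-present but non-zero: treated as if it named swapped-out data. */
    return PTE_SWAP_LIKE;
}

/* Split clear, step 1: fetch and zero the low word only. */
static uint32_t split_clear_low(pte_t *p)
{
    uint32_t old = p->low;
    p->low = 0;
    return old;
}

/* Split clear, step 2: fetch and zero the high word. */
static uint32_t split_clear_high(pte_t *p)
{
    uint32_t old = p->high;
    p->high = 0;
    return old;
}

/*
 * Stand-in for clearing the whole entry in one indivisible step.
 * In this single-threaded model there is simply no point between the
 * two assignments at which anyone could look at the entry; on real
 * hardware that would need a single 64-bit atomic operation.
 */
static pte_t whole_clear(pte_t *p)
{
    pte_t old = *p;
    p->low = 0;
    p->high = 0;
    return old;
}

/* Walk through both variants and check what an observer would see. */
static void demo(void)
{
    pte_t a = { .low = 0x00000067u, .high = 0x80000004u };
    pte_t b = a;

    assert(classify(a) == PTE_MAPPED);
    split_clear_low(&a);
    assert(classify(a) == PTE_SWAP_LIKE);  /* the problematic window */
    split_clear_high(&a);
    assert(classify(a) == PTE_NONE);

    pte_t old = whole_clear(&b);
    assert(classify(old) == PTE_MAPPED);
    assert(classify(b) == PTE_NONE);       /* no intermediate state */
}
```

Calling demo() from a small main() exercises both paths. The message above
does not say how the patch closes the window, so whole_clear() should be read
as one possible shape of a fix, not as a description of the actual change.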