
Re: [Xen-devel] [xen-unstable test] 127070: regressions - FAIL


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Mon, 3 Sep 2018 17:28:52 +0200
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, osstest service owner <osstest-admin@xxxxxxxxxxxxxx>
  • Delivery-date: Mon, 03 Sep 2018 15:29:19 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 03/09/18 17:21, Jan Beulich wrote:
>>>> On 03.09.18 at 14:56, <jgross@xxxxxxxx> wrote:
>> On 03/09/18 14:44, Jan Beulich wrote:
>>>>>> On 01.09.18 at 23:43, <osstest-admin@xxxxxxxxxxxxxx> wrote:
>>>> flight 127070 xen-unstable real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/127070/ 
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>  test-amd64-i386-xl-shadow   20 guest-start/debian.repeat fail REGR. vs. 126854
>>>
>>> I wonder if this
>>>
>>> [   30.017142] BUG: unable to handle kernel paging request at 0002ffa8
>>> [   30.017208] IP: __radix_tree_lookup+0x12/0xb0
>>> [   30.017235] *pdpt = 000000001eca5027 *pde = 0000000000000000 
>>> [   30.017271] Oops: 0000 [#1] SMP
>>> [   30.017293] Modules linked in: ext4 mbcache jbd2
>>> [   30.017352] CPU: 2 PID: 1204 Comm: systemd Not tainted 4.14.67+ #1
>>> [   30.017383] task: df601f80 task.stack: dafd8000
>>> [   30.017411] EIP: __radix_tree_lookup+0x12/0xb0
>>> [   30.017445] EFLAGS: 00010282 CPU: 2
>>> [   30.017468] EAX: 0002ffa4 EBX: b7ed2000 ECX: 00000000 EDX: 01ffffff
>>> [   30.017503] ESI: 00000000 EDI: 00000000 EBP: dafd9de4 ESP: dafd9dd0
>>> [   30.017534]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
>>> [   30.017571] CR0: 80050033 CR2: 0002ffa8 CR3: 1eca4000 CR4: 00042660
>>> [   30.017620] Call Trace:
>>> [   30.017642]  radix_tree_lookup_slot+0x11/0x30
>>> [   30.017673]  ? xen_set_pud+0xa0/0xa0
>>> [   30.017699]  find_get_entry+0x1d/0x110
>>> [   30.017723]  pagecache_get_page+0x1f/0x230
>>> [   30.017752]  lookup_swap_cache+0x35/0x110
>>> [   30.017778]  swap_readahead_detect+0x84/0x2f0
>>> [   30.017809]  do_swap_page+0x25b/0x8e0
>>> [   30.017837]  ? wp_page_copy+0x399/0x6b0
>>> [   30.017866]  ? kmap_atomic_prot+0x2b/0x180
>>> [   30.017892]  ? __raw_callee_save_xen_pte_val+0xc/0xc
>>> [   30.017925]  handle_mm_fault+0x468/0x9e0
>>> [   30.017951]  __do_page_fault+0x1ba/0x4e0
>>> [   30.017976]  ? __do_page_fault+0x4e0/0x4e0
>>> [   30.018008]  do_page_fault+0x37/0x100
>>> [   30.018032]  ? __do_page_fault+0x4e0/0x4e0
>>> [   30.018060]  common_exception+0x77/0x7e
>>> [   30.018084] EIP: 0xb7f0d39f
>>> [   30.018101] EFLAGS: 00010246 CPU: 2
>>> [   30.018124] EAX: b7ed2030 EBX: b7f20000 ECX: b7bf91b8 EDX: 00000002
>>> [   30.018158] ESI: b7f2055c EDI: b7f10e90 EBP: b7bf9260 ESP: b7bf9208
>>> [   30.018190]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
>>> [   30.018220]  ? __do_page_fault+0x4e0/0x4e0
>>> [   30.018242] Code: 00 8b 03 c1 e8 1a 85 c0 74 be 0f 0b 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 08 89 45 f0 89 4d ec 8b 45 f0 <8b> 40 04 89 c1 83 e1 03 83 f9 01 75 71 89 c1 bf 40 00 00 00 83
>>> [   30.018415] EIP: __radix_tree_lookup+0x12/0xb0 SS:ESP: 0069:dafd9dd0
>>> [   30.018445] CR2: 000000000002ffa8
>>> [   30.018472] ---[ end trace c8ba97a241bb2040 ]---
>>>
>>> isn't a (presumably indirect) result of
>>>
>>> Sep  1 03:06:32.180094 (XEN) d28 L1TF-vulnerable L1e 8000000400000000 - Shadowing
>>>
>>> Jürgen's change to avoid split PTE writes would then only be
>>> papering over an active issue.
>>
>> No, it isn't papering over the issue, but repairing it. See
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=198497 
>>
>> which clearly states that this issue has been seen on bare metal, too.
>> On Xen it's just much more frequent as the timing is different.
> 
> Hmm, yes - if the problem exists also on native, then while your fix is
> hiding that problem, it's not one in Xen code. Question though is how
> valuable this particular test is until the fix has trickled in on the Linux
> side.

I'm about to request the patch to be included in stable kernels.

To be more explicit about why the patch fixes the problem:

native_ptep_get_and_clear() is supposed to read the old pte contents and
clear the pte atomically. The implementation via two 32-bit operations
was fine with regard to races against other updates, but not with regard
to read accesses happening after the low word had been written and
before the high word was cleared: a page fault in that window would
see a non-zero value in the high word and assume the data to be present
on some swap device.

My patch now removes this possible race and the problem can't occur any
longer.
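
As a rough illustration only (this is a user-space model, not the kernel
code and not the patch itself), the window can be sketched as below. The
names split_pte, classify(), split_get_and_clear(), atomic_get_and_clear()
and MODEL_PRESENT are all made up for this sketch, and the classification
rule is an assumption based on the description above: an entry with the
present bit clear but a non-zero high word is treated as a swap entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit PAE pte seen as two 32-bit words. */
typedef struct {
    uint32_t low;
    uint32_t high;
} split_pte;

enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_LOOKS_LIKE_SWAP };

/* Assumed "present" bit for this model only. */
#define MODEL_PRESENT 0x1u

/* How a fault handler in this model interprets a pte value it has read. */
static enum pte_kind classify(uint32_t low, uint32_t high)
{
    if (low & MODEL_PRESENT)
        return PTE_PRESENT;
    if (low == 0 && high == 0)
        return PTE_NONE;
    /* Not present, but not empty either: taken for a swap entry. */
    return PTE_LOOKS_LIKE_SWAP;
}

/*
 * Split variant: two 32-bit steps. *mid records what a reader would
 * observe if it looked at the pte between the two stores.
 */
static split_pte split_get_and_clear(split_pte *p, enum pte_kind *mid)
{
    split_pte old;

    old.low = p->low;
    p->low = 0;                        /* low word cleared first */
    *mid = classify(p->low, p->high);  /* the problematic window */
    old.high = p->high;
    p->high = 0;                       /* high word cleared second */
    return old;
}

/* Single 64-bit exchange: no intermediate state is ever visible. */
static uint64_t atomic_get_and_clear(_Atomic uint64_t *p)
{
    return atomic_exchange(p, 0);
}
```

With a pte of low = 0x67 and high = 0x4, the split variant reports
PTE_LOOKS_LIKE_SWAP for the intermediate state, while the 64-bit exchange
goes straight from the old value to zero.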


Juergen


