Re: [Xen-devel] [PATCH v2] mm/page_alloc: make bootscrub happen in idle-loop


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxx>
  • From: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>
  • Date: Thu, 8 Nov 2018 09:05:28 +0000
  • Cc: "sergey.dyasli@xxxxxxxxxx >> Sergey Dyasli" <sergey.dyasli@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
  • Delivery-date: Thu, 08 Nov 2018 09:05:46 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07/11/2018 18:20, Andrew Cooper wrote:
> On 09/10/18 16:21, Sergey Dyasli wrote:
>> Scrubbing RAM during boot may take a long time on machines with lots
>> of RAM. Add 'idle' option to bootscrub which marks all pages dirty
>> initially so they will eventually be scrubbed in idle-loop on every
>> online CPU.
>>
>> It's guaranteed that the allocator will return scrubbed pages by doing
>> eager scrubbing during allocation (unless MEMF_no_scrub was provided).
>>
>> Use the new 'idle' option as the default one.
>>
>> Signed-off-by: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>
> 
> This patch reliably breaks boot, although it's not immediately obvious how:
> 
> (d9) (XEN) mcheck_poll: Machine check polling timer started.
> (d9) (XEN) xenoprof: Initialization failed. Intel processor family 6 model 60 is not supported
> (d9) (XEN) Dom0 has maximum 400 PIRQs
> (d9) (XEN) ----[ Xen-4.12-unstable  x86_64  debug=y   Not tainted ]----
> (d9) (XEN) CPU:    0
> (d9) (XEN) RIP:    e008:[<ffff82d080440ddb>] setup.c#cmdline_cook+0x1d/0x77
> (d9) (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor
> (d9) (XEN) rax: ffff82d080406bdc   rbx: ffff8300c2c2c2c2   rcx: 0000000000000000
> (d9) (XEN) rdx: 00000007c7ffffff   rsi: ffff83000045c24b   rdi: ffff83000045c24b
> (d9) (XEN) rbp: ffff82d0804b7da8   rsp: ffff82d0804b7d98   r8:  ffff83003f057000
> (d9) (XEN) r9:  7fffffffffffffff   r10: 0000000000000000   r11: 0000000000000001
> (d9) (XEN) r12: ffff83003f0d8100   r13: 0000000000000000   r14: ffff82d0805f33d0
> (d9) (XEN) r15: 0000000000000002   cr0: 000000008005003b   cr4: 00000000001526e0
> (d9) (XEN) cr3: 000000003fea7000   cr2: ffff8300c2c2c2c2
> (d9) (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (d9) (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (d9) (XEN) Xen code around <ffff82d080440ddb> (setup.c#cmdline_cook+0x1d/0x77):
> (d9) (XEN)  05 5e fc ff 48 0f 44 d8 <80> 3b 20 75 09 48 83 c3 01 80 3b 20 74 f7 80 3d
> (d9) (XEN) Xen stack trace from rsp=ffff82d0804b7d98:
> (d9) (XEN)    0000000000000000 ffff8300c2c2c2c2 ffff82d0804b7ee8 ffff82d080443b7f
> (d9) (XEN)    0000000000000000 00000000003f3480 0000000000000002 0000000000000002
> (d9) (XEN)    0000000000000002 0000000000000002 0000000000000002 0000000000000001
> (d9) (XEN)    0000000000000001 0000000000000003 00000000000feffc 0000000000000000
> (d9) (XEN)    00000000000feffd 0000000000000000 0000000000800163 00000000feffd000
> (d9) (XEN)    ffff83000045c24b ffffffff00000002 0000000000000001 0000000000000001
> (d9) (XEN)    ffff83000048da80 ffff82d08048db00 0000000000000000 0000000000000000
> (d9) (XEN)    0000000000000000 0000000200000004 00000040ffffffff 0000000000000400
> (d9) (XEN)    0000000800000000 000000010000006e 0000000000000003 00000000000002f8
> (d9) (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (d9) (XEN)    0000000000000000 0000000000000000 0000000000000000 ffff82d0802000f3
> (d9) (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (d9) (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (d9) (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (d9) (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (d9) (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (d9) (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (d9) (XEN)    0000000000000000 0000000000000000 ffff83003f0ce000 0000000000000000
> (d9) (XEN)    00000000001526e0 0000000000000000 0000000000000000 0000060000000000
> (d9) (XEN)    0000001800000000
> (d9) (XEN) Xen call trace:
> (d9) (XEN)    [<ffff82d080440ddb>] setup.c#cmdline_cook+0x1d/0x77
> (d9) (XEN)    [<ffff82d080443b7f>] __start_xen+0x259c/0x292d
> (d9) (XEN)    [<ffff82d0802000f3>] __high_start+0x53/0x55
> (d9) (XEN) 
> (d9) (XEN) Pagetable walk from ffff8300c2c2c2c2:
> (d9) (XEN)  L4[0x106] = 800000003fea5063 ffffffffffffffff
> (d9) (XEN)  L3[0x003] = 000000003fea2063 ffffffffffffffff
> (d9) (XEN)  L2[0x016] = 0000000000000000 ffffffffffffffff
> (d9) (XEN) 
> (d9) (XEN) ****************************************
> (d9) (XEN) Panic on CPU 0:
> (d9) (XEN) FATAL PAGE FAULT
> (d9) (XEN) [error_code=0000]
> (d9) (XEN) Faulting linear address: ffff8300c2c2c2c2
> (d9) (XEN) ****************************************
> (d9) (XEN) 
> (d9) (XEN) Reboot in five seconds...
> 
> The low part of 0xffff8300c2c2c2c2 looks to be poisoned, so
> __va(mod[0].string) is obviously turning out to be junk.

0xc2 is the debug-build SCRUB_PATTERN byte, so my patch might have
uncovered a real issue. Idle scrub has 2 implications:

    1. alloc_xenheap_pages() might return scrubbed memory, even when
       MEMF_no_scrub is passed, once secondary CPUs have entered the
       idle-loop

    2. alloc_domheap_pages() will return scrubbed memory by default
       during Xen boot

Where exactly does this crash happen? Maybe the allocated pages need to
be zeroed there? Can you reproduce the issue with a release build, where
the scrub pattern is 0?
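
To illustrate why the faulting address points at scrubbing, here is a
minimal sketch of the arithmetic. It assumes that mod[0].string is a 32-bit
physical address, that __va() simply adds the direct map base, and that the
base is 0xffff830000000000 on x86_64. The names below are made up for the
example and are not Xen's actual definitions.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: base of the x86_64 direct map; hypothetical name. */
#define DIRECTMAP_BASE_ASSUMED 0xffff830000000000ULL

/* Sketch of __va(): physical address -> direct map virtual address. */
static uint64_t va_sketch(uint64_t pa)
{
    return DIRECTMAP_BASE_ASSUMED + pa;
}

/*
 * Model a 32-bit 'string' field whose backing memory has been filled
 * with scrub_byte, then translate it the way __va() would.
 */
static uint64_t poisoned_string_va(uint8_t scrub_byte)
{
    uint32_t string_field;

    memset(&string_field, scrub_byte, sizeof(string_field));
    return va_sketch(string_field);
}
```

Under these assumptions, poisoned_string_va(0xc2) gives 0xffff8300c2c2c2c2,
which matches cr2 in the log above. poisoned_string_va(0) gives the direct
map base itself, so a release build might not fault at this point even if
the same memory is being scrubbed.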

--
Thanks,
Sergey
