
Re: [Xen-devel] CPU Lockup bug with the credit2 scheduler


  • To: Sarah Newman <srn@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>
  • From: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
  • Date: Tue, 18 Feb 2020 00:46:29 +0100
  • Cc: Alastair Browne <alastair.browne@xxxxxxxxxx>, Glen <glenbarney@xxxxxxxxx>, Tomas Mozes <hydrapolic@xxxxxxxxx>, PGNet Dev <pgnet.dev@xxxxxxxxx>
  • Delivery-date: Mon, 17 Feb 2020 23:46:53 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 17/02/2020 20:58, Sarah Newman wrote:
> On 1/7/20 6:25 AM, Alastair Browne wrote:
>>
>> CONCLUSION
>>
>> So in conclusion, the tests indicate that credit2 might be unstable.
>>
>> For the time being, we are using credit as the chosen scheduler. We
>> are booting the kernel with a parameter "sched=credit" to ensure that
>> the correct scheduler is used.
>>
>> After the tests, we decided to stick with 4.9.0.9 kernel and 4.12 Xen
>> for production use running credit1 as the default scheduler.
> 
> One person CC'ed appears to be having the same experience, where the credit2 
> scheduler leads to lockups (in this case in the domU, not the dom0) under 
> relatively heavy load. It seems possible they may have the same root cause.
> 
> I don't think there are, but have there been any patches since the 4.13.0 
> release which might have fixed problems with the credit2 scheduler? If not, 
> what would the next step be in isolating the problem - a debug build of Xen 
> or something else?
> 
> If there are no merged or proposed fixes soon, it may be worth considering 
> making the credit scheduler the default again until problems with the 
> credit2 scheduler are resolved.
> 
> Thanks, Sarah
> 
> 

Hi Sarah / Alastair,

I can only provide my n=1 (OK, I'm running a bunch of boxes, some of which 
are pretty over-committed CPU-wise), but I haven't seen any issues (lately) 
with credit2.

I did take a look at Alastair Browne's report that you replied to 
(https://lists.xen.org/archives/html/xen-devel/2020-01/msg00361.html)
and I do see some differences:
    - Alastair's machine has multiple sockets, my machines don't.
    - It seems Alastair's config is using ballooning 
      (dom0_mem=4096M,max:16384M)? For me that has been a source of trouble 
      in the past, so my configs don't use it.
    - The kernels tested are quite old: 4.19.67 (latest upstream is 
      4.19.104) and 4.9.189 (latest upstream is 4.9.214). No really new 
      kernel was tested (5.4 is available in Debian backports for buster).
    - Alastair, are you using PV, HVM or PVH guests? The report seems to be 
      missing the guest configs. (I'm primarily using PVH and a few HVMs, 
      no PV except for dom0.)

Anyhow, it could be worthwhile to test without ballooning, and to test a 
recent kernel to rule out an issue with (missing) kernel backports.
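
A minimal sketch of how one might check whether a dom0_mem setting leaves 
room for ballooning, i.e. whether the max: value is larger than the initial 
allocation. The helper names (parse_size, balloon_headroom) are hypothetical, 
and the size syntax is an assumption: plain positive sizes with an optional 
B/K/M/G/T suffix, a bare number counted in KiB. Other forms the option may 
accept are not handled.

```python
import re

# Assumed size syntax: digits plus an optional B/K/M/G/T suffix.
_SIZE_RE = re.compile(r"(\d+)([BKMGT]?)", re.IGNORECASE)
_UNIT_BYTES = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '4096M' to bytes (hypothetical helper)."""
    match = _SIZE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    number, unit = match.groups()
    # Assumption: a bare number without a suffix is counted in KiB.
    return int(number) * _UNIT_BYTES[(unit or "K").upper()]


def balloon_headroom(option):
    """Return max minus initial dom0 memory in bytes, or None if either is unset."""
    value = option.split("=", 1)[-1]  # accept 'dom0_mem=...' or just the value
    initial = maximum = None
    for part in value.split(","):
        if part.startswith("max:"):
            maximum = parse_size(part[4:])
        elif part.startswith("min:"):
            continue  # the minimum does not matter for this check
        else:
            initial = parse_size(part)
    if initial is None or maximum is None:
        return None
    return maximum - initial


# The setting from the report leaves 12288 MiB of headroom for ballooning.
print(balloon_headroom("dom0_mem=4096M,max:16384M") // 1024**2)
# Pinning max to the initial size leaves none.
print(balloon_headroom("dom0_mem=4096M,max:4096M"))
```

A result of 0 means the initial and maximum sizes match, which is the 
"without ballooning" configuration suggested above.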

--
Sander


 

