[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Live-Patch application failure in core-scheduling mode


  • To: Jürgen Groß <jgross@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>
  • From: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>
  • Date: Thu, 6 Feb 2020 14:02:45 +0000
  • Authentication-results: esa3.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none; spf=None smtp.pra=sergey.dyasli@xxxxxxxxxx; spf=Pass smtp.mailfrom=sergey.dyasli@xxxxxxxxxx; spf=None smtp.helo=postmaster@xxxxxxxxxxxxxxx
  • Autocrypt: addr=sergey.dyasli@xxxxxxxxxx; keydata= xsFNBFtMVHEBEADc/hZcLexrB6vGTdGqEUsYZkFGQh6Z1OO7bCtM1go1RugSMeq9tkFHQSOc 9c7W9NVQqLgn8eefikIHxgic6tGgKoIQKcPuSsnqGao2YabsTSSoeatvmO5HkR0xGaUd+M6j iqv3cD7/WL602NhphT4ucKXCz93w0TeoJ3gleLuILxmzg1gDhKtMdkZv6TngWpKgIMRfoyHQ jsVzPbTTjJl/a9Cw99vuhFuEJfzbLA80hCwhoPM+ZQGFDcG4c25GQGQFFatpbQUhNirWW5b1 r2yVOziSJsvfTLnyzEizCvU+r/Ek2Kh0eAsRFr35m2X+X3CfxKrZcePxzAf273p4nc3YIK9h cwa4ZpDksun0E2l0pIxg/pPBXTNbH+OX1I+BfWDZWlPiPxgkiKdgYPS2qv53dJ+k9x6HkuCy i61IcjXRtVgL5nPGakyOFQ+07S4HIJlw98a6NrptWOFkxDt38x87mSM7aSWp1kjyGqQTGoKB VEx5BdRS5gFdYGCQFc8KVGEWPPGdeYx9Pj2wTaweKV0qZT69lmf/P5149Pc81SRhuc0hUX9K DnYBa1iSHaDjifMsNXKzj8Y8zVm+J6DZo/D10IUxMuExvbPa/8nsertWxoDSbWcF1cyvZp9X tUEukuPoTKO4Vzg7xVNj9pbK9GPxSYcafJUgDeKEIlkn3iVIPwARAQABzShTZXJnZXkgRHlh c2xpIDxzZXJnZXkuZHlhc2xpQGNpdHJpeC5jb20+wsGlBBMBCgA4FiEEkI7HMI5EbM2FLA1L Aa+w5JvbyusFAltMVHECGwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AAIQkQAa+w5JvbyusW IQSQjscwjkRszYUsDUsBr7Dkm9vK65AkEACvL+hErqbQj5yTVNqvP1rVGsXvevViglSTkHD4 9LGwEk4+ne8N4DPcqrDnyqYFd42UxTjVyoDEXEIIoy0RHWCmaspYEDX8fVmgFG3OFoeA9NAv JHssHU6B2mDAQ6M3VDmAwTw+TbXL/c1wblgGAP9kdurydZL8bevTTUh7edfnm5pwaT9HLXvl xLjz5qyt6tKEowM0xPVzCKaj3Mf/cuZFOlaWiHZ0biOPC0JeoHuz4UQTnBBUKk+n2nnn72k9 37cNeaxARwn/bxcej9QlbrrdaNGVFzjCA/CIL0KjUepowpLN0+lmYjkPgeLNYfyMXumlSNag 9qnCTh0QDsCXS/HUHPeBskAvwNpGBCkfiP/XqJ+V618ZQ1sclHa9aWNnlIR/a8xVx25t/14V R8EX/045HUpyPU8hI/yw+Fw/ugJ8W0dFzFeHU5K2tEW2W0m3ZWWWgpcBSCB17DDLIPjGX1Qc J8jiVJ7E4rfvA1JBg9BxVw5LVuXg2FB6bqnDYALfY2ydATk+ZzMUAMMilaE7/5a2RMV4TYcd 8Cf77LdgO0pB3vF6z1QmNA2IbOICtJOXpmvHj+dKFUt5hFVbvqXbuAjlrwFktbAFVGxaeIYz nQ44lQu9JqDuSH5yOytdek24Dit8SgEHGvumyj17liCG6kNzxd+2xh3uaUCA5MIALy5mZ87B TQRbTFRxARAAwqL3u/cPDA+BhU9ghtAkC+gyC5smWUL1FwTQ9CwTqcQpKt85PoaHn8sc5ctt Aj2fNT/F2vqQx/BthVOdkhj9LCwuslqBIqbri3XUyMLVV/Tf+ydzHW2AjufCowwgBguxedD1 f9Snkv+As7ZgMg/GtDqDiCWBFg9PneKvr+FPPd2WmrI8Kium4X5Zjs/a6OGUWVcIBoPpu088 z/0tlKYjTFLhoIEsf6ll4KvRQZIyGxclg3RBEuN+wgMbKppdUf2DBXYeCyrrPx809CUFzcik O99drWti2CV1gF8bnbUvfCewxwqgVKtHl2kfsm2+/lgG4CTyvnvWqUyHICZUqISdz5GidaXn TcPlsAeo2YU2NXbjwnmxzJEP/4FxgsjYIUbbxdmsK+PGre7HmGmaDZ8K77L3yHr/K7AH8mFs WUM5KiW4SnKyIQvdHkZMpvE4XrrirlZ+JI5vE043GzzpS2CGo0NFQmDJLRbpN/KQY6dkNVgA L0aDxJtAO1rXKYDSrvpL80bYyskQ4ivUa06v9SM2/bHi9bnp3Nf/fK6ErWKWmDOHWrnTgRML oQpcxoVPxw2CwyWT1069Y/CWwgnbj34+LMwMUYhPEZMitABpQE74dEtIFh0c2scm3K2QGhOP KQK3szqmXuX6MViMZLDh/B7FXLQyqwMBnZygfzZFM9vpDskAEQEAAcLBjQQYAQoAIBYhBJCO xzCORGzNhSwNSwGvsOSb28rrBQJbTFRxAhsMACEJEAGvsOSb28rrFiEEkI7HMI5EbM2FLA1L Aa+w5Jvbyuvvbg//S3d1+XL568K5BTHXaYxSqCeMqYbV9rPhEHyk+rzKtwNXSbSO8x0xZutL gYV+nkW0KMPH5Bz3I1xiRKAkiX/JLcMfx2HAXJ1Cv2rpR6bxyCGBJmuwR68uMS/gKe6AWwTY q2kt1rtZPjGl9OwVoWGJKbu2pFBLWmLAnHlXOL6WDSE1Mz2Ah3jMHOaSyAgPu1XSNa600gMJ QrSxgbe7bW72gCjeHcrIjfv+uh5cZ5/J/edpWXRuE4Tz82nxudBIHE2vnQEoJrXOh2kAJiYs G+IllDqFKDPrnS0R3DenBNG0Ir8h9W6heETnhQUc9NDFCSr81Mp0fROdBfYZnQzgSZMjN2eY pkNEWshJER4ZYY+7hAmqI51HnsKuM46QINh00jJHRMykW3TBMlwnUFxZ0gplAecjCFC7g2zj g1qNxLnxMS4wCsyEVhCkPyYnS8zuoa4ZUH37CezD01Ph4O1saln5+M4blHCEAUpZIkTGpUoi SEwtoxu6EEUYfbcjWgzJCs023hbRykZlFALoRNCwVz/FnPuVu291jn9kjvCTEeE6g2dCtOrO ukuXzk1tIeeoggsU7AJ0bzP7QOEhEckaBbP4k6ic26LJGWNMinllePyEMXzsgmMHVN//8wDT NWaanhP/JZ1v5Mfn8s1chIqC0sJIw73RvvuBkOa+jx0OwW3RFoQ=
  • Cc: "sergey.dyasli@xxxxxxxxxx >> Sergey Dyasli" <sergey.dyasli@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>
  • Delivery-date: Thu, 06 Feb 2020 14:02:51 +0000
  • Ironport-sdr: lEm7dhbOyq7QTdEDTlAUe9pMgYVB916ziqmma9kmnjp1SzVkR/K/mXP2tWXMpIr8Dzd9GavQ2U kbDsnJPjGs5dH8Yn4da312QFIvS3Th8Xd3imRa2MsqIN2ihtbU73JY2erHADEYPcn/Jo04AS3j DKueHxRNI9jJ/jLHjCVKUgBWiZVgS2Sb1Iia4HDla1wPdMl2+kYIgoanODoH57a8ELelVt4O0F 9MBLtJ4G3tA33QdyqZ4Bq2qSh+hW3mwu0MpW3TatsoHGTw6ID4PxuXX6CX0Yr9RT2AAAHsWtlw ZnU=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06/02/2020 11:05, Sergey Dyasli wrote:
> On 06/02/2020 09:57, Jürgen Groß wrote:
>> On 05.02.20 17:03, Sergey Dyasli wrote:
>>> Hello,
>>>
>>> I'm currently investigating a Live-Patch application failure in core-
>>> scheduling mode and this is an example of what I usually get:
>>> (it's easily reproducible)
>>>
>>>      (XEN) [  342.528305] livepatch: lp: CPU8 - IPIing the other 15 CPUs
>>>      (XEN) [  342.558340] livepatch: lp: Timed out on semaphore in CPU 
>>> quiesce phase 13/15
>>>      (XEN) [  342.558343] bad cpus: 6 9
>>>
>>>      (XEN) [  342.559293] CPU:    6
>>>      (XEN) [  342.559562] Xen call trace:
>>>      (XEN) [  342.559565]    [<ffff82d08023f304>] R 
>>> common/schedule.c#sched_wait_rendezvous_in+0xa4/0x270
>>>      (XEN) [  342.559568]    [<ffff82d08023f8aa>] F 
>>> common/schedule.c#schedule+0x17a/0x260
>>>      (XEN) [  342.559571]    [<ffff82d080240d5a>] F 
>>> common/softirq.c#__do_softirq+0x5a/0x90
>>>      (XEN) [  342.559574]    [<ffff82d080278ec5>] F 
>>> arch/x86/domain.c#guest_idle_loop+0x35/0x60
>>>
>>>      (XEN) [  342.559761] CPU:    9
>>>      (XEN) [  342.560026] Xen call trace:
>>>      (XEN) [  342.560029]    [<ffff82d080241661>] R _spin_lock_irq+0x11/0x40
>>>      (XEN) [  342.560032]    [<ffff82d08023f323>] F 
>>> common/schedule.c#sched_wait_rendezvous_in+0xc3/0x270
>>>      (XEN) [  342.560036]    [<ffff82d08023f8aa>] F 
>>> common/schedule.c#schedule+0x17a/0x260
>>>      (XEN) [  342.560039]    [<ffff82d080240d5a>] F 
>>> common/softirq.c#__do_softirq+0x5a/0x90
>>>      (XEN) [  342.560042]    [<ffff82d080279db5>] F 
>>> arch/x86/domain.c#idle_loop+0x55/0xb0
>>>
>>> The first HT sibling is waiting for the second in the LP-application
>>> context while the second waits for the first in the scheduler context.
>>>
>>> Any suggestions on how to improve this situation are welcome.
>>
>> Can you test the attached patch, please? It is only tested to boot, so
>> I did no livepatch tests with it.
>
> Thank you for the patch! It seems to fix the issue in my manual testing.
> I'm going to submit automatic LP testing for both thread/core modes.

Andrew suggested to test late ucode loading as well and so I did.
It uses stop_machine() to rendezvous cpus and it failed with a similar
backtrace for a problematic CPU. But in this case the system crashed
since there is no timeout involved:

    (XEN) [  155.025168] Xen call trace:
    (XEN) [  155.040095]    [<ffff82d0802417f2>] R _spin_unlock_irq+0x22/0x30
    (XEN) [  155.069549]    [<ffff82d08023f3c2>] S 
common/schedule.c#sched_wait_rendezvous_in+0xa2/0x270
    (XEN) [  155.109696]    [<ffff82d08023f728>] F 
common/schedule.c#sched_slave+0x198/0x260
    (XEN) [  155.145521]    [<ffff82d080240e1a>] F 
common/softirq.c#__do_softirq+0x5a/0x90
    (XEN) [  155.180223]    [<ffff82d0803716f6>] F 
x86_64/entry.S#process_softirqs+0x6/0x20

It looks like your patch provides a workaround for LP case, but other
cases like stop_machine() remain broken since the underlying issue with
the scheduler is still there.

--
Thanks,
Sergey

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.