Re: Recent upgrade of 4.13 -> 4.14 issue
- To: "marmarek@xxxxxxxxxxxxxxxxxxxxxx" <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>
- From: Frédéric Pierret <frederic.pierret@xxxxxxxxxxxx>
- Date: Sat, 31 Oct 2020 16:04:23 +0100
- Cc: Juergen Gross <JGross@xxxxxxxx>, "George.Dunlap@xxxxxxxxxx" <George.Dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>
- Delivery-date: Sat, 31 Oct 2020 15:04:47 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
On 10/31/20 at 5:08 AM, marmarek@xxxxxxxxxxxxxxxxxxxxxx wrote:
On Sat, Oct 31, 2020 at 04:27:58AM +0100, Dario Faggioli wrote:
On Sat, 2020-10-31 at 03:54 +0100, marmarek@xxxxxxxxxxxxxxxxxxxxxx wrote:
On Sat, Oct 31, 2020 at 02:34:32AM +0000, Dario Faggioli wrote:
(XEN) *** Dumping CPU7 host state: ***
(XEN) Xen call trace:
(XEN) [<ffff82d040223625>] R _spin_lock+0x35/0x40
(XEN) [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN) [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN) [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN) [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN) [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN) [<ffff82d040291b6b>] S vmx_asm_do_vmentry+0x2b/0x30
And so on, for (almost?) all CPUs.
Right. So, it seems like a livelock (I would say). It might happen on
some resource which is shared among domains. And it was introduced (the
livelock, not the resource or the sharing) in 4.14.
Just giving a quick look, I see that vmx_do_resume() calls
vmx_clear_vmcs() which calls on_selected_cpus() which takes the
call_lock spinlock.
And none of these seems to have received much attention recently.
But this is just a really basic analysis!
I've looked at on_selected_cpus() and my understanding is this:
1. take call_lock spinlock
2. set function+args+what cpus to be called in a global "call_data" variable
3. ask CPUs to execute that function (smp_send_call_function_mask() call)
4. wait for all requested CPUs to execute the function, still holding
the spinlock
5. only then - release the spinlock
So, if any CPU does not execute the requested function for any reason,
the caller will keep call_lock locked forever.
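The five steps above can be sketched as a small single-threaded model. This is only an illustration of the pattern as described in this thread, not the Xen source: `model_on_selected_cpus()`, `model_send_call_function_mask()`, `stuck_cpus` and the `max_spins` bound are hypothetical names added for the sketch, and the real wait in step 4 has no bound.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4u

/* Hypothetical stand-ins for the global lock and the global request slot. */
static bool call_lock_held;
static struct {
    void (*func)(void *info);
    void *info;
    unsigned int pending;   /* bit n set: CPU n still has to run func */
} call_data;

/* Model knob (assumption): CPUs in this mask never service the request. */
static unsigned int stuck_cpus;

/* Step 3 in the model: every responsive target CPU runs the function. */
static void model_send_call_function_mask(void)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
        unsigned int bit = 1u << cpu;

        if ((call_data.pending & bit) && !(stuck_cpus & bit)) {
            call_data.func(call_data.info);
            call_data.pending &= ~bit;
        }
    }
}

/*
 * Returns true when every target ran func and the lock was released.
 * Returns false when the lock was already held, or when the bounded wait
 * expired. In the second case the lock stays held, which is the failure
 * mode described above. The bound exists only so the model terminates.
 */
static bool model_on_selected_cpus(unsigned int mask, void (*func)(void *),
                                   void *info, unsigned int max_spins)
{
    assert((mask >> NR_CPUS) == 0);      /* only CPUs 0..NR_CPUS-1 exist */

    if (call_lock_held)                  /* step 1: a real caller spins here */
        return false;
    call_lock_held = true;

    call_data.func = func;               /* step 2: publish the request */
    call_data.info = info;
    call_data.pending = mask;

    model_send_call_function_mask();     /* step 3: ask the CPUs to run it */

    while (call_data.pending != 0) {     /* step 4: wait, lock still held */
        if (max_spins-- == 0)
            return false;
    }

    call_lock_held = false;              /* step 5: only now release */
    return true;
}

/* Example callback: counts how many CPUs ran it. */
static void count_call(void *info)
{
    ++*(int *)info;
}
```

With `stuck_cpus` set to a single CPU, the first call never reaches step 5, and every later call fails at step 1. That would match the picture of almost all CPUs sitting in `_spin_lock` below `on_selected_cpus`.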
I don't see any CPU waiting on step 4, but also I don't see call traces
from CPU3 and CPU8 in the log - that's because they are in guest (dom0
here) context, right? I do see "guest state" dumps from them.
The only three CPUs that did log Xen call traces and are not waiting on that
spin lock are:
CPU0:
(XEN) Xen call trace:
(XEN) [<ffff82d040240f89>] R vcpu_unblock+0x9/0x50
(XEN) [<ffff82d0402e0171>] S vcpu_kick+0x11/0x60
(XEN) [<ffff82d0402259c8>] S tasklet.c#do_tasklet_work+0x68/0xc0
(XEN) [<ffff82d040225a59>] S tasklet.c#tasklet_softirq_action+0x39/0x60
(XEN) [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN) [<ffff82d040291b6b>] S vmx_asm_do_vmentry+0x2b/0x30
CPU4:
(XEN) Xen call trace:
(XEN) [<ffff82d040227043>] R set_timer+0x133/0x220
(XEN) [<ffff82d040234e90>] S credit.c#csched_tick+0/0x3a0
(XEN) [<ffff82d04022660f>] S timer.c#timer_softirq_action+0x9f/0x300
(XEN) [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN) [<ffff82d0402d64e6>] S x86_64/entry.S#process_softirqs+0x6/0x20
CPU14:
(XEN) Xen call trace:
(XEN) [<ffff82d040222dc0>] R do_softirq+0/0x10
(XEN) [<ffff82d0402d64e6>] S x86_64/entry.S#process_softirqs+0x6/0x20
I'm not sure if any of those is related to that spin lock,
on_selected_cpus() call, or anything like that...
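Sorting CPUs by hand into "waiting on the spin lock" and "doing something else" gets tedious with longer logs. A minimal sketch of a helper for that follows, assuming the trace line format seen in the excerpts above, where the first frame of each trace is marked `R` and the symbol is followed by `+offset/size`. The function names `top_frame_is()` and `count_top_frames()` are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if `line` is an "R" frame of a Xen call trace and the
 * symbol in that frame is exactly `symbol`.
 */
static bool top_frame_is(const char *line, const char *symbol)
{
    const char *p;
    size_t len;

    assert(line != NULL && symbol != NULL);

    p = strstr(line, "] R ");
    if (p == NULL)
        return false;
    p += strlen("] R ");

    /* The symbol must match in full and be followed by "+offset/size". */
    len = strlen(symbol);
    return strncmp(p, symbol, len) == 0 && p[len] == '+';
}

/* Counts the lines among lines[0..n-1] whose "R" frame is `symbol`. */
static unsigned int count_top_frames(const char *const *lines, size_t n,
                                     const char *symbol)
{
    unsigned int count = 0;

    for (size_t i = 0; i < n; i++)
        if (top_frame_is(lines[i], symbol))
            count++;
    return count;
}
```

Fed the console log line by line, `count_top_frames(lines, n, "_spin_lock")` gives the number of dumped CPUs whose top frame is the spin lock. The remaining `R` frames are the ones worth a closer look.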
Hi,
Some newer logs here:
https://gist.github.com/fepitre/5b2da8cf2ef976c0b885ce7bcfbf7313
They contain a piece of the serial console output at the time of the freeze,
followed by debug keys 'd' and '0', which blocked at one VCPU.
I hope that will help.
Regards,
Frédéric