
Re: Linux DomU freezes and dies under heavy memory shuffling



On 12.03.21 22:33, Roman Shaposhnik wrote:
Hi Jürgen,

just wanted to give you (and everyone who may be keeping an eye on
this) an update.

Somehow, after applying your kernel patch, the VM has now been running
for 10+ days without a problem.

Can you check the kernel console messages, please? Messages are
printed when a potential hang is detected, and an attempt is made to
wake the hanging CPU up again via another interrupt.

Look for messages containing "csd", so e.g. do

dmesg | grep csd

in the VM.
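
Since the guest sometimes freezes before anyone can log in, it may be worth
snapshotting those messages to disk periodically so the evidence survives a
crash. A minimal sketch (the file name and interval are arbitrary, not
anything prescribed in this thread):

# run inside the VM; keeps the latest csd-related kernel messages on disk
while true; do
    dmesg | grep csd > /var/log/csd-watch.log
    sync                # flush to disk in case the VM dies right after
    sleep 60
done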

Thanks,


Juergen


I'll keep experimenting (A/B-testing style) but at this point I'm
actually pretty perplexed as to why this patch would make a difference
(since it is basically just for observability). Any thoughts on that?

Thanks,
Roman.

On Wed, Feb 24, 2021 at 7:06 PM Roman Shaposhnik <roman@xxxxxxxxxx> wrote:

Hi Jürgen!

sorry for the belated reply -- I wanted to externalize the VM before
replying -- but let me at least answer you now:

On Tue, Feb 23, 2021 at 5:17 AM Jürgen Groß <jgross@xxxxxxxx> wrote:

On 18.02.21 06:21, Roman Shaposhnik wrote:
On Wed, Feb 17, 2021 at 12:29 AM Jürgen Groß <jgross@xxxxxxxx> wrote:

     On 17.02.21 09:12, Roman Shaposhnik wrote:
      > Hi Jürgen, thanks for taking a look at this. A few comments below:
      >
      > On Tue, Feb 16, 2021 at 10:47 PM Jürgen Groß <jgross@xxxxxxxx> wrote:
      >>
      >> On 16.02.21 21:34, Stefano Stabellini wrote:
      >>> + x86 maintainers
      >>>
      >>> It looks like the tlbflush is getting stuck?
      >>
      >> I have seen this case multiple times on customer systems now, but
      >> reproducing it reliably seems to be very hard.
      >
      > It is reliably reproducible under my workload but it takes a long time
      > (~3 days of the workload running in the lab).

     This is by far the best reproduction rate I have seen up to now.

     The next best reproducer seems to be a huge installation with several
     hundred hosts and thousands of VMs with about 1 crash each week.

      >
       >> I suspected fifo events to be to blame, but just yesterday I've been
       >> informed of another case with fifo events disabled in the guest.
      >>
       >> One common pattern seems to be that up to now I have seen this
       >> effect only on systems with Intel Gold cpus. Can it be confirmed
       >> to be true in this case, too?
      >
       > I am pretty sure mine isn't -- I can get you full CPU specs if
       > that's useful.

     Just the output of "grep model /proc/cpuinfo" should be enough.


processor: 3
vendor_id: GenuineIntel
cpu family: 6
model: 77
model name: Intel(R) Atom(TM) CPU  C2550  @ 2.40GHz
stepping: 8
microcode: 0x12d
cpu MHz: 1200.070
cache size: 1024 KB
physical id: 0
siblings: 4
core id: 3
cpu cores: 4
apicid: 6
initial apicid: 6
fpu: yes
fpu_exception: yes
cpuid level: 11
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer
aes rdrand lahf_lm 3dnowprefetch cpuid_fault epb pti ibrs ibpb stibp
tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida
arat md_clear
vmx flags: vnmi preemption_timer invvpid ept_x_only flexpriority
tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
bugs: cpu_meltdown spectre_v1 spectre_v2 mds msbds_only
bogomips: 4800.19
clflush size: 64
cache_alignment: 64
address sizes: 36 bits physical, 48 bits virtual
power management:

      >
       >> In case anybody has a reproducer (either in a guest or dom0) with a
       >> setup where a diagnostic kernel can be used, I'd be _very_
       >> interested!
      >
       > I can easily add things to Dom0 and DomU. Whether that will disrupt
       > the experiment is, of course, another matter. Still, please let me
       > know what would be helpful to do.

     Is there a chance to switch to an upstream kernel in the guest? I'd like
     to add some diagnostic code to the kernel, and creating the patches will
     be easier that way.


That's a bit tough -- the VM is based on stock Ubuntu, and if I upgrade
the kernel I'll have to fiddle with a lot of things to make the workload
functional again.

However, I can install a debug kernel (from Ubuntu, etc.).

Of course, if patching the kernel is the only way to make progress --
let's try that -- please let me know.

I have found a nice upstream patch, which - with some modifications - I
plan to give our customer as a workaround.

The patch is for kernel 4.12, but chances are good it will apply to a
4.15 kernel, too.
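
A quick way to check whether it does is a dry run against the target tree
before committing to a rebuild -- a sketch, with the patch file name as a
placeholder:

# from the top of the 4.15 kernel source tree
patch -p1 --dry-run < csd-debug.patch      # for a plain diff
# or, in a git checkout:
git apply --check csd-debug.patch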

I'm slightly confused about this patch -- it seems to me that it needs
to be applied to the guest kernel, correct?

If that's the case -- the challenge I have is that I need to re-build
the Canonical (Ubuntu) distro kernel with this patch -- this seems
a bit daunting at first (I mean, I'm pretty good at rebuilding kernels;
I just never do it with vendor ones ;-)).

So... if there's anyone here who has any suggestions on how to do that
-- I'd appreciate pointers.
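
For what it's worth, the route usually documented for rebuilding a stock
Ubuntu kernel with an extra patch looks roughly like the sketch below;
package names and build targets vary by release, and the patch file name is
a placeholder, so treat it as a starting point rather than a verified recipe.

# needs deb-src entries enabled in /etc/apt/sources.list
sudo apt-get build-dep linux linux-image-$(uname -r)
apt-get source linux-image-$(uname -r)
cd linux-*/
patch -p1 < ../csd-debug.patch             # apply the diagnostic patch
fakeroot debian/rules clean
fakeroot debian/rules binary-headers binary-generic binary-perarch
# install the resulting ../linux-image-*.deb in the guest and reboot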

I have been able to gather some more data.

I have contacted the author of the upstream kernel patch I've been using
for our customer (and that helped, by the way).

It seems the problem occurs when running as a guest under at least
Xen, KVM, and VMware, and there have been reports of bare-metal cases,
too. The hunt for this bug has been going on for several years now; the
patch author has been at it for eight months.

So we can rule out a Xen problem.

Finding the root cause is still important, of course, and your setup
seems to have the best reproduction rate up to now.

So any help would really be appreciated.

Is the VM self-contained? Would it be possible to start it, e.g., on a
test system on my side? If yes, would you be allowed to pass it on to
me?

I'm working on externalizing the VM in a way that doesn't disclose anything
about the customer workload. I'm almost there -- apart from my question about
the vendor kernel rebuild. I plan to make that VM available this week.

Goes without saying, but I would really appreciate your help in chasing this.

Thanks,
Roman.




 

