[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-users] Xen 4.10: domU crashes during/after live-migrate


  • To: Pim van den Berg <pim.van.den.berg@xxxxxxxxxx>, xen-users@xxxxxxxxxxxxxxxxxxxx
  • From: Hans van Kranenburg <hans@xxxxxxxxxxx>
  • Date: Tue, 4 Sep 2018 17:41:03 +0200
  • Autocrypt: addr=hans@xxxxxxxxxxx; prefer-encrypt=mutual; keydata= xsFNBFo2pooBEADwTBe/lrCa78zuhVkmpvuN+pXPWHkYs0LuAgJrOsOKhxLkYXn6Pn7e3xm+ ySfxwtFmqLUMPWujQYF0r5C6DteypL7XvkPP+FPVlQnDIifyEoKq8JZRPsAFt1S87QThYPC3 mjfluLUKVBP21H3ZFUGjcf+hnJSN9d9MuSQmAvtJiLbRTo5DTZZvO/SuQlmafaEQteaOswme DKRcIYj7+FokaW9n90P8agvPZJn50MCKy1D2QZwvw0g2ZMR8yUdtsX6fHTe7Ym+tHIYM3Tsg 2KKgt17NTxIqyttcAIaVRs4+dnQ23J98iFmVHyT+X2Jou+KpHuULES8562QltmkchA7YxZpT mLMZ6TPit+sIocvxFE5dGiT1FMpjM5mOVCNOP+KOup/N7jobCG15haKWtu9k0kPz+trT3NOn gZXecYzBmasSJro60O4bwBayG9ILHNn+v/ZLg/jv33X2MV7oYXf+ustwjXnYUqVmjZkdI/pt 30lcNUxCANvTF861OgvZUR4WoMNK4krXtodBoEImjmT385LATGFt9HnXd1rQ4QzqyMPBk84j roX5NpOzNZrNJiUxj+aUQZcINtbpmvskGpJX0RsfhOh2fxfQ39ZP/0a2C59gBQuVCH6C5qsY rc1qTIpGdPYT+J1S2rY88AvPpr2JHZbiVqeB3jIlwVSmkYeB/QARAQABzR5Kb2hhbm5lcyBN YXJpam4gdmFuIEtyYW5lbmJ1cmfCwZEEEwEKADsCGwMFCwkIBwMFFQoJCAsFFgIDAQACHgEC F4AWIQTib9aPwejUthlFRk7ngVcyGAwqVQUCWjawgAIZAQAKCRDngVcyGAwqVZZ3D/98GzxN iFK38eh60e9TARh4HCgEWHD14/YK6KGpzF5UXM7CkKnb0NDjM3TzeeaIYzsOJITSW6rMOm5L NcJTUmw0x4vt43yc+DFAaBNiywXWgFc6g9RpYg5X33y+jhbjDREsGMDAk89isKWo8I8+rZwl S9FSSopWkrj0wV64TRwAlTCrYaTlS56mHa9T5RJkmIY+suxRr3Xl2gcKvng2Kh2WCDHjItUF /t3DfjMCIEL18QlXieyY2w1K0h4iT93YNkEdSpElsD5lFdt7XUfy3Q89eQHtd5n21cXG9lMc fcSbmHdn0ugYF0Hu2xVKCcYwWEgLjLRJ+G4aLQW122PKVVpn15/n7KMX9hQNMH4T8krEqOpd Vdb982gx5GSa+2j44+kOFTCnREN0w15JZI8Osi48xLdPqcrMVtvq9ga8tIPebAs8IM8Mf4JY okBS5sbCGEWZSSsDSdYm/Fp39HA3AEl2nI+wnJZCdgLx5NEnCd5Ni9d6rzC8Te7SvVvA/qlo sVDZAo6MJBYgoO9lPKHYD0FWomAeOlFVjdob0G2n1xBRjroVK0JQI3jpPQoZpc1TLauUQ+kT BQwWwFlpbfBbf0+CACWiQL0YgNNiZn885h4vU0EQI/FizjWUHxVLhXt1K4+x7nkhCZYzaIFL jLqw4y8f6SF9DxRMTM8dcaIQyThkms7BTQRaOtArARAA50ylThKbq0ACHyomxjQ6nFNxa9IC p6byU9LhhKOax0GB6l4WebMsQLhVGRQ8H7DT84E7QLRYsidEbneB1ciToZkL5YFFaVxY0Hj1 wKxCFcVoCRNtOfoPnHQ5m/eDLaO4o0KKL/kaxZwTn2jnl6BQDGX1Aak0u4KiUlFtoWn/E/NI v5QbTGSwIYuzWqqYBIzFtDbiQRvGw0NuKxAGMhwXy8VP05mmNwRdyh/CC4rWQPBTvTeMwr3n l8/G+16/cn4RNGhDiGTTXcX03qzZ5jZ5N7GLY5JtE6pTpLG+EXn5pAnQ7MvuO19cCbp6Dj8f XRmI0SVXWKSo0A2C8xH6KLCRfUMzD7nvDRU+bAHQmbi5cZBODBZ5yp5CfIL1KUCSoiGOMpMi n3FrarIlcxhNtoE+ya23A+JVtOwtM53ESra9cJL4WPkyk/E3OvNDmh8U6iZXn4ZaKQTHaxN9 yvmAUhZQiQi/sABwxCcQQ2ydRb86Vjcbx+FUr5OoEyQS46gc3KN5yax9D3H9wrptOzkNNMUh Fj0oK0fX/MYDWOFeuNBTYk1uFRJDmHAOp01rrMHRogQAkMBuJDMrMHfolivZw8RKfdPzgiI5 00okLTzHC0wgSSAOyHKGZjYjbEwmxsl3sLJck9IPOKvqQi1DkvpOPFSUeX3LPBIav5UUlXt0 wjbzInUAEQEAAcLBdgQYAQoAIBYhBOJv1o/B6NS2GUVGTueBVzIYDCpVBQJaOtArAhsMAAoJ EOeBVzIYDCpV4kgP+wUh3BDRhuKaZyianKroStgr+LM8FIUwQs3Fc8qKrcDaa35vdT9cocDZ jkaGHprpmlN0OuT2PB+Djt7am2noV6Kv1C8EnCPpyDBCwa7DntGdGcGMjH9w6aR4/ruNRUGS 1aSMw8sRQgpTVWEyzHlnIH92D+k+IhdNG+eJ6o1fc7MeC0gUwMt27Im+TxVxc0JRfniNk8PU Ag4kvJq7z7NLBUcJsIh3hM0WHQH9AYe/mZhQq5oyZTsz4jo/dWFRSlpY7zrDS2TZNYt4cCfZ j1bIdpbfSpRi9M3W/yBF2WOkwYgbkqGnTUvr+3r0LMCH2H7nzENrYxNY2kFmDX9bBvOWsWpc MdOEo99/Iayz5/q2d1rVjYVFRm5U9hG+C7BYvtUOnUvSEBeE4tnJBMakbJPYxWe61yANDQub PsINB10ingzsm553yqEjLTuWOjzdHLpE4lzD416ExCoZy7RLEHNhM1YQSI2RNs8umlDfZM9L ek1+1kgBvT3RH0/CpPJgveWV5xDOKuhD8j5l7FME+t2RWP+gyLid6dE0C7J03ir90PlTEkME HEzyJMPtOhO05Phy+d51WPTo1VSKxhL4bsWddHLfQoXW8RQ388Q69JG4m+JhNH/XvWe3aQFp YP+GZuzOhkMez0lHCaVOOLBSKHkAHh9i0/pH+/3hfEa4NsoHCpyy
  • Cc: hans.van.kranenburg@xxxxxxxxxx
  • Delivery-date: Tue, 04 Sep 2018 15:42:32 +0000
  • List-id: Xen user discussion <xen-users.lists.xenproject.org>
  • Openpgp: preference=signencrypt

Hi,

On 04/13/2018 09:38 AM, Pim van den Berg wrote:
> Hi all,
> 
> We (at Mendix) are upgrading our dom0s to Xen 4.10 (PV) running on Debian
> Stretch (Linux 4.9), but we are running into an issue regarding 
> live-migration.
> 
> We are experiencing domU crashes while live-migrating and in the seconds after
> the live-migration has been completed. This doesn't happen all the time. But 
> we
> are able to reproduce the issue within 1 to max 10 times live migrating 
> between
> 2 dom0s.
> 
> We've reproduced this so far with domUs running Linux 4.9.82-1+deb9u3 (Debian
> Stretch) and 4.15.11-1 (Debian Buster).
> 
> [...]

So... flash forward *whoosh*:

For Debian users, it seems best to avoid the Debian 4.9 LTS Linux (for
dom0 as well as domU) if you want to use live migration, or maybe even
in general together with Xen.

A few of the things I could cause to happen with recent Linux 4.9 in
dom0/domU:

1) blk-mq related Oops

Oops in the domU while resuming after live migrate (blkfront_resume ->
blk_mq_update_nr_hw_queues -> blk_mq_queue_reinit ->
blk_mq_insert_requests). A related fix might be
https://patchwork.kernel.org/patch/9462771/ but that's only present in
later kernels.

Apparently having this happen upsets the dom0 side of it, since any
subsequent domU that is live migrated to the same dom0, also using
blk-mq will immediately crash with the same Oops, after which is starts
raining general protection faults inside. But, at the same time, I can
still live migrate 3.16 kernels, but also 4.17 domU kernels on and off
that dom0.

2) Dom0 crash on live migration with multiple active nics

I actually have to do more testing for specifically this, but at least
I'm able to reliably crash a 4.9 Linux dom0 running on Xen 4.4 (last
tested a few months ago, Debian Jessie) by live migrating a domU that
has multiple network interfaces, actively routing traffic over them, to
it. *poof*, hypervisor reporting '(XEN) Domain 0 crashed: 'noreboot' set
- not rebooting.' *BOOM* everything gone.

3) xenconsoled disappearing

When live migration errors happen, it regularly happens that the
xenconsoled process in dom0 just disappears. I have no idea why. No
segfault message in dmesg or anything, it's just gone.

These are just examples. There are more errors that I ran into, and that
I still have to re-test again. If someone is interested in more details,
I have a collection of errors and stack traces etc.

What did I end up with now?

* Xen 4.11 (latest stable-4.11)
* Linux 4.17.17 in (Debian Stretch) dom0 and in (Stretch, Buster) domUs
* Linux 3.16.57 for old Jessie domUs is not a problem.

In a small test environment, I just completed about 2000 random live
migrations movements of ~20 domUs over 6 dom0s, throwing 10 concurrent
at it, without anything bad happen. To generate at least some extra
load, I was continuously running puppet on them, while the puppet
masters were also in the domU mix.

With 4.9 anywhere, it would only take a few minutes for everything to
explode.

To be continued...

Hans

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.