
[Xen-devel] Problems with osstest "guest-localmigrate/x10"


  • To: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Mon, 11 Jun 2018 13:23:53 +0200
  • Delivery-date: Mon, 11 Jun 2018 11:24:05 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

While trying to reproduce the problem of EFAULT sporadically being
returned when doing "xl save" of an HVM guest, I happened to catch
another bug:

From time to time we have seen failures of

test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 16 guest-localmigrate/x10

where there seemed to be problems with suspend handling in Xen. I have
now seen the very same problem while trying to do "xl save", and this
time I was able to look into the guest afterwards. The guest had the
following in its kernel log:

[ 2680.945450] Freezing user space processes ...
[ 2700.949012] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 2700.949027] btrfs           D    0  1976   1971 0x00000004
[ 2700.949033] Call Trace:
[ 2700.949059]  ? __schedule+0x2bf/0x850
[ 2700.949066]  schedule+0x39/0x90
[ 2700.949073]  io_schedule+0x12/0x40
[ 2700.949081]  blk_mq_get_tag+0x12b/0x260
[ 2700.949090]  ? elv_bio_merge_ok+0x12/0x70
[ 2700.949097]  ? remove_wait_queue+0x60/0x60
[ 2700.949102]  blk_mq_get_request+0xe6/0x3d0
[ 2700.949108]  blk_mq_make_request+0x10b/0x640
[ 2700.949115]  generic_make_request+0xf8/0x2e0
[ 2700.949120]  submit_bio+0x6e/0x140
[ 2700.949185]  scrub_add_page_to_rd_bio+0xf5/0x280 [btrfs]
[ 2700.949195]  ? __alloc_pages_nodemask+0xd1/0x260
[ 2700.949241]  scrub_pages+0x205/0x420 [btrfs]
[ 2700.949285]  scrub_stripe+0x934/0x10e0 [btrfs]
[ 2700.949297]  ? _raw_spin_unlock+0xc/0x20
[ 2700.949328]  ? block_rsv_release_bytes+0x148/0x2a0 [btrfs]
[ 2700.949369]  scrub_chunk+0x10a/0x150 [btrfs]
[ 2700.949408]  scrub_enumerate_chunks+0x27c/0x610 [btrfs]
[ 2700.949417]  ? add_wait_queue+0x70/0x70
[ 2700.949453]  btrfs_scrub_dev+0x1f2/0x510 [btrfs]
[ 2700.949462]  ? _copy_from_user+0x2e/0x60
[ 2700.949503]  btrfs_ioctl+0x11ab/0x2070 [btrfs]
[ 2700.949513]  ? kmem_cache_alloc_node+0x1dc/0x210
[ 2700.949516]  ? create_task_io_context+0x1e/0xf0
[ 2700.949523]  do_vfs_ioctl+0x8f/0x5c0
[ 2700.949527]  ? get_task_io_context+0x42/0x70
[ 2700.949534]  ? __fget+0x6c/0xa0
[ 2700.949539]  SyS_ioctl+0x74/0x80
[ 2700.949544]  entry_SYSCALL_64_fastpath+0x24/0x87
[ 2700.949549] RIP: 0033:0x7f03452424b7
[ 2700.949552] RSP: 002b:00007f034515ed68 EFLAGS: 00000246
[ 2700.949558] OOM killer enabled.
[ 2700.949560] Restarting tasks ... done.

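This is a rather recent kernel (4.15). The backtrace shows fairly
clearly that suspending failed because of a block I/O problem: a btrfs
scrub task (started via ioctl) was sleeping uninterruptibly (state D)
in blk_mq_get_tag(), waiting for a free request tag, and therefore
could not be frozen. (I should note here that Xenstore is suspended
_after_ the attempt to freeze processes.)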

So I'm quite confident that this problem is not related to Xen at all
and could happen on bare metal too, e.g. when closing the lid of a
notebook.

Another note: retrying the suspend of the guest worked like a charm, so
we could retry suspending in libxl. Another idea would be a way for the
guest to tell Xen that suspend failed inside the guest, so that xl
knows it doesn't have to wait for the timeout to expire...

And yes, this problem is completely different from the EFAULT problem,
which can't be caused by the guest.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel