
BUG in 1f3d87c75129 ("x86/vpt: do not take pt_migrate rwlock in some cases")


  • To: <xen-devel@xxxxxxxxxxxxx>, <boris.ostrovsky@xxxxxxxxxx>, <roger.pau@xxxxxxxxxx>, <stephen.s.brennan@xxxxxxxxxx>
  • From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
  • Date: Mon, 14 Jun 2021 12:15:35 +0100
  • Delivery-date: Mon, 14 Jun 2021 11:16:04 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi, Boris, Stephen, Roger,

We have stress-tested the recent changes on staging-4.13, which includes a
backport of the subject commit. Since the backport is identical to what is
on the master branch and all of its prerequisites are in place, we have no
reason to believe the issue is not present on master as well.

Here is what we got while running heavy stress testing, including repeated
VM lifecycle operations under storage and network load:


Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:287
----[ Xen-4.13.3-10.7-d  x86_64  debug=y   Not tainted ]----
CPU:    17
RIP:    e008:[<ffff82d080246b65>] common/timer.c#active_timer+0xc/0x1b
RFLAGS: 0000000000010046   CONTEXT: hypervisor (d675v0)
rax: 0000000000000000   rbx: ffff83137a8ed300   rcx: 0000000000000000
rdx: ffff83303fff7fff   rsi: ffff83303fff2549   rdi: ffff83137a8ed300
rbp: ffff83303fff7cf8   rsp: ffff83303fff7cf8   r8:  0000000000000001
r9:  0000000000000000   r10: 0000000000000011   r11: 0000168b0cc08083
r12: 0000000000000000   r13: ffff82d0805cf300   r14: ffff82d0805cf300
r15: 0000000000000292   cr0: 0000000080050033   cr4: 00000000000426e0
cr3: 00000013c1a32000   cr2: 0000000000000000
fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
Xen code around <ffff82d080246b65> (common/timer.c#active_timer+0xc/0x1b):
 0f b6 47 2a 84 c0 75 02 <0f> 0b 3c 04 76 02 0f 0b 3c 02 0f 97 c0 5d c3 55
Xen stack trace from rsp=ffff83303fff7cf8:
   ffff83303fff7d48 ffff82d0802479f1 0000168b0192b846 ffff83137a8ed328
   000000001d0776eb ffff83137a8ed2c0 ffff83133ee47568 ffff83133ee47000
   ffff83133ee47560 ffff832b1a0cd000 ffff83303fff7d78 ffff82d08031e74e
   ffff83102d898000 ffff83133ee47000 ffff83102db8d000 0000000000000011
   ffff83303fff7dc8 ffff82d08027df19 0000000000000000 ffff83133ee47060
   ffff82d0805d0088 ffff83102d898000 ffff83133ee47000 0000000000000011
   0000000000000001 0000000000000011 ffff83303fff7e08 ffff82d0802414e0
   ffff83303fff7df8 0000168b0192b846 ffff83102d8a4660 0000168b0192b846
   ffff83102d8a4720 0000000000000011 ffff83303fff7ea8 ffff82d080241d6c
   ffff83133ee47000 ffff831244137a50 ffff83303fff7e48 ffff82d08031b5b8
   ffff83133ee47000 ffff832b1a0cd000 ffff83303fff7e68 ffff82d080312b65
   ffff83133ee47000 0000000000000000 ffff83303fff7ee8 ffff83102d8a4678
   ffff83303fff7ee8 ffff82d0805d6380 ffff82d0805d5b00 ffffffffffffffff
   ffff83303fff7fff 0000000000000000 ffff83303fff7ed8 ffff82d0802431f5
   ffff83133ee47000 0000000000000000 0000000000000000 0000000000000000
   ffff83303fff7ee8 ffff82d08024324a 00007ccfc00080e7 ffff82d08033930b
   ffffffffb0ebd5a0 000000000000000d 0000000000000062 0000000000000097
   000000000000001e 000000000000001f ffffffffb0ebd5ad 0000000000000000
   0000000000000005 000000000003d91d 0000000000000000 0000000000000000
   00000000000003d5 000000000000001e 0000000000000000 0000beef0000beef
Xen call trace:
   [<ffff82d080246b65>] R common/timer.c#active_timer+0xc/0x1b
   [<ffff82d0802479f1>] F stop_timer+0xf5/0x188
   [<ffff82d08031e74e>] F pt_save_timer+0x45/0x8a
   [<ffff82d08027df19>] F context_switch+0xf9/0xee0
   [<ffff82d0802414e0>] F common/schedule.c#sched_context_switch+0x146/0x151
   [<ffff82d080241d6c>] F common/schedule.c#schedule+0x28a/0x299
   [<ffff82d0802431f5>] F common/softirq.c#__do_softirq+0x85/0x90
   [<ffff82d08024324a>] F do_softirq+0x13/0x15
   [<ffff82d08033930b>] F vmx_asm_do_vmentry+0x2b/0x30

****************************************
Panic on CPU 17:
Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:287
****************************************
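
The trace is on the context-switch path, i.e. the failure happens while
saving the guest's periodic timers as the vCPU is descheduled. For
reference, the loop that hits the bad entry looks roughly like this
(paraphrased from memory from the 4.13 sources, so field names may be
slightly off):

void pt_save_timer(struct vcpu *v)
{
    struct list_head *head = &v->arch.hvm.tm_list;
    struct periodic_time *pt;

    if ( v->pause_flags & VPF_blocked )
        return;

    /* After the subject change this does not always take the
     * per-domain pt_migrate rwlock any more. */
    pt_vcpu_lock(v);

    list_for_each_entry ( pt, head, list )
        if ( !pt->do_not_freeze )
            stop_timer(&pt->timer);   /* <-- active_timer() asserts here */

    pt_freeze_time(v);

    pt_vcpu_unlock(v);
}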

This looks like a race opened by this change: we did not see the problem
before while running with all of the prerequisite patches applied.
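
For context, the check that fires is the lower bound in active_timer()
(again paraphrased from memory, so the exact code in common/timer.c may
differ slightly). The only status value below TIMER_STATUS_inactive is
TIMER_STATUS_invalid (0), so stop_timer() was handed a struct timer that
either had not been initialised yet or whose memory had already been torn
down or reused:

/* Timer status values, from xen/timer.h: */
#define TIMER_STATUS_invalid  0 /* Should never be seen.            */
#define TIMER_STATUS_inactive 1 /* Not in use; can be activated.    */
#define TIMER_STATUS_killed   2 /* Not in use; cannot be activated. */
#define TIMER_STATUS_in_heap  3 /* In use; on the timer heap.       */
#define TIMER_STATUS_in_list  4 /* In use; on the overflow list.    */

static bool active_timer(const struct timer *timer)
{
    ASSERT(timer->status >= TIMER_STATUS_inactive);  /* <-- this one fires */
    ASSERT(timer->status <= TIMER_STATUS_in_list);

    return (timer->status >= TIMER_STATUS_in_heap);
}

In other words, pt_save_timer() found an entry on the vCPU's timer list
whose embedded timer was still (or again) in the invalid state, which is
consistent with losing the serialisation against timer setup/teardown that
the pt_migrate rwlock used to provide on this path.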

Could you please analyse this assertion failure?

Igor
