
Re: BUG in 1f3d87c75129 ("x86/vpt: do not take pt_migrate rwlock in some cases")


  • To: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 14 Jun 2021 13:53:09 +0200
  • Cc: xen-devel@xxxxxxxxxxxxx, boris.ostrovsky@xxxxxxxxxx, stephen.s.brennan@xxxxxxxxxx, roger.pau@xxxxxxxxxx
  • Delivery-date: Mon, 14 Jun 2021 11:53:32 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.06.2021 13:15, Igor Druzhinin wrote:
> Hi, Boris, Stephen, Roger,
> 
> We have stress tested recent changes on staging-4.13 which includes a
> backport of the subject. Since the backport is identical to the
> master branch and all of the pre-reqs are in place - we have no reason
> to believe the issue is not the same on master.
> 
> Here is what we got by running heavy stress testing including multiple
> repeated VM lifecycle operations with storage and network load:
> 
> 
> Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:287
> ----[ Xen-4.13.3-10.7-d  x86_64  debug=y   Not tainted ]----
> CPU:    17
> RIP:    e008:[<ffff82d080246b65>] common/timer.c#active_timer+0xc/0x1b
> RFLAGS: 0000000000010046   CONTEXT: hypervisor (d675v0)
> rax: 0000000000000000   rbx: ffff83137a8ed300   rcx: 0000000000000000
> rdx: ffff83303fff7fff   rsi: ffff83303fff2549   rdi: ffff83137a8ed300
> rbp: ffff83303fff7cf8   rsp: ffff83303fff7cf8   r8:  0000000000000001
> r9:  0000000000000000   r10: 0000000000000011   r11: 0000168b0cc08083
> r12: 0000000000000000   r13: ffff82d0805cf300   r14: ffff82d0805cf300
> r15: 0000000000000292   cr0: 0000000080050033   cr4: 00000000000426e0
> cr3: 00000013c1a32000   cr2: 0000000000000000
> fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> Xen code around <ffff82d080246b65> (common/timer.c#active_timer+0xc/0x1b):
>   0f b6 47 2a 84 c0 75 02 <0f> 0b 3c 04 76 02 0f 0b 3c 02 0f 97 c0 5d c3 55
> Xen stack trace from rsp=ffff83303fff7cf8:
>     ffff83303fff7d48 ffff82d0802479f1 0000168b0192b846 ffff83137a8ed328
>     000000001d0776eb ffff83137a8ed2c0 ffff83133ee47568 ffff83133ee47000
>     ffff83133ee47560 ffff832b1a0cd000 ffff83303fff7d78 ffff82d08031e74e
>     ffff83102d898000 ffff83133ee47000 ffff83102db8d000 0000000000000011
>     ffff83303fff7dc8 ffff82d08027df19 0000000000000000 ffff83133ee47060
>     ffff82d0805d0088 ffff83102d898000 ffff83133ee47000 0000000000000011
>     0000000000000001 0000000000000011 ffff83303fff7e08 ffff82d0802414e0
>     ffff83303fff7df8 0000168b0192b846 ffff83102d8a4660 0000168b0192b846
>     ffff83102d8a4720 0000000000000011 ffff83303fff7ea8 ffff82d080241d6c
>     ffff83133ee47000 ffff831244137a50 ffff83303fff7e48 ffff82d08031b5b8
>     ffff83133ee47000 ffff832b1a0cd000 ffff83303fff7e68 ffff82d080312b65
>     ffff83133ee47000 0000000000000000 ffff83303fff7ee8 ffff83102d8a4678
>     ffff83303fff7ee8 ffff82d0805d6380 ffff82d0805d5b00 ffffffffffffffff
>     ffff83303fff7fff 0000000000000000 ffff83303fff7ed8 ffff82d0802431f5
>     ffff83133ee47000 0000000000000000 0000000000000000 0000000000000000
>     ffff83303fff7ee8 ffff82d08024324a 00007ccfc00080e7 ffff82d08033930b
>     ffffffffb0ebd5a0 000000000000000d 0000000000000062 0000000000000097
>     000000000000001e 000000000000001f ffffffffb0ebd5ad 0000000000000000
>     0000000000000005 000000000003d91d 0000000000000000 0000000000000000
>     00000000000003d5 000000000000001e 0000000000000000 0000beef0000beef
> Xen call trace:
>     [<ffff82d080246b65>] R common/timer.c#active_timer+0xc/0x1b
>     [<ffff82d0802479f1>] F stop_timer+0xf5/0x188
>     [<ffff82d08031e74e>] F pt_save_timer+0x45/0x8a
>     [<ffff82d08027df19>] F context_switch+0xf9/0xee0
>     [<ffff82d0802414e0>] F common/schedule.c#sched_context_switch+0x146/0x151
>     [<ffff82d080241d6c>] F common/schedule.c#schedule+0x28a/0x299
>     [<ffff82d0802431f5>] F common/softirq.c#__do_softirq+0x85/0x90
>     [<ffff82d08024324a>] F do_softirq+0x13/0x15
>     [<ffff82d08033930b>] F vmx_asm_do_vmentry+0x2b/0x30
> 
> ****************************************
> Panic on CPU 17:
> Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:287
> ****************************************

This suggests a timer was found on the list without ever having been
initialized, and I've spotted a case where this could indeed now happen.
Could you give the patch below a try?

Jan

x86/vpt: fully init timers before putting onto list

With pt_vcpu_lock() no longer acquiring the pt_migrate lock, parties
iterating the list and acting on the timers of the list entries will no
longer be kept from entering their loops by create_periodic_time()'s
holding of that lock. Therefore at least init_timer() needs calling
ahead of list insertion, but keep this and set_timer() together.

Fixes: 8113b02f0bf8 ("x86/vpt: do not take pt_migrate rwlock in some cases")
Reported-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

--- unstable.orig/xen/arch/x86/hvm/vpt.c
+++ unstable/xen/arch/x86/hvm/vpt.c
@@ -554,14 +554,14 @@ void create_periodic_time(
     pt->cb = cb;
     pt->priv = data;
 
+    init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
+    set_timer(&pt->timer, pt->scheduled);
+
     pt_vcpu_lock(v);
     pt->on_list = 1;
     list_add(&pt->list, &v->arch.hvm.tm_list);
     pt_vcpu_unlock(v);
 
-    init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
-    set_timer(&pt->timer, pt->scheduled);
-
     write_unlock(&v->domain->arch.hvm.pl_time->pt_migrate);
 }
 
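For illustration only, here is a minimal single-threaded sketch of why the ordering matters. Everything in it (the toy_* names, the three-value status enum, the hand-rolled list) is a hypothetical stand-in and not Xen's timer or vpt code. The "walker" is simulated by a direct call at the point where another CPU could enter its loop once only the per-vCPU lock guards the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (assumptions for this sketch, not Xen's definitions). */
enum toy_status { TOY_INVALID = 0, TOY_INACTIVE, TOY_ACTIVE };

struct toy_timer {
    enum toy_status status;
    struct toy_timer *next;
};

struct toy_list {
    struct toy_timer *head;
};

static void toy_init(struct toy_timer *t) { t->status = TOY_INACTIVE; }
static void toy_set(struct toy_timer *t)  { t->status = TOY_ACTIVE; }

/* Make the timer visible to anyone iterating the list. */
static void toy_publish(struct toy_list *l, struct toy_timer *t)
{
    t->next = l->head;
    l->head = t;
}

/* Models a list walker: true only if every entry has been initialized. */
static bool toy_walk_ok(const struct toy_list *l)
{
    for (const struct toy_timer *t = l->head; t != NULL; t = t->next)
        if (t->status < TOY_INACTIVE)
            return false;
    return true;
}

/* Old order: publish first, initialize afterwards.
 * Returns what a walker running inside that window would observe. */
static bool toy_create_publish_first(struct toy_list *l, struct toy_timer *t)
{
    bool walker_ok;

    assert(l != NULL && t != NULL);
    t->status = TOY_INVALID;
    toy_publish(l, t);
    walker_ok = toy_walk_ok(l);   /* simulated concurrent walker */
    toy_init(t);
    toy_set(t);
    return walker_ok;
}

/* New order: initialize and arm first, publish last. */
static bool toy_create_init_first(struct toy_list *l, struct toy_timer *t)
{
    assert(l != NULL && t != NULL);
    t->status = TOY_INVALID;
    toy_init(t);
    toy_set(t);
    toy_publish(l, t);
    return toy_walk_ok(l);        /* a walker can only see it from here on */
}
```

Calling toy_create_publish_first() on an empty list returns false, because the simulated walker sees an uninitialized entry, while toy_create_init_first() returns true. The sketch models the ordering only; it makes no attempt to model locking or real concurrency.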




 

