
Re: [Xen-devel] [PATCH v4 1/6] VMX: Statically assign two PI hooks




> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Wednesday, September 28, 2016 5:39 PM
> To: Wu, Feng <feng.wu@xxxxxxxxx>
> Cc: andrew.cooper3@xxxxxxxxxx; dario.faggioli@xxxxxxxxxx;
> george.dunlap@xxxxxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-
> devel@xxxxxxxxxxxxx
> Subject: RE: [Xen-devel] [PATCH v4 1/6] VMX: Statically assign two PI hooks
> 
> >>> On 28.09.16 at 08:48, <feng.wu@xxxxxxxxx> wrote:
> 
> >
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Monday, September 26, 2016 8:10 PM
> >> To: Wu, Feng <feng.wu@xxxxxxxxx>
> >> Cc: andrew.cooper3@xxxxxxxxxx; dario.faggioli@xxxxxxxxxx;
> >> george.dunlap@xxxxxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-
> >> devel@xxxxxxxxxxxxx
> >> Subject: Re: [Xen-devel] [PATCH v4 1/6] VMX: Statically assign two PI hooks
> >>
> >> >>> On 26.09.16 at 13:37, <JBeulich@xxxxxxxx> wrote:
> >> >>>> On 21.09.16 at 04:37, <feng.wu@xxxxxxxxx> wrote:
> >> >> PI hooks: vmx_pi_switch_from() and vmx_pi_switch_to() are
> >> >> needed even when any previously assigned device is detached
> >> >> from the domain. Since 'SN' bit is also used to control the
> >> >> CPU side PI and we change the state of SN bit in these two
> >> >> functions, then evaluate this bit in vmx_deliver_posted_intr()
> >> >> when trying to deliver the interrupt in posted way via software.
> >> >> The problem is if we deassign the hooks while the vCPU is runnable
> >> >> in the runqueue with 'SN' set, all the future notification events
> >> >> will be suppressed. This patch makes these two hooks statically
> >> >> assigned.
> >> >
> >> > So if only SN left set is a problem, why do you need to also keep
> >> > vmx_pi_switch_from in place? It's vmx_pi_switch_to() which clears
> >> > the bit, and vmx_deliver_posted_intr() doesn't actively change it.
> >>
> >> And it doesn't appear completely unreasonable for
> >> vmx_pi_switch_to() to remove itself (when it gets run with
> >> the "from" hook still NULL and no new device being in the
> >> process of getting assigned).
> >
> > I think this may introduce extra complexity to the situation:
> > 1. Especially for "no new device being in the process of getting assigned",
> > since device assignment can happen simultaneously when this function
> > gets called, does it mean we need to use a lock to protect it?
> 
> Since device addition/removal is already protected by a lock, this
> would at least seem not impossible to do without causing lock
> conflicts (and would certainly be required, yes).
> 

I used pcidevs_lock()/pcidevs_unlock() in vmx_pi_switch_to(), since this lock
is held during device assignment. However, I got the following call trace while
booting the guest. From the messages, it seems we cannot acquire that lock
in this function.

Back to the original question: maybe it is worth keeping the "to" hook, and
we might be going too far if we really want to zap it.

(XEN) CPU:    7
(XEN) RIP:    e008:[<ffff82d0801309a4>] spinlock.c#check_lock+0x3c/0x40
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor (d1v0)
(XEN) rax: 0000000000000000   rbx: ffff82d0802fea48   rcx: 0000000000000000
(XEN) rdx: 0000000000000001   rsi: 0000000000000003   rdi: ffff82d0802fea4e
(XEN) rbp: ffff8301713efca8   rsp: ffff8301713efca8   r8:  0000000000000003
(XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
(XEN) r12: 0000000000000007   r13: ffff83005da89000   r14: ffff83007baf9000
(XEN) r15: 0000000000000003   cr0: 000000008005003b   cr4: 00000000003526e0
(XEN) cr3: 00000001432c6000   cr2: ffff88016a3e6ee8
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d0801309a4> (spinlock.c#check_lock+0x3c/0x40):
(XEN)  98 83 f1 01 39 c8 75 02 <0f> 0b 5d c3 55 48 89 e5 f0 ff 05 19 8e 1b 00 5d

(XEN) Xen stack trace from rsp=ffff8301713efca8:
(XEN)    ffff8301713efcc0 ffff82d0801309d3 ffff82d0802fea48 ffff8301713efce0
(XEN)    ffff82d080130c13 ffff83005da89000 ffff8301713f4fe0 ffff8301713efcf0
(XEN)    ffff82d08014da5f ffff8301713efd10 ffff82d0801f24d7 ffff8301713efd40
(XEN)    0000000000000d01 ffff8301713efd50 ffff82d0801f4413 ffff8301713f0068
(XEN)    0000000000000000 ffff83005da89000 0000000000000007 ffff83017e11b000
(XEN)    ffff83007baf9000 ffff8301713efdb0 ffff82d080165ea2 ffff8301713eff18
(XEN)    ffff83017137a000 00000000ffffffff ffff83007baf9060 000000000000004d
(XEN)    ffff83005da89000 ffff83007bada000 ffff83017e11b000 0000000000000007
(XEN)    ffff8301714c3000 ffff8301713efe20 ffff82d080169fb6 ffff82d0801309d3
(XEN)    ffff8301713efde0 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff8301713efe20 ffff83007bada000 ffff83005da89000
(XEN)    0000001531ccf815 ffff8301713f4148 0000000000000001 ffff8301713efeb0
(XEN)    ffff82d08012cf82 b2c99dc400000001 ffff8301713f4160 00000007003efe60
(XEN)    ffff8301713f4140 ffff8301713efe60 0000000700010000 0000000000000001
(XEN)    ffff83005da89000 0000000001c9c380 ffff82d0801bc200 000000007bada000
(XEN)    ffff82d08031ae00 ffff82d08031aa80 ffffffffffffffff ffff8301713effff
(XEN)    ffff83017137a000 ffff8301713efee0 ffff82d080130180 ffff8301713effff
(XEN)    ffff83007baf9000 ffff8301714c3000 00000000ffffffff ffff8301713efef0
(XEN)    ffff82d0801301d5 ffff8301713eff10 ffff82d0801656b2 ffff82d0801301d5
(XEN)    ffff83007bada000 ffff8301713efdc8 0000000000000000 0000000000000000

(XEN) Xen call trace:
(XEN)    [<ffff82d0801309a4>] spinlock.c#check_lock+0x3c/0x40
(XEN)    [<ffff82d0801309d3>] _spin_lock+0x11/0x4f
(XEN)    [<ffff82d080130c13>] _spin_lock_recursive+0x2a/0x56
(XEN)    [<ffff82d08014da5f>] pcidevs_lock+0x10/0x12
(XEN)    [<ffff82d0801f24d7>] vmx.c#vmx_pi_switch_to+0x3f/0x6f
(XEN)    [<ffff82d0801f4413>] vmx.c#vmx_ctxt_switch_to+0x1d8/0x1e5
(XEN)    [<ffff82d080165ea2>] domain.c#__context_switch+0x191/0x3d2
(XEN)    [<ffff82d080169fb6>] context_switch+0x147/0xee7
(XEN)    [<ffff82d08012cf82>] schedule.c#schedule+0x5ae/0x609
(XEN)    [<ffff82d080130180>] softirq.c#__do_softirq+0x7f/0x8a
(XEN)    [<ffff82d0801301d5>] do_softirq+0x13/0x15
(XEN)    [<ffff82d0801656b2>] domain.c#idle_loop+0x55/0x62
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 7:
(XEN) Xen BUG at spinlock.c:48
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Assertion 'current == idle_vcpu[smp_processor_id()]' failed at domain.c:2168
(XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    7
(XEN) RIP:    e008:[<ffff82d08016ada9>] __sync_local_execstate+0x44/0x67
(XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor (d1v0)
(XEN) rax: ffff82d080344290   rbx: 0000000000000002   rcx: 0000000000000007
(XEN) rdx: ffff82d0802e6080   rsi: ffff83005da89000   rdi: ffff83007bada000
(XEN) rbp: ffff8301713ef988   rsp: ffff8301713ef978   r8:  ffff8301714e0000
(XEN) r9:  0000000000000000   r10: 0000000000000007   r11: 0000000000000001
(XEN) r12: 0000000000000001   r13: 00000000000000fd   r14: 0000000000000030
(XEN) r15: 0000000080000000   cr0: 000000008005003b   cr4: 00000000003526e0
(XEN) cr3: 00000001432c6000   cr2: ffff88016a3e6ee8
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d08016ada9> (__sync_local_execstate+0x44/0x67):
(XEN)  8b 3c ca 48 39 fe 74 02 <0f> 0b e8 61 af ff ff 81 e3 00 02 00 00 9c 48 81
(XEN) Xen stack trace from rsp=ffff8301713ef978:
(XEN)    0000000000000100 ffff8301713efa28 ffff8301713ef9a8 ffff82d080194a5c
(XEN)    ffff8301713ef9a8 0000000000000086 ffff8301713efa18 ffff82d080174293
(XEN)    0000000000000046 ffff8301713ef9d8 0000000000000000 0000000000000096
(XEN)    ffff8301713ef9f0 ffff82d0801309d3 ffff82d0802ffc00 0000000000000086
(XEN)    ffff82d08027997d ffff82d080275e11 0000000000000030 ffff82d080279d20
(XEN)    00007cfe8ec105b7 ffff82d080243167 ffff82d080279d20 0000000000000030
(XEN)    ffff82d080275e11 ffff82d08027997d ffff8301713efb18 0000000000000086
(XEN)    0000000000000001 0000000000000007 0000000000000000 ffff8301714e0000
(XEN)    ffff8301713f4028 0000000000000001 ffff8301713effff 0000000000000046
(XEN)    ffff82d0802fe880 000000fd00000000 ffff82d080194593 000000000000e008
(XEN)    0000000000000202 ffff8301713efad8 000000000000e010 ffff82d080194578
(XEN)    ffff8301713efb28 00001388713efae8 000082d08027997d 0000000000000000
(XEN)    0000000000000086 ffff82d08027997d ffff82d080275e11 0000000000000030
(XEN)    ffff8301713efb88 ffff82d080146f25 00000000713efb88 0000000000000020
(XEN)    ffff8301713efb98 ffff8301713efb48 e5894855c35d0b0f ffff82d080279d20
(XEN)    ffff82d080275e11 0000000000000030 ffff8301714e0000 0000000000000002
(XEN)    ffff8301713efbf8 ffff82d080248518 ffff8301713efbe8 ffff82d08019c47d
(XEN)    ffff8301713efbf8 ffff82d0801309a6 0000000000000040 0b0f000000000086
(XEN)    000000fc713efc08 ffff83005da89000 0000000000000007 ffff83005da89000
(XEN)    ffff83007baf9000 0000000000000003 00007cfe8ec103e7 ffff82d080243268

(XEN) Xen call trace:
(XEN)    [<ffff82d08016ada9>] __sync_local_execstate+0x44/0x67
(XEN)    [<ffff82d080194a5c>] invalidate_interrupt+0x40/0x7d
(XEN)    [<ffff82d080174293>] do_IRQ+0x8c/0x60c
(XEN)    [<ffff82d080243167>] common_interrupt+0x67/0x70
(XEN)    [<ffff82d080194593>] machine_restart+0x4a/0x257
(XEN)    [<ffff82d080146f25>] console_suspend+0/0x28
(XEN)    [<ffff82d08019c47d>] do_invalid_op+0x39b/0x4a1
(XEN)    [<ffff82d080243268>] entry.o#handle_exception_saved+0x66/0xa4
(XEN)    [<ffff82d0801309a4>] spinlock.c#check_lock+0x3c/0x40
(XEN)    [<ffff82d0801309d3>] _spin_lock+0x11/0x4f
(XEN)    [<ffff82d080130c13>] _spin_lock_recursive+0x2a/0x56
(XEN)    [<ffff82d08014da5f>] pcidevs_lock+0x10/0x12
(XEN)    [<ffff82d0801f24d7>] vmx.c#vmx_pi_switch_to+0x3f/0x6f
(XEN)    [<ffff82d0801f4413>] vmx.c#vmx_ctxt_switch_to+0x1d8/0x1e5
(XEN)    [<ffff82d080165ea2>] domain.c#__context_switch+0x191/0x3d2
(XEN)    [<ffff82d080169fb6>] context_switch+0x147/0xee7
(XEN)    [<ffff82d08012cf82>] schedule.c#schedule+0x5ae/0x609
(XEN)    [<ffff82d080130180>] softirq.c#__do_softirq+0x7f/0x8a
(XEN)    [<ffff82d0801301d5>] do_softirq+0x13/0x15
(XEN)    [<ffff82d0801656b2>] domain.c#idle_loop+0x55/0x62
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 7:
(XEN) Assertion 'current == idle_vcpu[smp_processor_id()]' failed at domain.c:2168
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
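
A note on reading the first dump: RFLAGS is 0000000000010046, so the IF bit
(bit 9) is clear and interrupts are disabled at the point where
vmx_pi_switch_to() tries to take the lock. One plausible reading of the BUG in
check_lock() is that a debug build expects a given spinlock to always be taken
with interrupts in the same state, while pcidevs_lock is taken with interrupts
enabled elsewhere. The sketch below is a simplified, standalone model of that
kind of consistency check, not the actual Xen code; struct lock_debug_model,
check_lock_model() and rflags_irqs_enabled() are hypothetical names made up
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of per-lock debug state (not Xen's real structure). */
struct lock_debug_model {
    int irq_safe;   /* -1: not yet observed, 0: taken with IRQs on, 1: IRQs off */
};

/*
 * Returns true if this acquisition matches the interrupt state recorded by
 * earlier acquisitions, false where a debug build would be expected to BUG().
 */
static bool check_lock_model(struct lock_debug_model *dbg, bool irqs_enabled)
{
    int irq_safe = !irqs_enabled;

    assert(dbg);

    if (dbg->irq_safe == -1) {
        /* The first acquisition records the interrupt state. */
        dbg->irq_safe = irq_safe;
        return true;
    }

    /* Later acquisitions must use the same interrupt state. */
    return dbg->irq_safe == irq_safe;
}

/* On x86, RFLAGS.IF is bit 9; it is set when interrupts are enabled. */
static bool rflags_irqs_enabled(unsigned long rflags)
{
    return (rflags & (1UL << 9)) != 0;
}
```

Under these assumptions, once the model lock has been taken with interrupts
enabled, check_lock_model(&dbg, rflags_irqs_enabled(0x10046UL)) returns false,
which corresponds to the acquisition attempted from the context switch path in
the trace.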

Thanks,
Feng

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

