[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] Revert xen: dont fiddle with event channel masking in suspend/resume
O Fri, Apr 20, 2018 at 07:43:31AM +0200, Juergen Gross wrote: > On 20/04/18 01:04, Anchal Agarwal wrote: > > > > Hello, > > > > This patch reverts commit e91b2b1194335ca83d8a40fa4e0efd480bf2babe. > > evtchn are supposed to be masked during resume by irq subsytem > > however, they are not. This causes special interrupts like PV > > spinlock to cause kernel BUG() as it expects the IRQ to be > > masked. This causes instances that are live migrated successfully > > to crash after few minutes. > > > > Live Migration uses suspend resume and when xen_irq_resume is invoked, > > I saw event channels are not masked. Hence, I reverted this > > commit to make LM work. Feelings? Recommendations? Things I missed? > > The commit you are reverting was meant to repair suspend/resume handling > for Xen. Instead of just reverting it the correct thing to do would be > to find the reason why some event channels are not being masked and > address that issue. > > See https://lists.xen.org/archives/html/xen-devel/2017-07/msg00898.html > > > Juergen > Hi Juergen, The discussion you pointed out suggests to set a flag IRQCHIP_MASK_ON_SUSPEND(by tglx@) on device irq_chip for non-wakeup interrupts or even a generic flag as suggested by you on device irq_chip however, in this case I am experiencing issues with spinlock ipi handling and according to the xen code IPIs are not supposed to be masked during suspend resume. Hence, even if I set this flag on xen's per_cpu irq_chip, the flag IRQF_NO_SUSPEND is being set on binding ipi to irq handler in spinlock init code (xen_init_lock_cpu ->bind_ipi_to_irqhandler) and dummy handler assigned is throwing out BUG() if it's called at all. Once this flag is set suspend_device_irq will not disable the irq and hence event channel is not masked. Now on xen_irq_resume, during restore_cpu_ipis-> xen_irq_info_ipi_setup, in case of 2 level ABI event channel handling (The issue is on xen 4.2), the port setup is a no-op however, while using fifo event channel it starts the setup with all event channels masked. Hence if I revert the patch and mask everything in the beginning of xen_irq_resume, I don't see the issue. To avoid this issue I can think of two things: 1. Revert the patch as mentioned before 2. Not to call BUG() in dummy_handler in spinlock code. Rather use xen_reschedule_interrupt and not the dummy_handler as was changed in commit d5de8841355a4. Moreover, it's not very much clear from the commit message why it was changed in the first place. Any thoughts/suggestions? Thanks, Anchal > > > > One such stack: > > ------------[ cut here ]------------ > > kernel BUG at arch/x86/xen/spinlock.c:75! > > CPU: 0 PID: 675 Comm: kauditd Not tainted 4.14.20-48.30.amzn2.x86_64 #1 > > Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006 > > task: ffff880205eedac0 task.stack: ffffc90000e4c000 > > RIP: 0010:dummy_handler+0x0/0x10 > > RSP: 0018:ffff880207203eb8 EFLAGS: 00010046 > > RAX: ffffffff81027f10 RBX: ffff880206ce1b00 RCX: 0000000000000035 > > RDX: ffffffff81a81560 RSI: 0000000000000000 RDI: 0000000000000035 > > RBP: 0000000000000035 R08: ffff880206800248 R09: ffff880206d03600 > > R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000000 > > R13: ffff880207203f04 R14: 0000000000000000 R15: 0000000000000000 > > FS: 0000000000000000(0000) GS:ffff880207200000(0000) > > knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 0000561cc8dbd2d0 CR3: 0000000001e0a001 CR4: 00000000001606f0 > > Call Trace: > > <IRQ> > > __handle_irq_event_percpu+0x40/0x190 > > handle_irq_event_percpu+0x30/0x70 > > handle_percpu_irq+0x37/0x50 > > generic_handle_irq+0x24/0x30 > > evtchn_2l_handle_events+0x162/0x280 > > __xen_evtchn_do_upcall+0x42/0x80 > > xen_evtchn_do_upcall+0x27/0x40 > > xen_hvm_callback_vector+0x98/0xa0 > > </IRQ> > > RIP: 0010:finish_task_switch+0x7b/0x200 > > RSP: 0018:ffffc90000e4fe08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c > > RAX: 0000000000000001 RBX: ffff880205eedac0 RCX: 0000000000000000 > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8802072211c0 > > RBP: ffffc90000e4fe30 R08: 0000003876060000 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000040 R12: ffff8802072211c0 > > R13: ffffffff81e12480 R14: ffff880202e54c00 R15: 0000000000000000 > > ? finish_task_switch+0x74/0x200 > > __schedule+0x29c/0x8a0 > > ? __wake_up_common_lock+0x89/0xc0 > > ? kauditd_send_multicast_skb+0x90/0x90 > > schedule+0x28/0x80 > > kauditd_thread+0x177/0x220 > > ? finish_wait+0x80/0x80 > > ? auditd_reset+0x90/0x90 > > kthread+0x11a/0x130 > > ? kthread_create_on_node+0x70/0x70 > > ? call_usermodehelper_exec_async+0x12a/0x160 > > ret_from_fork+0x35/0x40 > > RIP: dummy_handler+0x0/0x10 RSP: ffff880207203eb8 > > > > Signed-off-by: Anchal Agarwal <anchalag@xxxxxxxxxx> > > Signed-off-by: Eduardo Valentin <eduval@xxxxxxxxxx> > > Reviewed-by: Frank van der Linden <fllinden@xxxxxxxxxx> > > Reviewed-by: Alakesh Haloi <alakeshh@xxxxxxxxxx> > > Reviewed-by: Vallish Vaidyeshwara <vallish@xxxxxxxxxx> > > > > --- > > drivers/xen/events/events_base.c | 13 ++++++++++--- > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/xen/events/events_base.c > > b/drivers/xen/events/events_base.c > > index bc03f1a6ad1b..ae71cab207f7 100644 > > --- a/drivers/xen/events/events_base.c > > +++ b/drivers/xen/events/events_base.c > > @@ -343,6 +343,14 @@ static void bind_evtchn_to_cpu(unsigned int chn, > > unsigned int cpu) > > info->cpu = cpu; > > } > > > > +static void xen_evtchn_mask_all(void) > > +{ > > + unsigned int evtchn; > > + > > + for (evtchn = 0; evtchn < xen_evtchn_nr_channels(); evtchn++) > > + mask_evtchn(evtchn); > > +} > > + > > /** > > * notify_remote_via_irq - send event to remote end of event channel via > > irq > > * @irq: irq of event channel to send event to > > @@ -1565,6 +1573,7 @@ void xen_irq_resume(void) > > struct irq_info *info; > > > > /* New event-channel space is not 'live' yet. */ > > + xen_evtchn_mask_all(); > > xen_evtchn_resume(); > > > > /* No IRQ <-> event-channel mappings. */ > > @@ -1682,7 +1691,6 @@ module_param(fifo_events, bool, 0); > > void __init xen_init_IRQ(void) > > { > > int ret = -EINVAL; > > - unsigned int evtchn; > > > > if (fifo_events) > > ret = xen_evtchn_fifo_init(); > > @@ -1694,8 +1702,7 @@ void __init xen_init_IRQ(void) > > BUG_ON(!evtchn_to_irq); > > > > /* No event channels are 'live' right now. */ > > - for (evtchn = 0; evtchn < xen_evtchn_nr_channels(); evtchn++) > > - mask_evtchn(evtchn); > > + xen_evtchn_mask_all(); > > > > pirq_needs_eoi = pirq_needs_eoi_flag; > > > > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |