[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] Revert xen: dont fiddle with event channel masking in suspend/resume



O Fri, Apr 20, 2018 at 07:43:31AM +0200, Juergen Gross wrote:
> On 20/04/18 01:04, Anchal Agarwal wrote:
> > 
> > Hello,
> > 
> > This patch reverts commit e91b2b1194335ca83d8a40fa4e0efd480bf2babe.
> > evtchn are supposed to be masked during resume by irq subsytem 
> > however, they are not. This causes special interrupts like PV 
> > spinlock to cause kernel BUG() as it expects the IRQ to be 
> > masked. This causes instances that are live migrated successfully 
> > to crash after few minutes.
> > 
> > Live Migration uses suspend resume and when xen_irq_resume is invoked, 
> > I saw event channels are not masked. Hence, I reverted this
> > commit to make LM work. Feelings? Recommendations? Things I missed?
> 
> The commit you are reverting was meant to repair suspend/resume handling
> for Xen. Instead of just reverting it the correct thing to do would be
> to find the reason why some event channels are not being masked and
> address that issue.
> 
> See https://lists.xen.org/archives/html/xen-devel/2017-07/msg00898.html
> 
> 
> Juergen
>

Hi Juergen,
The discussion you pointed out suggests to set a flag 
IRQCHIP_MASK_ON_SUSPEND(by tglx@) 
on device irq_chip for non-wakeup interrupts or even a generic flag as 
suggested by you 
on device irq_chip however, in this case I am experiencing issues with spinlock 
ipi handling 
and according to the xen code IPIs are not supposed to be masked during suspend 
resume. 
Hence, even if I set this flag on xen's per_cpu irq_chip, the flag 
IRQF_NO_SUSPEND is being 
set on binding ipi to irq handler in spinlock init code (xen_init_lock_cpu 
->bind_ipi_to_irqhandler)
and dummy handler assigned is throwing out BUG() if it's called at all. Once 
this flag is
set suspend_device_irq will not disable the irq and hence event channel is not 
masked.
Now on xen_irq_resume, during restore_cpu_ipis-> xen_irq_info_ipi_setup, in 
case of 
2 level ABI event channel handling (The issue is on xen 4.2), the port setup is 
a no-op 
however, while using fifo event channel it starts the setup with all event 
channels masked.
Hence if I revert the patch and mask everything in the beginning of 
xen_irq_resume, I don't see the issue.
To avoid this issue I can think of two things:
1. Revert the patch as mentioned before
2. Not to call BUG() in dummy_handler in spinlock code. Rather use 
xen_reschedule_interrupt and not 
the dummy_handler as was changed in commit d5de8841355a4. Moreover, it's not 
very much clear 
from the commit message why it was changed in the first place.

Any thoughts/suggestions?

Thanks,
Anchal
> > 
> > One such stack:
> >  ------------[ cut here ]------------
> >  kernel BUG at arch/x86/xen/spinlock.c:75!
> >  CPU: 0 PID: 675 Comm: kauditd Not tainted 4.14.20-48.30.amzn2.x86_64 #1
> >  Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
> >  task: ffff880205eedac0 task.stack: ffffc90000e4c000
> >  RIP: 0010:dummy_handler+0x0/0x10
> >  RSP: 0018:ffff880207203eb8 EFLAGS: 00010046
> >  RAX: ffffffff81027f10 RBX: ffff880206ce1b00 RCX: 0000000000000035
> >  RDX: ffffffff81a81560 RSI: 0000000000000000 RDI: 0000000000000035
> >  RBP: 0000000000000035 R08: ffff880206800248 R09: ffff880206d03600
> >  R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000000
> >  R13: ffff880207203f04 R14: 0000000000000000 R15: 0000000000000000
> >  FS:  0000000000000000(0000) GS:ffff880207200000(0000) 
> >  knlGS:0000000000000000
> >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >  CR2: 0000561cc8dbd2d0 CR3: 0000000001e0a001 CR4: 00000000001606f0
> >  Call Trace:
> >   <IRQ>
> >   __handle_irq_event_percpu+0x40/0x190
> >   handle_irq_event_percpu+0x30/0x70
> >   handle_percpu_irq+0x37/0x50
> >   generic_handle_irq+0x24/0x30
> >   evtchn_2l_handle_events+0x162/0x280
> >   __xen_evtchn_do_upcall+0x42/0x80
> >   xen_evtchn_do_upcall+0x27/0x40
> >   xen_hvm_callback_vector+0x98/0xa0
> >   </IRQ>
> >  RIP: 0010:finish_task_switch+0x7b/0x200
> >  RSP: 0018:ffffc90000e4fe08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
> >  RAX: 0000000000000001 RBX: ffff880205eedac0 RCX: 0000000000000000
> >  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8802072211c0
> >  RBP: ffffc90000e4fe30 R08: 0000003876060000 R09: 0000000000000000
> >  R10: 0000000000000000 R11: 0000000000000040 R12: ffff8802072211c0
> >  R13: ffffffff81e12480 R14: ffff880202e54c00 R15: 0000000000000000
> >   ? finish_task_switch+0x74/0x200
> >   __schedule+0x29c/0x8a0
> >   ? __wake_up_common_lock+0x89/0xc0
> >   ? kauditd_send_multicast_skb+0x90/0x90
> >   schedule+0x28/0x80
> >   kauditd_thread+0x177/0x220
> >   ? finish_wait+0x80/0x80
> >   ? auditd_reset+0x90/0x90
> >   kthread+0x11a/0x130
> >   ? kthread_create_on_node+0x70/0x70
> >   ? call_usermodehelper_exec_async+0x12a/0x160
> >   ret_from_fork+0x35/0x40
> >   RIP: dummy_handler+0x0/0x10 RSP: ffff880207203eb8
> > 
> > Signed-off-by: Anchal Agarwal <anchalag@xxxxxxxxxx>
> > Signed-off-by: Eduardo Valentin <eduval@xxxxxxxxxx>
> > Reviewed-by: Frank van der Linden <fllinden@xxxxxxxxxx>
> > Reviewed-by: Alakesh Haloi <alakeshh@xxxxxxxxxx>
> > Reviewed-by: Vallish Vaidyeshwara <vallish@xxxxxxxxxx>
> > 
> > ---
> >  drivers/xen/events/events_base.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/xen/events/events_base.c 
> > b/drivers/xen/events/events_base.c
> > index bc03f1a6ad1b..ae71cab207f7 100644
> > --- a/drivers/xen/events/events_base.c
> > +++ b/drivers/xen/events/events_base.c
> > @@ -343,6 +343,14 @@ static void bind_evtchn_to_cpu(unsigned int chn, 
> > unsigned int cpu)
> >     info->cpu = cpu;
> >  }
> >  
> > +static void xen_evtchn_mask_all(void)
> > +{
> > +   unsigned int evtchn;
> > +
> > +   for (evtchn = 0; evtchn < xen_evtchn_nr_channels(); evtchn++)
> > +           mask_evtchn(evtchn);
> > +}
> > +
> >  /**
> >   * notify_remote_via_irq - send event to remote end of event channel via 
> > irq
> >   * @irq: irq of event channel to send event to
> > @@ -1565,6 +1573,7 @@ void xen_irq_resume(void)
> >     struct irq_info *info;
> >  
> >     /* New event-channel space is not 'live' yet. */
> > +   xen_evtchn_mask_all();
> >     xen_evtchn_resume();
> >  
> >     /* No IRQ <-> event-channel mappings. */
> > @@ -1682,7 +1691,6 @@ module_param(fifo_events, bool, 0);
> >  void __init xen_init_IRQ(void)
> >  {
> >     int ret = -EINVAL;
> > -   unsigned int evtchn;
> >  
> >     if (fifo_events)
> >             ret = xen_evtchn_fifo_init();
> > @@ -1694,8 +1702,7 @@ void __init xen_init_IRQ(void)
> >     BUG_ON(!evtchn_to_irq);
> >  
> >     /* No event channels are 'live' right now. */
> > -   for (evtchn = 0; evtchn < xen_evtchn_nr_channels(); evtchn++)
> > -           mask_evtchn(evtchn);
> > +   xen_evtchn_mask_all();
> >  
> >     pirq_needs_eoi = pirq_needs_eoi_flag;
> >  
> > 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.