Re: xen/evtchn: Dom0 boot hangs using preempt_rt kernel 5.10
Hi Juergen,

Could you confirm that backporting these two series to Linux kernel 5.10:

https://patchwork.kernel.org/project/xen-devel/cover/20201210192536.118432146@xxxxxxxxxxxxx/
https://patchwork.kernel.org/project/xen-devel/cover/20210306161833.4552-1-jgross@xxxxxxxx/

is needed in order to remove the BUG_ON(…)?

Thank you for your time.

Cheers,
Luca

> On 18 Mar 2021, at 08:47, Luca Fancellu <Luca.Fancellu@xxxxxxx> wrote:
>
> Hi Juergen,
>
> If you are willing to do the patch, I think it will be accepted faster.
> What about the BUG_ON(…) in evtchn_2l_unmask in events_2l.c?
>
> Cheers,
>
> Luca
>
>> On 18 Mar 2021, at 07:54, Jürgen Groß <jgross@xxxxxxxx> wrote:
>>
>> On 17.03.21 15:32, Luca Fancellu wrote:
>>> Hi all,
>>> We've been encountering an issue when using kernel 5.10 with preempt_rt
>>> support for Dom0: during the boot of Dom0, it hits a
>>> BUG_ON(!irqs_disabled()) in the function evtchn_fifo_unmask defined in
>>> events_fifo.c.
>>> This is the call stack:
>>> [ 17.817018] ------------[ cut here ]------------
>>> [ 17.817021] kernel BUG at drivers/xen/events/events_fifo.c:258!
>>> [ 18.817079] Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
>>> [ 18.817081] Modules linked in: bridge stp llc ipv6
>>> [ 18.817086] CPU: 3 PID: 558 Comm: xenstored Not tainted 5.10.16-rt25-yocto-preempt-rt #1
>>> [ 18.817089] Hardware name: Arm Neoverse N1 System Development Platform (DT)
>>> [ 18.817090] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
>>> [ 18.817092] pc : evtchn_fifo_unmask+0xd4/0xe0
>>> [ 18.817099] lr : xen_irq_lateeoi_locked+0xec/0x200
>>> [ 18.817102] sp : ffff8000123f3cc0
>>> [ 18.817102] x29: ffff8000123f3cc0 x28: ffff0000427b1d80
>>> [ 18.817104] x27: 0000000000000000 x26: 0000000000000000
>>> [ 18.817106] x25: 0000000000000001 x24: 0000000000000001
>>> [ 18.817107] x23: ffff0000412fc900 x22: 0000000000000004
>>> [ 18.817109] x21: 0000000000000000 x20: ffff000042e06990
>>> [ 18.817110] x19: ffff0000427b1d80 x18: 0000000000000010
>>> [ 18.817112] x17: 0000000000000000 x16: 0000000000000000
>>> [ 18.817113] x15: 0000000000000002 x14: 0000000000000001
>>> [ 18.817114] x13: 000000000001a7e8 x12: 0000000000000040
>>> [ 18.817116] x11: ffff000040400248 x10: ffff00004040024a
>>> [ 18.817117] x9 : ffff800011be5200 x8 : ffff000040400270
>>> [ 18.817119] x7 : 0000000000000000 x6 : 0000000000000003
>>> [ 18.817120] x5 : 0000000000000000 x4 : ffff000040400308
>>> [ 18.817121] x3 : ffff0000408a400c x2 : 0000000000000000
>>> [ 18.817122] x1 : 0000000000000000 x0 : ffff0000408a4000
>>> [ 18.817124] Call trace:
>>> [ 18.817125] evtchn_fifo_unmask+0xd4/0xe0
>>> [ 18.817127] xen_irq_lateeoi_locked+0xec/0x200
>>> [ 18.817129] xen_irq_lateeoi+0x48/0x64
>>> [ 18.817131] evtchn_write+0x124/0x15c
>>> [ 18.817134] vfs_write+0xf0/0x2cc
>>> [ 18.817137] ksys_write+0xe0/0x100
>>> [ 18.817139] __arm64_sys_write+0x20/0x30
>>> [ 18.817142] el0_svc_common.constprop.0+0x78/0x1a0
>>> [ 18.817145] do_el0_svc+0x24/0x90
>>> [ 18.817147] el0_svc+0x14/0x20
>>> [ 18.817151] el0_sync_handler+0x1a4/0x1b0
>>> [ 18.817153] el0_sync+0x174/0x180
>>> [ 18.817156] Code: 52800120 b90023e6 97e6d104 17fffff0 (d4210000)
>>> [ 18.817158] ---[ end trace 0000000000000002 ]---
>>> Our last tested kernel was 5.4, and our analysis points to the
>>> introduction of the lateeoi framework (xen/events: add a new "late EOI"
>>> evtchn framework) in conjunction with the preempt_rt patches (irqs kept
>>> enabled between spinlock_t/rwlock_t _irqsave/_irqrestore operations) as
>>> the root cause.
>>> Given that many modifications were made to the mask/unmask operations,
>>> including a big one from Juergen Gross (xen/events: don't unmask an event
>>> channel when an eoi is pending), is the BUG_ON(...) still needed?
>>> With the mentioned commit, every call to a mask/unmask operation is
>>> protected by a spinlock, so I would like some feedback from those who
>>> have more experience than me with this part of the code.
>>
>> I think this BUG_ON() can be removed.
>>
>> Are you planning to send a patch?
>>
>>
>> Juergen