
Re: Lockdep show 6.6-rc regression in Xen HVM CPU hotplug



On Tue, 2023-10-24 at 14:08 +0200, Juergen Gross wrote:
> On 24.10.23 12:41, Juergen Gross wrote:
> > On 24.10.23 09:43, David Woodhouse wrote:
> > > On Tue, 2023-10-24 at 08:53 +0200, Juergen Gross wrote:
> > > > 
> > > > I'm puzzled. This path doesn't contain any of the RCU usage I've added in
> > > > commit 87797fad6cce.
> > > > 
> > > > Are you sure that with just reverting commit 87797fad6cce the issue doesn't
> > > > manifest anymore? I'd rather expect commit 721255b9826b having caused this
> > > > behavior, just telling from the messages above.
> > > 
> > > Retesting in the cold light of day, yes. Using v6.6-rc5 which is the
> > > parent commit of the offending 87797fad6cce.
> > > 
> > > I now see this warning at boot time again, which I believe was an
> > > aspect of what you were trying to fix:
> > > 
> > > [    0.059014] xen:events: Using FIFO-based ABI
> > > [    0.059029] xen:events: Xen HVM callback vector for event delivery is enabled
> > > [    0.059227] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> > > [    0.059296]
> > > [    0.059297] =============================
> > > [    0.059298] [ BUG: Invalid wait context ]
> > > [    0.059299] 6.6.0-rc5 #1374 Not tainted
> > > [    0.059300] -----------------------------
> > > [    0.059301] swapper/0/0 is trying to lock:
> > > [    0.059303] ffffffff8ad595f8 (evtchn_rwlock){....}-{3:3}, at: xen_evtchn_do_upcall+0x59/0xd0
> > 
> > Indeed.
> > 
> > What I still don't get is why the rcu_dereference_check() splat isn't
> > happening without my patch.
> > 
> > IMHO it should be related to the fact that cpuhp_report_idle_dead()
> > is trying to send an IPI via xen_send_IPI_one(), which is using
> > notify_remote_via_irq(), which in turn needs to call irq_get_chip_data().
> > This is using the maple-tree since 721255b9826b, which is using
> > rcu_read_lock().
> > 
> > I can probably change xen_send_IPI_one() to not need irq_get_chip_data().
> 
> David, could you test the attached patch, please? Build tested only.

Tested-by: David Woodhouse <dwmw@xxxxxxxxxxxx>

(And I think we worked out the reason why your 'replace evtchn_rwlock
with RCU' patch apparently triggers those other issues is that
*without* your patch, lockdep fired off a warning and then stopped
working long before the other issues occur.)



 

