
Re: xen/evtchn: Dom0 boot hangs using preempt_rt kernel 5.10


  • To: Julien Grall <julien@xxxxxxx>
  • From: Luca Fancellu <luca.fancellu@xxxxxxx>
  • Date: Wed, 24 Mar 2021 10:37:39 +0000
  • Cc: Jason Andryuk <jandryuk@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jürgen Groß <jgross@xxxxxxxx>
  • Delivery-date: Wed, 24 Mar 2021 10:38:11 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


> On 23 Mar 2021, at 19:26, Julien Grall <julien@xxxxxxx> wrote:
> 
> 
> 
> On 23/03/2021 17:06, Luca Fancellu wrote:
>> Hi all,
> 
> Hi,
> 
> Please avoid top posting when replying to a comment. It makes the thread
> more difficult to follow.
> 
>> I have an update: after changing the lock introduced by the series from
>> spinlock_t to raw_spinlock_t, switching the lock/unlock calls to the raw_*
>> versions, and keeping the BUG_ON(…) (which is now possible because the raw_*
>> implementation disables interrupts even on preempt_rt), the kernel boots
>> correctly.
>> So it seems that the BUG_ON(…) is needed and the unmask function should run
>> with interrupts disabled. Does anyone know why this change worked?
> 
> Do you mean why no one spotted the issue before? If so, AFAIK, on vanilla
> Linux spin_lock is still just a wrapper around raw_spinlock. IOW, there is no
> option to replace it with an RT spinlock.
> 
> So if you don't apply the RT patches, you would not be able to trigger the 
> issue.
> 
> As to the fix itself, I think using raw_spinlock_t is the correct thing to do 
> because the lock is also used in interrupt context (even with RT enabled).
> 
> Would you be able to send a patch?

Yes, I’ll send a patch soon.

> 
>>> On 23 Mar 2021, at 15:39, Luca Fancellu <luca.fancellu@xxxxxxx> wrote:
>>> 
>>> Hi Jason,
>>> 
>>> Thanks for your hints. Unfortunately it does not seem to be an init
>>> problem: with the same init configuration I tried 5.10.23 (preempt_rt)
>>> without Juergen's patch but with the BUG_ON removed, and it boots without
>>> problems. So it seems that applying the series changes something (on a
>>> preempt_rt kernel), and we are trying to figure out what.
>>> 
>>> 
>>>> On 23 Mar 2021, at 12:36, Jason Andryuk <jandryuk@xxxxxxxxx> wrote:
>>>> 
>>>> On Mon, Mar 22, 2021 at 3:09 PM Luca Fancellu <luca.fancellu@xxxxxxx> 
>>>> wrote:
>>>>> 
>>>>> Hi Juergen,
>>>>> 
>>>>> Yes, you are right, it was my mistake. As you said, removing the BUG_ON(…)
>>>>> requires this series
>>>>> (https://patchwork.kernel.org/project/xen-devel/cover/20210306161833.4552-1-jgross@xxxxxxxx/).
>>>>> Since I’m using Yocto, I can build a preempt_rt kernel only up to
>>>>> 5.10.23, so I’m applying that series on top of this version and then
>>>>> removing the BUG_ON(…).
>>>>> 
>>>>> What we did not expect is that the Dom0 kernel now gets stuck at the
>>>>> “Setting domain 0 name, domid and JSON config…” step and the system seems
>>>>> unresponsive. It looks like a deadlock, but looking into the series we
>>>>> can’t spot anything, and the series was also tested by others in the
>>>>> community.
> 
> The deadlock is expected. When RT spinlocks are enabled, interrupts will not
> be disabled even when you call spin_lock_irqsave().
> 
> As the lock is also used in interrupt context (where interrupts are masked),
> this will lead to a deadlock because the lock can be held with interrupts
> unmasked, so an interrupt taken on the same CPU may then try to acquire a
> lock that is already held.
> 
> This is quite a common error, as developers are not yet used to testing with
> RT. I remember finding a few other instances like this when I worked on RT a
> couple of years ago.
> 
> For future reference, I think CONFIG_PROVE_LOCKING=y could help you detect
> (potential) deadlocks.
> 
> Cheers,
> 
> -- 
> Julien Grall



 

