
Re: [RFC PATCH] xen/sched/null: avoid crash after failed domU creation


  • To: Juergen Gross <jgross@xxxxxxxx>, Stewart Hildebrand <stewart.hildebrand@xxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 25 Apr 2023 09:42:46 +0200
  • Cc: George Dunlap <george.dunlap@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 25 Apr 2023 07:42:58 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25.04.2023 08:36, Juergen Gross wrote:
> On 24.04.23 23:00, Stewart Hildebrand wrote:
>> If creation of a domU fails, we may end up in a state
>> where a vcpu has not yet been added to the null scheduler, but
>> unit_deassign() is invoked.
> 
> This is not really true. The vcpu has been added, but it was offline at
> that time. This resulted in null_unit_insert() returning early and not
> calling unit_assign().
> 
> Later the vcpu was onlined during XEN_DOMCTL_setvcpucontext handling,
> resulting in null_unit_remove() calling unit_deassign().
> 
>> In this case, when running a debug build of
>> Xen, we will hit an ASSERT and crash Xen:
>>
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion 'npc->unit == unit' failed at common/sched/null.c:379
>> (XEN) ****************************************
>>
>> To work around this, remove the ASSERT and introduce a check for the
>> case where npc->unit is NULL and simply return false from
>> unit_deassign().
> 
> I think the correct fix would be to call unit_deassign() from
> null_unit_remove() only if npc->unit isn't NULL. Dario might have a
> different opinion, though. :-)
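
For illustration, here is a minimal standalone sketch of the sequence described above and of the caller-side guard. It is not the Xen code: the struct layouts and the model_* function names are simplified, hypothetical stand-ins, and only the field name npc->unit is taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the scheduler types; NOT the real Xen definitions. */
struct sched_unit {
    unsigned int unit_id;
    bool online;
};

struct null_pcpu {
    const struct sched_unit *unit;   /* unit assigned to this pCPU, or NULL */
};

/* Models insertion: an offline unit returns early and is never assigned. */
static void model_unit_insert(struct null_pcpu *npc,
                              const struct sched_unit *unit)
{
    if ( !unit->online )
        return;

    npc->unit = unit;
}

/* Models deassignment with the original assertion left in place. */
static bool model_unit_deassign(struct null_pcpu *npc,
                                const struct sched_unit *unit)
{
    assert(npc->unit == unit);
    npc->unit = NULL;
    return true;
}

/* Models removal: deassign only if something is assigned to the pCPU. */
static bool model_unit_remove(struct null_pcpu *npc,
                              const struct sched_unit *unit)
{
    if ( npc->unit == NULL )
        return false;

    return model_unit_deassign(npc, unit);
}
```

In this model, a unit that was offline at insertion and onlined afterwards reaches model_unit_remove() with npc->unit still NULL, so the guard skips the deassignment and the assertion is never evaluated.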

Furthermore, even if the proposed solution was (roughly) followed, ...

>> --- a/xen/common/sched/null.c
>> +++ b/xen/common/sched/null.c
>> @@ -376,7 +376,14 @@ static bool unit_deassign(struct null_private *prv, const struct sched_unit *uni
>>       struct null_pcpu *npc = get_sched_res(cpu)->sched_priv;
>>   
>>       ASSERT(list_empty(&null_unit(unit)->waitq_elem));
>> -    ASSERT(npc->unit == unit);
>> +
>> +    if ( !npc->unit )
>> +    {
>> +        dprintk(XENLOG_G_INFO, "%d <-- NULL (%pdv%d)\n", cpu, unit->domain,
>> +                unit->unit_id);
>> +        return false;
>> +    }
>> +

... shouldn't the assertion be kept, with the new if() inserted ahead of
it? Also, the log message probably shouldn't print a unit ID as though it
were a vCPU ID; perhaps use e.g. %pdu%u instead?
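
A rough standalone sketch of that ordering, with the NULL check first and the assertion kept behind it. This is a toy model rather than the actual null.c change: the types, the model_unit_deassign_checked() name, the domain_id field and the cpu parameter are assumptions made for illustration. Standard printf() has no %pd, so the domain ID is formatted by hand to mimic the d<domid>u<unit> shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the scheduler types; NOT the real Xen definitions. */
struct sched_unit {
    int domain_id;          /* hypothetical: stands in for unit->domain */
    unsigned int unit_id;
};

struct null_pcpu {
    const struct sched_unit *unit;   /* unit assigned to this pCPU, or NULL */
};

/*
 * Models deassignment with the early-return check placed ahead of the
 * assertion, so the assertion still catches a mismatched (non-NULL) unit.
 */
static bool model_unit_deassign_checked(struct null_pcpu *npc,
                                        const struct sched_unit *unit,
                                        unsigned int cpu)
{
    if ( !npc->unit )
    {
        /* Log the unit as a unit (dNuM), not in the vCPU style (dNvM). */
        printf("%u <-- NULL (d%du%u)\n", cpu, unit->domain_id, unit->unit_id);
        return false;
    }

    assert(npc->unit == unit);

    npc->unit = NULL;
    return true;
}
```

With this ordering, the never-assigned case returns false quietly, while a pCPU that holds some other unit still trips the assertion in a debug build.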

Jan



 

