[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH-for-4.17] xen/sched: fix race in RTDS scheduler


  • To: Juergen Gross <jgross@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Dario Faggioli <dfaggioli@xxxxxxxx>
  • Date: Fri, 21 Oct 2022 09:37:17 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=lD/kD27MU97TTDhFYxcqdUuePGSrZO2yQ+QnXV941vk=; b=kOP4XkKrtjo2IG4Avagzipsy58yHsLLRfdqXaz5NbwMQDl0SbCyajBGjy+ZllV+QIOA07x/a2YNWZSBc103F/4JQ/TCEnmwGaOMYWNP0cmMJ01OV8YZDnorbFkcpkieVHuHQghDKKoHuBdGuQ0Q8wkkFhtFzmCANvTn2WhM5xGji+XETqmiaJWdGxD4Nj4yMAcYerD2XHtGvBoGOIfkI1q0fflZL1OCLNeX3I1G+9jlSFjtipHm61DZ9pLM7Bj+SoSaWhcLBTsdTZifZ6IRZfHfPJ/Janc+sp+4qlyAqIodsb3DKp7nwB1fMfmdFoKgPnkr+G81ZbjZJUYl6cIdnrg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=hYMyHQE+sbMbRVy9yHWyE6/E4GfUggqrYo1pj+H9X39//V6uO3cs/tJNwZVlLeP5AoSuUu5qPg3dpulSqDZ63CCf59oe2LnsC4WydUr4U9k2rOwN1F0ORJT6Yyk0NXThT/mTgLnNthgPKTpIefigtbVlKYUMX3VGcNUD4vFiEQwD1oUk8OHeaX6s24iFD6hFe+QrOEK7+pwpEhFyq51X4/255Gn9GCaxBXJDlrQLNe7vreshMIhqhQnAtkXYXfs48iQyurzG2afSxd3juGQDCfdhTtAQienk5aGDSZoMFVmGYX1ZGFzLA+iZlYRLiga959GwopDRZku24XzInhnzHg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "Henry.Wang@xxxxxxx" <Henry.Wang@xxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "mengxu@xxxxxxxxxxxxx" <mengxu@xxxxxxxxxxxxx>
  • Delivery-date: Fri, 21 Oct 2022 09:37:21 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHY5RPdyv7HUa7jcEWPdT3S+dSn164Yl0EA
  • Thread-topic: [PATCH-for-4.17] xen/sched: fix race in RTDS scheduler

Ok, and now, something not really related to the bug being fixed here,
but about the code that is being touched:

On Fri, 2022-10-21 at 08:10 +0200, Juergen Gross wrote:
> diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
> index d6de25531b..960a8033e2 100644
> --- a/xen/common/sched/rt.c
> +++ b/xen/common/sched/rt.c
> @@ -1087,6 +1087,7 @@ rt_schedule(const struct scheduler *ops, struct
> sched_unit *currunit,
>          else if ( !unit_runnable_state(snext->unit) )
>          {
>              q_remove(snext);
> +            replq_remove(ops, snext);
>              snext = rt_unit(sched_idle_unit(sched_cpu));
>          }
>  
So, adding a few more context here, the code looks like this:

        snext = runq_pick(ops, cpumask_of(sched_cpu), cur_cpu);

        if ( snext == NULL )
            snext = rt_unit(sched_idle_unit(sched_cpu));
        else if ( !unit_runnable_state(snext->unit) )
        {
            q_remove(snext);
            snext = rt_unit(sched_idle_unit(sched_cpu));
        }

Basically, we've tried to pick-up the highest priority task from the
runqueue. If snext is NULL, the runqueue was just empty, so we pick up
idle (and then, later, we'll check whether the currently running unit
is still runnable; and if it is, we'll continue to run it, of course).

However, it can happen that --e.g., due to core-scheduling-- we picked
up a unit that, despite being in the runqueue, is not runnable. At this
point what we do is removing it from the runqueue (to avoid picking it
up again) and we go for idle.

Now, I may be missing/misremembering something, but it looks to me that
it's possible that there are other runnable units in the runqueue. And
if that's the case, why do we just pick idle and move on, instead of
continuing trying?

Juergen... Am I missing or misremembering any fundamental reason why we
cannot continue to scan the runqueue until the first runnable unit (if
any) is found?

Of course, this is not really related with the bug this patch is
fixing, which is correct and should be applied, no matter what the
outcome of this subthread will be. :-)

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

Attachment: signature.asc
Description: This is a digitally signed message part


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.