
Re: [Xen-devel] [PATCH v3] xen/sched: fix cpu offlining with core scheduling



On Tue, 2020-03-10 at 09:09 +0100, Juergen Gross wrote:
> Offlining a cpu with core scheduling active can result in a hanging
> system. The reason is that the scheduling resource and unit of the
> cpus to be removed need to be split in order to remove the cpu from
> its cpupool and move it to the idle scheduler. In case one of the
> involved cpus happens to have received a sched slave event due to a
> vcpu formerly having been running on that cpu being woken up again,
> it can happen that this cpu will enter sched_wait_rendezvous_in()
> while its scheduling resource is just about to be split. It might
> wait forever for the other sibling to join, which will never happen
> due to the resources already being modified.
> 
> This can easily be avoided by:
> - resetting the rendezvous counters of the idle unit which is kept
> - checking for a new scheduling resource in sched_wait_rendezvous_in()
>   after reacquiring the scheduling lock and resetting the counters in
>   that case without scheduling another vcpu
> - moving schedule resource modifications (in schedule_cpu_rm()) and
>   retrieving (schedule(), sched_slave() is fine already, others are
>   not critical) into locked regions
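
For readers of the archive, here is a rough single-threaded model of the
second point (re-checking the scheduling resource after reacquiring the
lock). It is only a sketch: struct res, struct unit, cpu_res[],
wait_rendezvous() and the hook functions are hypothetical names made up
for illustration, not the Xen data structures or the code in the patch.
The hook stands in for whatever another cpu does while the lock is
dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; NOT the Xen structures. */
struct res  { int lock_held; };
struct unit { int rendezvous_in_cnt; };

#define NCPUS 2
static struct res *cpu_res[NCPUS];      /* per-cpu scheduling resource */
static struct res old_res, new_res;

static void res_lock(struct res *r)   { assert(!r->lock_held); r->lock_held = 1; }
static void res_unlock(struct res *r) { assert(r->lock_held);  r->lock_held = 0; }

/* Runs while the lock is dropped; models activity of another cpu. */
typedef void (*unlocked_hook)(unsigned int cpu, struct unit *u);

/* The sibling arrives at the rendezvous. */
static void sibling_joins(unsigned int cpu, struct unit *u)
{
    (void)cpu;
    u->rendezvous_in_cnt--;
}

/* The cpu is being offlined: its resource is replaced by a new one. */
static void resource_split(unsigned int cpu, struct unit *u)
{
    (void)u;
    cpu_res[cpu] = &new_res;
}

/*
 * Returns true if all cpus joined, false if the resource changed while
 * waiting. In the latter case the counter is reset and the caller must
 * not schedule another vcpu.
 */
static bool wait_rendezvous(unsigned int cpu, struct unit *u, unlocked_hook hook)
{
    struct res *seen = cpu_res[cpu];

    res_lock(seen);
    u->rendezvous_in_cnt--;             /* this cpu has arrived */

    while (u->rendezvous_in_cnt > 0) {
        struct res *now;

        res_unlock(seen);
        hook(cpu, u);                   /* other cpus make progress here */
        now = cpu_res[cpu];
        res_lock(now);

        if (now != seen) {
            /* Resource was split meanwhile: reset and bail out. */
            u->rendezvous_in_cnt = 0;
            res_unlock(now);
            return false;
        }
    }

    res_unlock(seen);
    return true;
}
```

Without the "now != seen" check, the loop in this model would keep
waiting for a sibling that no longer shares the resource, which is the
hang described in the commit message.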
> 
> Reported-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
>
Reviewed-by: Dario Faggioli <dfaggioli@xxxxxxxx>

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
