[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] deadlock in the credit2



On 14/10/2011 12:47, "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:

> 2011/10/14 Eunbyung Park <silverbottlep@xxxxxxxxx>:
>> IMHO, it seems to be deadlock when changing dom0's weight in credit2
>> scheduler.
>> 
>> when the sched_adjust() in schedule.c is called, it grabs the
>> schedule_lock after pausing all of the vcpus
>> 
>> and then, csched_dom_cntl in sched_credit2.c, it also grab the
>> schedule_lock by using vcpu_schedule_lock_irq().
>> 
>> In the credit2, all of the percpu schedule_lock points out same runqueue
>> lock if they belong to same runqueue.
>> 
>> Eventually, all of vcpu are paused except for itself running the code,
>> and it try to grab schedule_lock that was grabbed by itself.
>> 
>> Am I right? If I was wrong, please tell me my misunderstanding.
> 
> Hmm, I think you may have discovered the source of a bug that people
> have been reporting but I haven't had time to look into yet.
> 
> Keir, I think that lock in schedule.c around SCHED(adjust) must be
> wrong.  By the time we grab that lock, grabbing it will be completely
> pointless.  What are we going to be racing against?  In any case, the
> actual scheduler should be responsible for grabbing locks; there's no
> reason that the scheduler can't grab whatever lock it needs inside
> that function.  I haven't done a deep analysis, but my initial
> instinct is to just get rid of it.  What do you think?

Fine by me. The synchronisation in that function looks pretty fragile. It's
probably outdated too.

 -- Keir

>> if ( d == current->domain )
>> vcpu_schedule_lock_irq(current);
>> 
>> It was very hard to understan for me..:) What does it exactly mean?
> 
> You're asking what "current" means?  "current" is a macro that always
> resolves to the vcpu which is running on the current processor.
> 
> sched_adjust() seems to be trying to avoid scheduling races in general
> by pausing all vcpus before calling the per-scheduler function.  But
> if a VM is calling the op on itself, the vcpu making the hypercall
> can't pause itself.  So in that case (current->domain == d) will be
> true, so sched_adjust() grab the schedule lock of that vm instead.
> 
> But really all that locking should be handled in the scheduler
> function, not by the generic code.  It knows best what needs to be
> locked when.
> 
>  -George



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.