
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets



BTW, I wonder whether the cpu online/offline lock should really wrap 
stop_machine_run(). I remember Criping questioned this as well. Will have a 
look at it.

--jyh

>-----Original Message-----
>From: Jiang, Yunhong
>Sent: Friday, April 16, 2010 2:43 PM
>To: Keir Fraser; Juergen Gross
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Yu, Ke
>Subject: RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using 
>tasklets
>
>
>
>>-----Original Message-----
>>From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>>Sent: Thursday, April 15, 2010 7:07 PM
>>To: Jiang, Yunhong; Juergen Gross
>>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Yu, Ke
>>Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using 
>>tasklets
>>
>>On 15/04/2010 10:59, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>>
>>>> Actually that's a good example because it now won't work, but for other
>>>> reasons! The hypercall continuation can interrupt another vcpu's execution,
>>>> and then try to synchronously pause that vcpu. Which will deadlock.
>>>>
>>>> Luckily I think we can re-jig this code to freeze_domains() before doing 
>>>> the
>>>> continue_hypercall_on_cpu(). I've cc'ed one of the CPU RAS guys. :-)
>>>
>>> Hmm, I have cc'ed one of the PM guys because it is enter_state :-)
>>> Can we add a check in vcpu_sleep_sync() for current? It is meaningless to
>>> cpu_relax() for the current vcpu in that situation, especially if we are
>>> not in irq context.
>>> I'm not sure why freeze_domains() only checks dom0's vcpus for current,
>>> instead of all domains'.
>>
>>Well actually pausing any vcpu from within the hypercall continuation is
>>dangerous. The softirq handler running the hypercall continuation may have
>>interrupted some running VCPU X. And the VCPU Y that the continuation is
>>currently trying to pause may itself be trying to pause X. So we can get a
>>deadlock that way. The freeze_domains() *has* to be pulled outside of the
>>hypercall continuation.
>>
>>It's a little bit similar to the super-subtle stop_machine_run deadlock
>>possibility I just emailed to you a second ago. :-)
>
>Thanks for pointing out the stop_machine_run deadlock issue.
>
>After more consideration and internal discussion, the key point seems to be
>that the tasklet softirq effectively holds a lock on the current vcpu's state
>(i.e. no one else can change that state until the softirq finishes). So any
>blocking action in softirq context, not just vcpu_pause_sync(), is dangerous
>and should be avoided (we can't take a lock and then block, per my
>understanding).
>This is because a vcpu's state can only be changed by the schedule softirq
>(am I right on this?), while the schedule softirq can't preempt other
>softirqs. So, more generally, anything that is updated in softirq context and
>synchronously waited on in Xen's vcpu context is in fact an implicit lock
>held by the softirq.
>
>As for the tricky bug in stop_machine_run(), I think it is in fact similar
>to the cpu_add_remove_lock. stop_machine_run() is a blocking action, so we
>must make sure no one blocks waiting for a lock that stop_machine_run()
>already holds. At the time, we changed all components that take the
>cpu_add_remove_lock to use try_lock instead.
>
>The change introduced by the tasklet is that a new implicit lock is added,
>i.e. the vcpu's state.
>The straightforward method is like the cpu_add_remove_lock: make everything
>that waits for a vcpu state change process softirqs between checks. Maybe
>the cleaner way is your previous suggestion, i.e. to run stop_machine_run()
>in the idle vcpu; another way is to turn back to the original method, i.e.
>do it in schedule_tail.
>
>Also, we are not sure why continue_hypercall_on_cpu was changed to use a
>tasklet. What is the benefit? At least I find it quite confusing, because
>per our understanding a hypercall is usually assumed to execute in the
>current context, while this change breaks that assumption. So any hypercall
>that may use this _c_h_o_c, and any function called by that hypercall,
>should be aware of this. I'm not sure this is really correct; at least it
>may cause trouble if someone uses it without realizing the limitation. From
>Juergen Gross's mail, it seems to be for cpupools, but I have no idea about
>cpupools :-(
>
>--jyh
>
>
>>
>> -- Keir
>>


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

