RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Juergen, thanks for your explanation.

--jyh

>-----Original Message-----
>From: Juergen Gross [mailto:juergen.gross@xxxxxxxxxxxxxx]
>Sent: Friday, April 16, 2010 2:56 PM
>To: Jiang, Yunhong
>Cc: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx; Yu, Ke
>Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using
>tasklets
>
>Jiang, Yunhong wrote:
>>
>>> -----Original Message-----
>>> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>>> Sent: Thursday, April 15, 2010 7:07 PM
>>> To: Jiang, Yunhong; Juergen Gross
>>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Yu, Ke
>>> Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using
>>> tasklets
>>>
>>> On 15/04/2010 10:59, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>>>
>>>>> Actually that's a good example because it now won't work, but for other
>>>>> reasons! The hypercall continuation can interrupt another vcpu's
>>>>> execution, and then try to synchronously pause that vcpu. Which will
>>>>> deadlock.
>>>>>
>>>>> Luckily I think we can re-jig this code to freeze_domains() before doing
>>>>> the continue_hypercall_on_cpu(). I've cc'ed one of the CPU RAS guys. :-)
>>>> Hmm, I have cc'ed one of the PM guys because it is enter_state :-)
>>>> Can we add a check in vcpu_sleep_sync() for current? It is meaningless to
>>>> cpu_relax for the current vcpu in that situation, especially if we are
>>>> not in irq context.
>>>> I'm not sure why freeze_domains only checks dom0's vcpus for current,
>>>> instead of all domains'.
>>> Well actually pausing any vcpu from within the hypercall continuation is
>>> dangerous. The softirq handler running the hypercall continuation may have
>>> interrupted some running VCPU X. And the VCPU Y that the continuation is
>>> currently trying to pause may itself be trying to pause X. So we can get a
>>> deadlock that way. The freeze_domains() *has* to be pulled outside of the
>>> hypercall continuation.
>>>
>>> It's a little bit similar to the super-subtle stop_machine_run deadlock
>>> possibility I just emailed to you a second ago. :-)
>>
>> Thanks for pointing out the stop_machine_run deadlock issue.
>>
>> After more consideration and internal discussion, the key point seems to be
>> that the tasklet softirq effectively holds a lock on the current vcpu's
>> state (i.e. no one else can change that state until the softirq finishes).
>> So any blocking action in softirq context, not just vcpu_pause_sync, is
>> dangerous and should be avoided (we can't hold a lock and then block, per
>> my understanding).
>> This is because a vcpu's state can only be changed by the schedule softirq
>> (am I right on this?), while the schedule softirq can't preempt other
>> softirqs. So, more generally, anything that is updated by a softirq context
>> and is synchronously/blockingly waited on in Xen's vcpu context is in fact
>> an implicit lock held by the softirq.
>>
>> The tricky bug in stop_machine_run() is in fact similar to the
>> cpu_add_remove_lock one. stop_machine_run() is a blocking action, so we
>> must make sure no one blocks waiting for a lock that stop_machine_run()
>> already holds. Back then, we changed all components that try to take the
>> cpu_add_remove_lock to use try_lock instead.
>>
>> The change introduced by the tasklet is that a new implicit lock is added,
>> namely the vcpu's state.
>> The straightforward method is like the cpu_add_remove_lock approach: make
>> everything that waits for a vcpu state change process softirqs between
>> checks.
>> Maybe the cleaner way is your previous suggestion, that is, put
>> stop_machine_run() in the idle vcpu; another way is to turn back to the
>> original method, i.e. do it in the schedule_tail.
>>
>> Also, we are not sure why continue_hypercall_on_cpu was changed to use a
>> tasklet. What's the benefit of it? At least I think this is quite
>> confusing, because per our understanding a hypercall is usually assumed to
>> execute in the current context, while this change breaks that assumption.
>> So any hypercall that may use this _c_h_o_c, and any function called by
>> that hypercall, should be aware of this. I'm not sure this is really
>> correct; at least it may cause trouble if someone uses it without realizing
>> the limitation. From Juergen Gross's mail it seems to be for cpupools, but
>> I have no idea of the cpupool :-(
>
>Cpupools introduce something like "scheduling domains" in xen. Each cpupool
>owns a set of physical cpus and has its own scheduler. Each domain is a
>member of a cpupool.
>
>It is possible to move cpus or domains between pools, but a domain is always
>limited to the physical cpus being in the cpupool of the domain.
>
>This limitation makes it impossible to use continue_hypercall_on_cpu with
>cpupools for any physical cpu without changing it. My original solution
>temporarily moved the target cpu into the cpupool of the issuing domain,
>but this was regarded as an ugly hack.
>
>
>Juergen
>
>--
>Juergen Gross                  Principal Developer Operating Systems
>TSP ES&S SWE OS6               Telephone: +49 (0) 89 3222 2967
>Fujitsu Technology Solutions   e-mail: juergen.gross@xxxxxxxxxxxxxx
>Domagkstr. 28                  Internet: ts.fujitsu.com
>D-80807 Muenchen               Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
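
To make the circular wait Keir describes above concrete: the continuation runs
in softirq context on top of an interrupted VCPU X and spins in
vcpu_pause_sync() waiting for VCPU Y, while Y is itself spinning waiting for X.
Below is a toy userspace model of that pattern in plain C with pthreads; it is
not Xen code, and names such as vcpu_pause_sync_model() are invented for the
sketch.

/*
 * Toy userspace model of the circular wait: two "vcpus" each demand that
 * the other one pauses, but neither can reach a pause point because each
 * is busy-waiting.  Not Xen code; all names are invented for this sketch.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

struct vcpu_model {
    atomic_int pause_requested;  /* someone wants this vcpu descheduled */
    atomic_int paused;           /* set only when the vcpu reaches the scheduler */
};

static struct vcpu_model vcpu_x, vcpu_y;

/* Spin until @target reports itself paused -- the "implicit lock". */
static void vcpu_pause_sync_model(struct vcpu_model *target)
{
    atomic_store(&target->pause_requested, 1);
    while (!atomic_load(&target->paused))
        ;  /* cpu_relax(); never returns if @target cannot be scheduled out */
}

/* Hypercall continuation running in softirq context on top of VCPU X. */
static void *continuation_on_x(void *arg)
{
    (void)arg;
    vcpu_pause_sync_model(&vcpu_y);   /* X (via the softirq) waits for Y */
    puts("continuation: Y paused");   /* never reached */
    return NULL;
}

/* VCPU Y, which happens to be pausing X at the same time. */
static void *vcpu_y_body(void *arg)
{
    (void)arg;
    vcpu_pause_sync_model(&vcpu_x);   /* Y waits for X: circular wait */
    puts("vcpu Y: X paused");         /* never reached */
    return NULL;
}

int main(void)
{
    pthread_t tx, ty;

    pthread_create(&tx, NULL, continuation_on_x, NULL);
    pthread_create(&ty, NULL, vcpu_y_body, NULL);

    sleep(2);  /* watchdog: both threads are still spinning => deadlock */
    printf("deadlocked: X paused=%d, Y paused=%d\n",
           atomic_load(&vcpu_x.paused), atomic_load(&vcpu_y.paused));
    return 0;
}

Built with gcc -pthread, both threads are still spinning when the watchdog
fires. Pulling freeze_domains() out in front of continue_hypercall_on_cpu(),
as suggested above, keeps the pausing in ordinary vcpu context instead of the
continuation, which is what breaks this cycle.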
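
To make the cpupool constraint Juergen describes concrete, here is a minimal
data-structure model (again with invented names, not the actual Xen cpupool
implementation): a domain can only be scheduled on the pcpus owned by its
pool, so a continuation targeted at an arbitrary pcpu may name a cpu the
issuing domain is not allowed to run on.

/*
 * Toy model of the cpupool constraint, not the Xen implementation:
 * a domain's vcpus may only run on the physical cpus owned by the
 * domain's cpupool.  All names here are invented for the sketch.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

struct cpupool_model {
    const char   *sched_name;  /* each pool has its own scheduler */
    unsigned long cpu_valid;   /* bitmask of pcpus owned by this pool */
};

struct domain_model {
    struct cpupool_model *pool;  /* every domain is in exactly one pool */
};

/* Could this domain's hypercall be continued on @cpu the "old" way,
 * i.e. by running it in one of the domain's own vcpu contexts there? */
static bool domain_may_run_on(const struct domain_model *d, unsigned int cpu)
{
    return d->pool->cpu_valid & (1UL << cpu);
}

int main(void)
{
    struct cpupool_model pool0 = { "credit", 0x0FUL };  /* pcpus 0-3 */
    struct cpupool_model pool1 = { "sedf",   0xF0UL };  /* pcpus 4-7 */
    struct domain_model dom = { &pool0 };
    (void)pool1;

    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        printf("pcpu%u: %s\n", cpu,
               domain_may_run_on(&dom, cpu)
               ? "reachable" : "outside the domain's pool");

    return 0;
}

Juergen's original workaround was to move the target pcpu into the issuing
domain's pool for the duration of the call; the tasklet rework sidesteps this
by running the continuation in tasklet (softirq) context on the target cpu,
so pool membership no longer matters -- which is also why the continuation can
no longer assume it runs in the original vcpu's context.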