Thanks Jeremy.
Regarding the fix you mentioned, do you mean the patch I found and pasted below? If so, is that all I need?
As for disabling irqbalance, I am afraid it might have a negative performance impact, right?
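
Also, to make sure I understand the race you described, here is the handle_level_irq() excerpt from the quoted mail again, with my own comments marking where I guess an event could get dropped once irqbalance migrates the interrupt. The comments are only my reading, not taken from the thread, so please correct me if I am wrong:

/* kernel/irq/chip.c (the excerpt quoted below); comments are my guesses only */
void handle_level_irq(unsigned int irq, struct irq_desc *desc)
{
        struct irqaction *action;
        irqreturn_t action_ret;

        spin_lock(&desc->lock);
        mask_ack_irq(desc, irq);        /* masks and acks; for an event channel
                                         * the ack already consumes the pending
                                         * bit, as far as I can tell */

        if (unlikely(desc->status & IRQ_INPROGRESS))
                goto out_unlock;        /* if another cpu is still inside the
                                         * handler (possible after a migration),
                                         * we bail out here: the pending bit was
                                         * cleared above but nobody services
                                         * this event, so it is lost? */
        desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
        kstat_incr_irqs_this_cpu(irq, desc);
        /* rest of the function omitted, as in the excerpt quoted below */
}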
-------------------------------------------------------
	irq = find_unbound_irq();

	set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
-				      handle_level_irq, "event");
+				      handle_edge_irq, "event");

	evtchn_to_irq[evtchn] = irq;
	irq_info[irq] = mk_evtchn_info(evtchn);
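
For context, if I read drivers/xen/events.c correctly the hunk lands in bind_evtchn_to_irq(), roughly like this with the change applied. This is only paraphrased from memory, so the surrounding lines in xen/stable-2.6.32.x may well differ:

/* drivers/xen/events.c -- rough sketch of bind_evtchn_to_irq() with the
 * one-line change applied; details paraphrased from memory */
int bind_evtchn_to_irq(unsigned int evtchn)
{
	int irq;

	spin_lock(&irq_mapping_update_lock);

	irq = evtchn_to_irq[evtchn];
	if (irq == -1) {
		irq = find_unbound_irq();

		/* treat event channels as edge-triggered interrupts */
		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
					      handle_edge_irq, "event");

		evtchn_to_irq[evtchn] = irq;
		irq_info[irq] = mk_evtchn_info(evtchn);
	}

	spin_unlock(&irq_mapping_update_lock);

	return irq;
}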
> Date: Tue, 21 Sep 2010 10:28:34 -0700
> From: jeremy@xxxxxxxx
> To: keir.fraser@xxxxxxxxxxxxx
> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Re: VM hung after running sometime
>
> On 09/21/2010 12:53 AM, Keir Fraser wrote:
> > On 21/09/2010 06:02, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >
> >> Take a look at domain 0 event channel with port 105,106, I find on port 105,
> >> it pending is
> >> 1.(in [1,0], first bit refer to pending, and is 1, second bit refer to mask,
> >> is 0).
> >>
> >> (XEN) 105 [1/0]: s=3 n=2 d=10 p=1 x=0
> >> (XEN) 106 [0/0]: s=3 n=2 d=10 p=2 x=0
> >>
> >> In all, we have domain U cpu blocking on _VPF_blocked_in_xen, and it must set
> >> the pending bit.
> >> Consider pending is 1, it looks like the irq is not triggered, am I right ?
> >> Since if it is triggerred, it should clear the pending bit. (line 361).
> > Yes it looks like dom0 is not handling the event for some reason. Qemu looks
> > like it still works and is waiting for a notification via select(). But that
> > won't happen until dom0 kernel handles the event as an IRQ and calls the
> > relevant irq handler (drivers/xen/evtchn.c:evtchn_interrupt()).
> >
> > I think you're on the right track in your debugging. I don't know much about
> > the pv_ops irq handling path, except to say that this aspect is different
> > than non-pv_ops kernels which special-case handling of events bound to
> > user-space rather more. So at the moment my best guess would be that the bug
> > is in the pv_ops kernel irq handling for this type of user-space-bound
> > event.
>
> We no longer use handle_level_irq because there's a race which loses
> events when interrupt migration is enabled. Current xen/stable-2.6.32.x
> has a proper fix for this, but the quick workaround is to disable
> irqbalanced.
>
> J
>
> > -- Keir
> >
> >> ------------------------------/linux-2.6-pvops.git/kernel/irq/chip.c---
> >> 354 void
> >> 355 handle_level_irq(unsigned int irq, struct irq_desc *desc)
> >> 356 {
> >> 357         struct irqaction *action;
> >> 358         irqreturn_t action_ret;
> >> 359
> >> 360         spin_lock(&desc->lock);
> >> 361         mask_ack_irq(desc, irq);
> >> 362
> >> 363         if (unlikely(desc->status & IRQ_INPROGRESS))
> >> 364                 goto out_unlock;
> >> 365         desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
> >> 366         kstat_incr_irqs_this_cpu(irq, desc);
> >> 367
> >>
> >> BTW, the qemu still works fine when VM is hang. Below is it strace output.
> >> No much difference between other well worked qemu instance, other than select
> >> all Timeout.
> >> -------------------
> >> select(14, [3 7 11 12 13], [], [], {0, 10000}) = 0 (Timeout)
> >> clock_gettime(CLOCK_MONOTONIC, {673470, 59535265}) = 0
> >> clock_gettime(CLOCK_MONOTONIC, {673470, 59629728}) = 0
> >> clock_gettime(CLOCK_MONOTONIC, {673470, 59717700}) = 0
> >> clock_gettime(CLOCK_MONOTONIC, {673470, 59806552}) = 0
> >> select(14, [3 7 11 12 13], [], [], {0, 10000}) = 0 (Timeout)
> >> clock_gettime(CLOCK_MONOTONIC, {673470, 70234406}) = 0
> >> clock_gettime(CLOCK_MONOTONIC, {673470, 70332116}) = 0
> >> clock_gettime(CLOCK_MONOTONIC, {673470, 70419835}) = 0
> >>
> >>
> >>
> >>> Date: Mon, 20 Sep 2010 10:35:46 +0100
> >>> Subject: Re: VM hung after running sometime
> >>> From: keir.fraser@xxxxxxxxxxxxx
> >>> To: tinnycloud@xxxxxxxxxxx
> >>> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; jbeulich@xxxxxxxxxx
> >>>
> >>> On 20/09/2010 10:15, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >>>
> >>>> Thanks Keir.
> >>>>
> >>>> You're right, after I deeply looked into the wait_on_xen_event_channel, it
> >>>> is
> >>>> smart enough
> >>>> to avoid the race I assumed.
> >>>>
> >>>> How about prepare_wait_on_xen_event_channel ?
> >>>> Currently Istill don't know when it will be invoked.
> >>>> Could enlighten me?
> >>> As you can see it is called from hvm_send_assist_req(), hence it is called
> >>> whenever an ioreq is sent to qemu-dm. Note that it is called *before*
> >>> qemu-dm is notified -- hence it cannot race the wakeup from qemu, as we will
> >>> not get woken until qemu-dm has done the work, and it cannot start the work
> >>> until it is notified, and it is not notified until after
> >>> prepare_wait_on_xen_event_channel has been executed.
> >>>
> >>> -- Keir
> >>>
> >>>>> Date: Mon, 20 Sep 2010 08:45:21 +0100
> >>>>> Subject: Re: VM hung after running sometime
> >>>>> From: keir.fraser@xxxxxxxxxxxxx
> >>>>> To: tinnycloud@xxxxxxxxxxx
> >>>>> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; jbeulich@xxxxxxxxxx
> >>>>>
> >>>>> On 20/09/2010 07:00, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >>>>>
> >>>>>> When IO is not ready, domain U in VMEXIT->hvm_do_resume might invoke
> >>>>>> wait_on_xen_event_channel
> >>>>>> (where it is blocked in _VPF_blocked_in_xen).
> >>>>>>
> >>>>>> Here is my assumption of event missed.
> >>>>>>
> >>>>>> step 1: hvm_do_resume execute 260, and suppose p->state is
> >>>>>> STATE_IOREQ_READY
> >>>>>> or STATE_IOREQ_INPROCESS
> >>>>>> step 2: then in cpu_handle_ioreq is in line 547, it execute line 548 so
> >>>>>> quickly before hvm_do_resume execute line 270.
> >>>>>> Well, the event is missed.
> >>>>>> In other words, the _VPF_blocked_in_xen is cleared before it is actually
> >>>>>> setted, and Domian U who is blocked
> >>>>>> might never get unblocked, it this possible?
> >>>>> Firstly, that code is very paranoid and it should never actually be the
> >>>>> case
> >>>>> that we see STATE_IOREQ_READY or STATE_IOREQ_INPROCESS in hvm_do_resume().
> >>>>> Secondly, even if you do, take a look at the implementation of
> >>>>> wait_on_xen_event_channel() -- it is smart enough to avoid the race you
> >>>>> mention.
> >>>>>
> >>>>> -- Keir
> >>>>>
> >>>>>
> >>>
> >>
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
>
>
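
PS: for my own notes, the handler Keir pointed at is drivers/xen/evtchn.c:evtchn_interrupt(). As far as I understand it does roughly the following (a simplified paraphrase, not the exact code), which is why qemu's select() keeps timing out until dom0 actually delivers the event as an IRQ:

/* drivers/xen/evtchn.c, simplified paraphrase: irq handler for a port
 * bound to userspace (qemu) through /dev/xen/evtchn */
static irqreturn_t evtchn_interrupt(int irq, void *data)
{
	unsigned int port = (unsigned long)data;
	struct per_user_data *u = port_user[port];

	disable_irq_nosync(irq);	/* stays off until userspace reads
					 * and re-enables (unmasks) the port */

	u->ring[EVTCHN_RING_MASK(u->ring_prod++)] = port;
	wake_up_interruptible(&u->evtchn_wait);	/* this is what lets qemu's
						 * select()/read() return */
	return IRQ_HANDLED;
}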