
RE: [Xen-devel] Domain 0 stop response on frequently reboot VMS



Well, thanks Keir.
Fortunately we caught the bug; it turned out to be a tapdisk problem.
Here is a brief explanation for anyone else who might confront this issue.
 
Clearing BLKTAP_DEFERRED on line 19 allows blktap_defer() to run between
lines 24 and 37 and re-add a tap whose deferred_queue node is still linked
into the local queue. This concurrent access eventually corrupts the
deferred_queue pointers, which shows up as an infinite loop in the while
loop on line 22. Taking deferred_work_lock around line 24 is a simple fix;
a sketch of it follows the listing below.
 
/linux-2.6-pvops.git/drivers/xen/blktap/wait_queue.c
  9 void
 10 blktap_run_deferred(void)
 11 {
 12     LIST_HEAD(queue);
 13     struct blktap *tap;
 14     unsigned long flags;
 15    
 16     spin_lock_irqsave(&deferred_work_lock, flags);
 17     list_splice_init(&deferred_work_queue, &queue);
 18     list_for_each_entry(tap, &queue, deferred_queue)
 19         clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
 20     spin_unlock_irqrestore(&deferred_work_lock, flags);
 21    
 22     while (!list_empty(&queue)) {
 23         tap = list_entry(queue.next, struct blktap, deferred_queue);
 24         list_del_init(&tap->deferred_queue);
 25         blktap_device_restart(tap);
 26     }  
 27 }  
 28
 29 void
 30 blktap_defer(struct blktap *tap)
 31 {
 32     unsigned long flags;
 33    
 34     spin_lock_irqsave(&deferred_work_lock, flags);
 35     if (!test_bit(BLKTAP_DEFERRED, &tap->dev_inuse)) {
 36         set_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
 37         list_add_tail(&tap->deferred_queue, &deferred_work_queue);
 38     }  
 39     spin_unlock_irqrestore(&deferred_work_lock, flags);
 40 }
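
For reference, here is a minimal sketch of the fixed loop. It is not the
exact patch we applied: it takes deferred_work_lock around list_del_init()
(the line 24 fix above) and, as my own additional reordering, clears
BLKTAP_DEFERRED only after the node is unlinked, so a concurrent
blktap_defer() either sees the bit still set and does nothing, or sees a
node that is already off the local list.

void
blktap_run_deferred(void)
{
    LIST_HEAD(queue);
    struct blktap *tap;
    unsigned long flags;

    /* Move pending entries to a private list. BLKTAP_DEFERRED stays
     * set, so blktap_defer() cannot re-link a node that is still on
     * our local queue. */
    spin_lock_irqsave(&deferred_work_lock, flags);
    list_splice_init(&deferred_work_queue, &queue);
    spin_unlock_irqrestore(&deferred_work_lock, flags);

    while (!list_empty(&queue)) {
        /* Unlink under the lock, and only then allow the tap to be
         * deferred again. */
        spin_lock_irqsave(&deferred_work_lock, flags);
        tap = list_entry(queue.next, struct blktap, deferred_queue);
        list_del_init(&tap->deferred_queue);
        clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
        spin_unlock_irqrestore(&deferred_work_lock, flags);

        blktap_device_restart(tap);
    }
}

With this ordering, a tap that is still on the private list simply skips
the redundant blktap_defer() call, which is harmless since it is about to
be restarted anyway.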

 
> Date: Fri, 15 Oct 2010 13:57:09 +0100
> Subject: Re: [Xen-devel] Domain 0 stop response on frequently reboot VMS
> From: keir@xxxxxxx
> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>
> You'll probably want to see if you can get SysRq output from dom0 via serial
> line. It's likely you can if it is alive enough to respond to ping. This
> might tell you things like what all processes are getting blocked on, and
> thus indicate what is stopping dom0 from making progress.
>
> -- Keir
>
> On 15/10/2010 13:43, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>
> >
> > Hi Keir:
> >
> > First, I'd like to express my appreciation for the help you offered
> > before.
> > Well, recently we have run into a rather nasty domain 0 no-response
> > problem.
> >
> > We are still running a reboot test with 12 HVMs, almost continuously
> > and concurrently, on a physical server.
> > A few hours later, the server appears dead. We can only ping the
> > server and get a correct response;
> > Xen itself works fine, since we can get debug info from the serial
> > port. Attached is the full debug output.
> > After decoding the domain 0 CPU stack, I find the CPU still does work
> > for domain 0, since the stack info changed every time I dumped it.
> >
> > Could you help take a look at the attachment to see whether there are
> > some hints for debugging this problem? Thanks in advance.
> >
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

