
Re: [Xen-devel] Dom 0 crash

On 05/11/13 13:16, Jan Beulich wrote:
On 05.11.13 at 12:58, Ian Murray <murrayie@xxxxxxxxxxx> wrote:
I have a recurring crash using Xen 4.3.1-RC2 with Ubuntu 12.04 as Dom0
(3.2.0-55-generic). I have software RAID 5 with LVM. The DomU (also Ubuntu
12.04, 3.2.0-55 kernel) has a dedicated logical volume, which is backed
up by shutting down the DomU, creating an LVM snapshot, restarting the DomU and
then dd'ing the snapshot to another logical volume. The snapshot is then
removed and the second LV is dd'ed through gzip and onto DAT tape.
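
For readers following along, the backup cycle described above can be sketched as the ordered command list below. This is only an illustration under stated assumptions: the post does not show the actual script, so the use of the xl toolstack, the snapshot size, and every domain, volume group, LV and tape device name here are hypothetical placeholders. The function only builds the command strings and runs nothing.

```python
import shlex


def backup_cycle(domain, domu_cfg, vg, lv,
                 snap="backup-snap", copy_lv="backup-copy",
                 snap_size="5G", tape="/dev/st0"):
    """Return the backup steps as shell command strings, in order.

    All names and sizes are hypothetical; the xl toolstack is an assumption.
    """
    q = shlex.quote
    origin = f"/dev/{vg}/{lv}"          # LV dedicated to the DomU
    snap_dev = f"/dev/{vg}/{snap}"      # temporary snapshot of that LV
    copy_dev = f"/dev/{vg}/{copy_lv}"   # second LV that receives the copy
    return [
        # 1. Shut the DomU down and wait until it has stopped.
        f"xl shutdown -w {q(domain)}",
        # 2. Snapshot the (now quiescent) LV.
        f"lvcreate --snapshot --size {q(snap_size)} --name {q(snap)} {q(origin)}",
        # 3. Restart the DomU so downtime stays short.
        f"xl create {q(domu_cfg)}",
        # 4. Copy the snapshot to the second LV.
        f"dd if={q(snap_dev)} of={q(copy_dev)} bs=1M",
        # 5. Remove the snapshot.
        f"lvremove -f {q(snap_dev)}",
        # 6. Compress the copy and write it to tape.
        f"dd if={q(copy_dev)} bs=1M | gzip -c | dd of={q(tape)}",
    ]


if __name__ == "__main__":
    for step in backup_cycle("domu1", "/etc/xen/domu1.cfg", "vg0", "domu1-disk"):
        print(step)
```

Steps 4 and 6 are the sustained block I/O phases, and step 4 runs while the restarted DomU is also using its LV.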

I currently have this running every hour (unless it's already running) for
testing purposes. After 6-12 runs of this, the Dom0 kernel crashes with the
output below.

When I perform this after booting the same kernel standalone, the problem
does not occur.

Likely because the action that triggers this doesn't get performed
in that case?

Thanks for the response.

I am obviously comparing apples and oranges, but I have tried to make the standalone test as similar as possible: I limited kernel memory to 512M, as I do with Dom0, and ran a background task writing /dev/urandom to the LV that the DomU would normally be using. The only differences are that it isn't running under Xen and there is no DomU running in the background. I will repeat the exercise under Xen, but with no DomU running.

Can anyone please suggest what I am doing wrong, or identify whether it is a bug?

Considering that exception address ...

RIP: e030:[<ffffffff8142655d>]  [<ffffffff8142655d>] 
... and call stack ...

[24149.786311] Call Trace:
[24149.786315]  <IRQ>
[24149.786323]  [<ffffffff8142da62>] scsi_request_fn+0x3a2/0x470
[24149.786333]  [<ffffffff812f1a28>] blk_run_queue+0x38/0x60
[24149.786339]  [<ffffffff8142c416>] scsi_run_queue+0xd6/0x1b0
[24149.786347]  [<ffffffff8142e822>] scsi_next_command+0x42/0x60
[24149.786354]  [<ffffffff8142ea52>] scsi_io_completion+0x1b2/0x630
[24149.786363]  [<ffffffff816611fe>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[24149.786371]  [<ffffffff81424b5c>] scsi_finish_command+0xcc/0x130
[24149.786378]  [<ffffffff8142e7ae>] scsi_softirq_done+0x13e/0x150
[24149.786386]  [<ffffffff812fb6b3>] blk_done_softirq+0x83/0xa0
[24149.786394]  [<ffffffff8106fa38>] __do_softirq+0xa8/0x210
[24149.786402]  [<ffffffff8166ba6c>] call_softirq+0x1c/0x30
[24149.786410]  [<ffffffff810162f5>] do_softirq+0x65/0xa0
[24149.786416]  [<ffffffff8106fe1e>] irq_exit+0x8e/0xb0
[24149.786428]  [<ffffffff813aecd5>] xen_evtchn_do_upcall+0x35/0x50
[24149.786436]  [<ffffffff8166babe>] xen_do_hypervisor_callback+0x1e/0x30
[24149.786441]  <EOI>
[24149.786449]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[24149.786456]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[24149.786464]  [<ffffffff8100a500>] ? xen_safe_halt+0x10/0x20
[24149.786472]  [<ffffffff8101c913>] ? default_idle+0x53/0x1d0
[24149.786478]  [<ffffffff81013236>] ? cpu_idle+0xd6/0x120
... point into the SCSI subsystem, this is likely the wrong list to
ask for help on.

... but the right list to confirm that I am on the wrong list? :)

Seriously, the specific evidence may suggest it's a non-Xen issue/bug, but Xen is the only measurable/visible difference so far. I brought it to this list because the boundaries between the hypervisor, PVOPS and regular kernel code are likely best understood here.
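
To make that split a little more concrete, here is a rough sketch that groups the frames of the quoted call trace (from the top down to the Xen callback) by symbol prefix. The prefix-to-subsystem mapping is a crude assumption made purely for illustration, not a kernel convention, and frames marked with "?" are skipped because the kernel itself flags them as unreliable.

```python
import re
from collections import Counter

# Frames copied from the call trace quoted in this thread.
TRACE = """\
[24149.786323]  [<ffffffff8142da62>] scsi_request_fn+0x3a2/0x470
[24149.786333]  [<ffffffff812f1a28>] blk_run_queue+0x38/0x60
[24149.786339]  [<ffffffff8142c416>] scsi_run_queue+0xd6/0x1b0
[24149.786347]  [<ffffffff8142e822>] scsi_next_command+0x42/0x60
[24149.786354]  [<ffffffff8142ea52>] scsi_io_completion+0x1b2/0x630
[24149.786363]  [<ffffffff816611fe>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[24149.786371]  [<ffffffff81424b5c>] scsi_finish_command+0xcc/0x130
[24149.786378]  [<ffffffff8142e7ae>] scsi_softirq_done+0x13e/0x150
[24149.786386]  [<ffffffff812fb6b3>] blk_done_softirq+0x83/0xa0
[24149.786394]  [<ffffffff8106fa38>] __do_softirq+0xa8/0x210
[24149.786402]  [<ffffffff8166ba6c>] call_softirq+0x1c/0x30
[24149.786410]  [<ffffffff810162f5>] do_softirq+0x65/0xa0
[24149.786416]  [<ffffffff8106fe1e>] irq_exit+0x8e/0xb0
[24149.786428]  [<ffffffff813aecd5>] xen_evtchn_do_upcall+0x35/0x50
[24149.786436]  [<ffffffff8166babe>] xen_do_hypervisor_callback+0x1e/0x30
"""

# Matches "[<address>] [? ]symbol+0xoffset/0xsize".
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(trace):
    """Yield (symbol, reliable) for every stack frame line in the trace."""
    for line in trace.splitlines():
        m = FRAME.search(line)
        if m:
            yield m.group(2), m.group(1) is None


def subsystem(symbol):
    """Crude grouping by symbol prefix (an illustrative assumption)."""
    for prefix in ("scsi_", "blk_", "xen_"):
        if symbol.startswith(prefix):
            return prefix.rstrip("_")
    return "other"


if __name__ == "__main__":
    counts = Counter(subsystem(sym) for sym, ok in frames(TRACE) if ok)
    for name, n in counts.most_common():
        print(f"{name}: {n}")
```

A count like this only shows where the stack was when the fault hit. It says nothing about which layer set up the conditions for it.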

Thanks again for your response.

