
Re: [Xen-devel] Freeze with 2.6.32.19 and xen-4.0.1rc5



On 21.08.2010 16:02, Pasi Kärkkäinen wrote:
> On Sat, Aug 21, 2010 at 03:47:57PM +0200, Claus Rosenberger wrote:
>>  Hi,
>>
>> I have big trouble with a Debian Lenny dom0 running the latest kernel
>> 2.6.32.19 with xen-4.0.1rc5. For some reason the system freezes from time
>> to time. I used kernel 2.6.31.9 with xen-3.4.2 before. The machine doesn't
>> write anything to the serial console, so there are no error messages to go on.
>>
>> Perhaps there is something to see from the logs ...
>>
> Hello,
>
> A couple of questions:
>
>       - Do you use PCI passthru? 
I tried it, but I have disabled it for now to avoid mixing up too many issues.
>       - Is there something special happening when it freezes? 
The last time, it happened while I was creating filesystems, so it may be
related to disk activity. I describe the disk problems in more detail at
the end of this mail.
>       - Does it freeze at regular intervals, at the same time/uptime, or 
> randomly? 
It happens randomly; I see no pattern.
>       - By freezing you mean it doesn't respond to anything? Or does it 
> reboot?
When it freezes I cannot do anything; I can only connect with iAMT and
reboot.
>       - Can you try using the old 2.6.31.9 kernel with the new xen hypervisor?
Sure.
> -- Pasi
>
>
>> Configuration Grub
>>
>> title           Xen 4.0-amd64 / Debian GNU/Linux, kernel 2.6.32.19
>> root            (hd0,0)
>> kernel          /boot/xen-4.0-amd64.gz dom0_mem=524288 cpufreq=xen cpuidle console=com1 com1=115200,8n1,0xf1c0,0 sync_console
> Try adding "loglvl=all guest_loglvl=all" for xen.gz.
Sure.
>> module          /boot/vmlinuz-2.6.32.19 root=/dev/md0 ro console=tty0 console=hvc0
>> module          /boot/initrd.img-2.6.32.19
>>
> And try adding "nomodeset" for dom0 kernel (vmlinuz).
What is that parameter for?

I replaced the disk because the previous one reported an error. There is
now a brand-new disk on sata2, and I see the following in my console
log. I cannot believe it is a disk problem; perhaps it is a disk
controller problem instead, or something in the kernel. I will add the
parameters and power-cycle the machine to start from scratch.
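
As a sketch, with both suggestions applied and nothing else changed, the
Grub entry would look roughly like this (putting the new parameters at the
end of each line is my assumption):

    title           Xen 4.0-amd64 / Debian GNU/Linux, kernel 2.6.32.19
    root            (hd0,0)
    kernel          /boot/xen-4.0-amd64.gz dom0_mem=524288 cpufreq=xen cpuidle console=com1 com1=115200,8n1,0xf1c0,0 sync_console loglvl=all guest_loglvl=all
    module          /boot/vmlinuz-2.6.32.19 root=/dev/md0 ro console=tty0 console=hvc0 nomodeset
    module          /boot/initrd.img-2.6.32.19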

Claus


[17392.097849] sd 1:0:0:0: [sdb] Unhandled error code
[17392.100047] BUG: soft lockup - CPU#0 stuck for 66s! [swapper:0]
[17392.100049] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables
xen_evtchn xenfs 8021q garp bridge stp coretemp lm85 hwmon_vid loop
evdev video output tpm_tis tpm snd_pcsp tpm_bios psmouse snd_pcm
serio_raw snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core
processor button acpi_processor ext3 jbd mbcache dm_mirror
dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod sd_mod crc_t10dif
ata_piix ehci_hcd uhci_hcd ata_generic libata usbcore nls_base scsi_mod
e1000e thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[17392.100088] CPU 0:
[17392.100089] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables
xen_evtchn xenfs 8021q garp bridge stp coretemp lm85 hwmon_vid loop
evdev video output tpm_tis tpm snd_pcsp tpm_bios psmouse snd_pcm
serio_raw snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core
processor button acpi_processor ext3 jbd mbcache dm_mirror
dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod sd_mod crc_t10dif
ata_piix ehci_hcd uhci_hcd ata_generic libata usbcore nls_base scsi_mod
e1000e thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[17392.100120] Pid: 0, comm: swapper Not tainted 2.6.32.19 #2
[17392.100122] RIP: e030:[<ffffffff8100928a>]  [<ffffffff8100928a>] hypercall_page+0x28a/0x1001
[17392.100129] RSP: e02b:ffff880002f38df8  EFLAGS: 00000a07
[17392.100130] RAX: 0000000000000000 RBX: ffffc900081d2060 RCX: ffffffff8100928a
[17392.100132] RDX: 0000000000000001 RSI: ffffc900081d51c0 RDI: 0000000000000001
[17392.100134] RBP: ffffc900081d2198 R08: 0000000000000000 R09: 0000000000000000
[17392.100135] R10: 0000000000015640 R11: 0000000000000a07 R12: 0000000000000003
[17392.100137] R13: 0000000000004620 R14: 0000000000000021 R15: 6db6db6db6db6db7
[17392.100142] FS:  00007f3b93f6a6e0(0000) GS:ffff880002f35000(0000) knlGS:0000000000000000
[17392.100144] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[17392.100145] CR2: 00007f3b93f69000 CR3: 000000001efba000 CR4: 0000000000002660
[17392.100147] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17392.100149] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17392.100151] Call Trace:
[17392.100153]  <IRQ>  [<ffffffff811fd74a>] ? net_tx_action+0x294/0x9be
[17392.100160]  [<ffffffff8100eadf>] ? xen_restore_fl_direct_end+0x0/0x1
[17392.100164]  [<ffffffff8109786c>] ? check_for_new_grace_period+0x9e/0xa8
[17392.100167]  [<ffffffff81052bc3>] ? tasklet_action+0x77/0xd3
[17392.100170]  [<ffffffff81054352>] ? __do_softirq+0xe0/0x1a2
[17392.100173]  [<ffffffff811ee9ff>] ? __xen_evtchn_do_upcall+0x12a/0x16c
[17392.100176]  [<ffffffff81012bec>] ? call_softirq+0x1c/0x30
[17392.100179]  [<ffffffff81014813>] ? do_softirq+0x3f/0x7c
[17392.100181]  [<ffffffff810541b3>] ? irq_exit+0x36/0x79
[17392.100184]  [<ffffffff811eeeb0>] ? xen_evtchn_do_upcall+0x35/0x42
[17392.100186]  [<ffffffff81012c3e>] ? xen_do_hypervisor_callback+0x1e/0x30
[17392.100187]  <EOI>  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
[17392.100191]  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
[17392.100194]  [<ffffffff8100e454>] ? xen_safe_halt+0xc/0x15
[17392.100196]  [<ffffffff8100bf15>] ? xen_idle+0x35/0x40
[17392.100199]  [<ffffffff81010c13>] ? cpu_idle+0xa3/0xdd
[17392.100203]  [<ffffffff814f3cdb>] ? start_kernel+0x3da/0x3e5
[17392.100205]  [<ffffffff814f5b83>] ? xen_start_kernel+0x5e6/0x5ea
[17392.103027] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[17392.103031] ata1.00: failed command: READ DMA EXT
[17392.103035] ata1.00: cmd 25/00:00:5d:88:39/00:04:3d:00:00/e0 tag 0 dma 524288 in
[17392.103036]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[17392.103037] ata1.00: status: { DRDY }
[17392.103052] ata1.00: hard resetting link
[17392.372433] sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[17392.376416] sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 3d 39 7f dd 00 04 00 00
[17392.380089] end_request: I/O error, dev sdb, sector 1027178461
[17392.384569] raid1: Disk failure on sdb3, disabling device.
[17392.384569] raid1: Operation continuing on 1 devices.
[17392.389710] BUG: soft lockup - CPU#1 stuck for 66s! [scsi_eh_1:538]
[17392.389710] Modules linked in:
[17392.393352] md: md2: resync done.
[17392.393348]  nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
xt_physdev iptable_filter ip_tables x_tables xen_evtchn xenfs 8021q garp
bridge stp coretemp lm85 hwmon_vid loop evdev video output tpm_tis tpm
snd_pcsp tpm_bios psmouse snd_pcm serio_raw snd_timer snd soundcore
snd_page_alloc i2c_i801 i2c_core processor button acpi_processor ext3
jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid1
md_mod sd_mod crc_t10dif ata_piix ehci_hcd uhci_hcd ata_generic libata
usbcore nls_base scsi_mod e1000e thermal fan thermal_sys [last unloaded:
scsi_wait_scan]
[17392.420099] CPU 1:
[17392.420099] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables
xen_evtchn xenfs 8021q garp bridge stp coretemp lm85 hwmon_vid loop
evdev video output tpm_tis tpm snd_pcsp tpm_bios psmouse snd_pcm
serio_raw snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core
processor button acpi_processor ext3 jbd mbcache dm_mirror
dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod sd_mod crc_t10dif
ata_piix ehci_hcd uhci_hcd ata_generic libata usbcore nls_base scsi_mod
e1000e thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[17392.448018] Pid: 538, comm: scsi_eh_1 Not tainted 2.6.32.19 #2
[17392.452071] RIP: e030:[<ffffffff8100922a>]  [<ffffffff8100922a>] hypercall_page+0x22a/0x1001
[17392.456082] RSP: e02b:ffff88000205bbc8  EFLAGS: 00000246
[17392.460069] RAX: 0000000000040000 RBX: ffff880002353000 RCX: ffffffff8100922a
[17392.464074] RDX: 000000000000d729 RSI: 0000000000000000 RDI: 0000000000000000
[17392.464074] RBP: ffff880002312000 R08: 0000000000000001 R09: 00000000000000fa
[17392.468023] R10: ffff88000206d170 R11: 0000000000000246 R12: ffff880002338000
[17392.468023] R13: ffff880002353048 R14: ffff88001e7f0900 R15: ffff880002698000
[17392.468023] FS:  00007f3b93f6a6e0(0000) GS:ffff880002f52000(0000) knlGS:0000000000000000
[17392.472075] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[17392.472075] CR2: 00007f3b9356d1a4 CR3: 000000001f657000 CR4: 0000000000002660
[17392.472075] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17392.476070] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17392.476070] Call Trace:
[17392.476070]  [<ffffffff8100e41d>] ? xen_force_evtchn_callback+0x9/0xa
[17392.476070]  [<ffffffff8100eaf2>] ? check_events+0x12/0x20
[17392.480079] ata1.01: hard resetting link
[17392.480099]  [<ffffffff8100ea99>] ? xen_irq_enable_direct_end+0x0/0x7
[17392.480099]  [<ffffffffa003a19c>] ? scsi_request_fn+0x3b9/0x4da [scsi_mod]
[17392.480099]  [<ffffffff8117d7b6>] ? __blk_run_queue+0x35/0x66
[17392.484070]  [<ffffffff8117d88d>] ? blk_run_queue+0x20/0x32
[17392.484070]  [<ffffffffa0039826>] ? scsi_run_queue+0x2da/0x370 [scsi_mod]
[17392.488084]  [<ffffffff810e6294>] ? kmem_cache_free+0x71/0xa4
[17392.488084]  [<ffffffffa003a4a5>] ? scsi_next_command+0x2d/0x39 [scsi_mod]
[17392.488084]  [<ffffffffa003adfc>] ? scsi_io_completion+0x1ed/0x416 [scsi_mod]
[17392.488084]  [<ffffffffa0037a7a>] ? scsi_eh_flush_done_q+0xec/0x10d [scsi_mod]
[17392.488084]  [<ffffffffa00a7223>] ? ata_scsi_error+0x5e9/0x681 [libata]
[17392.488084]  [<ffffffffa0038a3d>] ? scsi_error_handler+0xec/0x5a9 [scsi_mod]
[17392.496345]  [<ffffffffa0038951>] ? scsi_error_handler+0x0/0x5a9 [scsi_mod]
[17392.496345]  [<ffffffff810652ad>] ? kthread+0x75/0x7d
[17392.496345]  [<ffffffff81012aea>] ? child_rip+0xa/0x20
[17392.496345]  [<ffffffff81011ca1>] ? int_ret_from_sys_call+0x7/0x1b
[17392.500072]  [<ffffffff8101245d>] ? retint_restore_args+0x5/0x6
[17392.500072]  [<ffffffff81012ae0>] ? child_rip+0x0/0x20
[17392.956361] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[17392.957486] ata1.01: SATA link down (SStatus 0 SControl 300)
[17392.972769] ata1.00: configured for UDMA/133
[17392.973795] ata1.00: device reported invalid CHS sector 0
[17392.974743] ata1: EH complete
[17393.149020] md: checkpointing resync of md2.
[17393.482545] RAID1 conf printout:
[17393.483122]  --- wd:1 rd:2
[17393.483585]  disk 0, wo:0, o:1, dev:sda3
[17393.484259]  disk 1, wo:1, o:0, dev:sdb3
[17393.492056] RAID1 conf printout:
[17393.492628]  --- wd:1 rd:2
[17393.493108]  disk 0, wo:0, o:1, dev:sda3
[17393.494841] md: resync of RAID array md2
[17393.495559] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[17393.496573] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[17393.498235] md: using 128k window, over a total of 958020096 blocks.
[17393.498466] md: resuming resync of md2 from checkpoint.
[17393.498466] md: md2: resync done.
[17393.824165] RAID1 conf printout:
[17393.825572]  --- wd:1 rd:2
[17393.826761]  disk 0, wo:0, o:1, dev:sda3


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

