
Re: [Xen-users] Xen3.3 / Xen3.4 CPU soft lockups under pvops 2.6.31/2.6.32



Pim van Riezen wrote:
> Good day,
>
> We're trying to get 2.6.31 and 2.6.32 rolled out on our clusters to offer 
> newer features like FUSE for our customers, but we've run into a couple of 
> showstopper issues when deploying these kernels on busier guests, showing a 
> lot of errors like this:
>
>   BUG: soft lockup - CPU#0 stuck for 561s! [swapper:0]
>   Modules linked in:
>   CPU 0:
>   Modules linked in:
>   Pid: 0, comm: swapper Not tainted 2.6.32.9xls-domU #2 
>   RIP: e030:[<ffffffff810093aa>]  [<ffffffff810093aa>] hypercall_page+0x3aa/0x1001
>   RSP: e02b:ffffffff81691f70  EFLAGS: 00000246
>   RAX: 0000000000000000 RBX: ffffffff81690000 RCX: ffffffff810093aa
>   RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
>   RBP: ffffffff81896d30 R08: 0000000000000000 R09: ffffffff8100e3b2
>   R10: 0000000000000001 R11: 0000000000000246 R12: ffffffffffffffff
>   R13: ffffffff818ebf20 R14: ffffffff818eec70 R15: 0000000000000000
>   FS:  00007f8ac7a9c6e0(0000) GS:ffff8800022ac000(0000) knlGS:0000000000000000
>   CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>   CR2: 00007f291b4c9000 CR3: 000000007d8c1000 CR4: 0000000000002660
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>   Call Trace:
>    [<ffffffff8100ddb7>] ? xen_safe_halt+0xc/0x15
>    [<ffffffff8100bdcf>] ? xen_idle+0x37/0x40
>    [<ffffffff8100fe2e>] ? cpu_idle+0x4f/0x82
>    [<ffffffff818b6c42>] ? start_kernel+0x353/0x35f
>
> In the hope of getting rid of this issue, we upgraded from Xen 3.3 to Xen 
> 3.4.1.7 out of the gitco repos. The issue persisted. Is there a magic version 
> of Xen, preferably one that can be found in an rpm repository for CentOS 5, 
> that *does* properly support pvops kernels without these issues?
>
> I'm also seeing this one: 
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=543 where the bug is 
> still open but has had no activity since 2008. I don't know if that bugzilla 
> is still being actively maintained?
>
> Cheers,
> Pim
>
>
>   
Hi Pim,

I am having similar issues with different versions of the Xen hypervisor
(3.2, 3.4) and with domU kernels >= 2.6.26 (up to 2.6.32-4 from Debian),
always on the same 2 or 3 VMs that are frequently under heavy load.

After a lot of searching, I suspected that my CPU soft lockup problems
(which sometimes make my VMs freeze) were related to the Xen clocksource,
so I decided to give this a try:
   
http://wiki.debian.org/Xen#A.27clocksource.2BAC8-0.3ATimewentbackwards.27

Using jiffies + independent wallclock + ntp in the domU seems to have
stopped the CPU soft lockup messages in the kernel log (at least I have not
seen any since I started using it, but that is only 2 days so far). Now I
am crossing my fingers... :-)
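
For anyone who wants to check or change the clocksource from a script, here
is a minimal Python sketch. It assumes the usual sysfs location
(/sys/devices/system/clocksource/clocksource0); the function names are my
own and purely illustrative. It does not touch the independent wallclock
setting, which as far as I know is only exposed on some Xen kernels, and
writing the clocksource needs root.

```python
from __future__ import annotations

from pathlib import Path

# Usual sysfs location of the clocksource controls (assumption: the guest
# kernel exposes it here, as mainline Linux does).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base: Path = CLOCKSOURCE_DIR) -> tuple[str, list[str]]:
    """Return (current clocksource, clocksources the kernel lists as available)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def set_clocksource(name: str, base: Path = CLOCKSOURCE_DIR) -> bool:
    """Request a clocksource and report whether the kernel accepted it.

    The kernel may silently keep the old clocksource if it rejects the
    request, so the value is read back instead of trusting the write.
    """
    (base / "current_clocksource").write_text(name + "\n")
    current, _ = read_clocksource(base)
    return current == name


# Typical use inside a domU (as root):
#     print(read_clocksource())
#     if not set_clocksource("jiffies"):
#         print("kernel kept its previous clocksource")
```

Reading the value back matters because a rejected request does not
necessarily produce an error on the write itself.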

I also read on xen-devel that you are using FC LUNs for storage. I use
them too, so you may want to have a look at the "Interrupt handling in Xen"
message that was posted on this list yesterday. By default my domain-0 was
handling all its interrupts (network and HBA) on the same CPU, which is
probably a bottleneck under heavy load.
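
To see whether one CPU is taking most of the interrupts, something like the
following Python sketch can sum the per-CPU columns of /proc/interrupts. The
sample text and its device names and counts are made up for illustration,
and the function names are my own.

```python
from __future__ import annotations

# Made-up sample in the layout of /proc/interrupts (illustrative values only).
SAMPLE = """\
           CPU0       CPU1
 16:       1200          0   Phys-irq  eth0
 17:       3400          5   Phys-irq  qla2xxx
NMI:          0          0   Non-maskable interrupts
ERR:          0
"""


def interrupts_per_cpu(text: str) -> dict[str, int]:
    """Sum the interrupt counts of every row for each CPU column."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = {cpu: 0 for cpu in cpus}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue  # not an interrupt row
        # Rows such as ERR: carry fewer columns than there are CPUs;
        # zip() stops at the shorter sequence.
        for cpu, count in zip(cpus, fields[1:1 + len(cpus)]):
            if count.isdigit():
                totals[cpu] += int(count)
    return totals


def busiest_cpu(totals: dict[str, int]) -> str:
    """Return the CPU column with the highest total."""
    return max(totals, key=totals.get)


# On a real dom0:
#     with open("/proc/interrupts") as f:
#         print(interrupts_per_cpu(f.read()))
```

If one CPU dominates, the affinity of individual IRQs can normally be
changed through /proc/irq/<n>/smp_affinity, though whether that helps here
is something I have not verified.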

Cheers,

-- 
Yann Cézard - Administrateur Systèmes Serveurs
Centre de Ressources Informatiques    -    http://cri.univ-pau.fr
Université de Pau et des Pays de l'Adour - http://www.univ-pau.fr
Bat IFR, rue Jules Ferry, 64000 PAU - Tél.:  +33 (0)5 59 40 77 94


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

