
Re: [Xen-devel] Re: [Xen-users] old issue after 1024 live migrations seems to still exist.


  • To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
  • From: Florian Heigl <florian.heigl@xxxxxxxxx>
  • Date: Fri, 23 Jul 2010 13:32:33 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 23 Jul 2010 04:33:32 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi Ian & list,

Sure, I'll provide the specifics of my config.

2010/7/22 Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
> (dropping xen-users to avoid cross-posting)

> Do you have a reference to this old issue?

I googled for the old mailing list post, but had no luck finding the
thread on the Xen lists.
First of all, I'm glad if it's a different bug that doesn't affect
most people :)

> To be honest I think it is unlikely that you are seeing the actual same
> issue as a bug that old, even if your symptoms are very similar.
>
> Can you give details of your precise system configuration for both host
> and guest, hypervisor changeset (I don't know what Oracle VM 2.0 has in
> it), kernel changeset for both dom0 and domU etc.

dom0 (identical on both hosts):
xen_major              : 3
xen_minor              : 4
xen_extra              : .0
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xff400000
xen_changeset          : unavailable


[root@waxh0004 ~]# uname -a
Linux waxh0004 2.6.18-128.2.1.4.9.el5xen #1 SMP Fri Oct 9 14:57:31 EDT 2009 i686 i686 i386 GNU/Linux

domU:
debian:~# uname -a
Linux debian 2.6.26-2-xen-686 #1 SMP Wed Nov 4 23:23:33 UTC 2009 i686 GNU/Linux

(Debian lenny from stacklet.com; the kernel date was Nov 9, 2009.)

> I am currently doing some live migration testing with guests under load
> (forkbomb) and am regularly doing 4-5000 successful migrations before I
> hit a very subtle deadlock in a PVops domU kernel. I have most likely in
> the past 4-5 years personally done tens of thousands of iterations of
> live migration in various scenarios and we know other people are
> regularly doing automated and manual test of these things so the problem
> you are seeing is almost certainly not a generic failure but must be
> specific to the version of one or more components in your system.

Good!

> Are you seeing failure after precisely 1024 migrations in every case or
> is that just a rough figure? It might be worth

No, it was more like "just above 1000"; I also had a counter
problem in the script.
Note that a few times before that, a migration ended with the domU
down, so the leak you mention below might just be the thing.
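Since part of the uncertainty comes from the counter in my test script, here is a minimal sketch of a ping-pong loop that only counts a migration once the guest is confirmed on the target host. It is not the script used for these tests. It assumes two hypothetical hosts `host-a`/`host-b`, passwordless ssh as root from a management machine, a guest named `debian`, and that `xm list <name>` exits non-zero when the domain is not present on that host.

```python
import subprocess
from typing import Callable, Tuple

# Hypothetical host and guest names (assumptions): replace with your own.
HOSTS: Tuple[str, str] = ("host-a", "host-b")
DOMAIN = "debian"


def xm(host: str, *args: str) -> bool:
    """Run an xm command on `host` over ssh; True if it exited with status 0."""
    result = subprocess.run(["ssh", host, "xm", *args], capture_output=True)
    return result.returncode == 0


def migrate(src: str, dst: str) -> bool:
    """Live-migrate DOMAIN from src to dst, then check it is listed on dst.

    Assumes `xm list <name>` returns a non-zero status for a missing domain.
    """
    return xm(src, "migrate", "--live", DOMAIN, dst) and xm(dst, "list", DOMAIN)


def ping_pong(iterations: int, step: Callable[[str, str], bool],
              hosts: Tuple[str, str] = HOSTS) -> int:
    """Migrate back and forth; return the number of successful migrations.

    The counter is only advanced after `step` reports success, so the
    returned value is exact even when the loop stops early on a failure.
    """
    done = 0
    for i in range(iterations):
        src, dst = hosts[i % 2], hosts[(i + 1) % 2]
        if not step(src, dst):
            break
        done += 1
    return done
```

Usage: `ping_pong(2000, migrate)` returns how many migrations completed before the first failure (a failed `xm migrate` or a guest that is missing afterwards). Passing the step function in keeps the counting logic testable without real hosts.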

> using /usr/lib/xen/bin/lsevtchn to check what is happening to both the
> dom0 and domU event channels after each migration iteration. Once upon a

Okay, I will log that.

> time I was seeing an evtchn leak in domU (now fixed) but that wouldn't
> fail after precisely 1024 iterations since there is always a number of
> non-leaking event channels also in use.
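A sketch of the per-iteration check suggested above. It assumes `/usr/lib/xen/bin/lsevtchn` accepts a domain ID argument and prints one line per open channel beginning with `<port>:`; that output format is an assumption, so check it against your build. The domU gets a new domid after every migration, so it has to be looked up again each time (for example with `xm domid <name>`). The helper names and the 10-iteration window are hypothetical.

```python
import re
import subprocess
from typing import Sequence

# Assumption: each open event channel is printed on its own line starting
# with the port number and a colon, e.g. "   5: VCPU 0: ...".
PORT_LINE = re.compile(r"\s*\d+:")


def count_channels(lsevtchn_output: str) -> int:
    """Count open event channels in the text printed by lsevtchn."""
    return sum(1 for line in lsevtchn_output.splitlines() if PORT_LINE.match(line))


def read_channels(domid: int) -> int:
    """Run lsevtchn on this dom0 for one domain (dom0 itself is 0) and count."""
    out = subprocess.run(["/usr/lib/xen/bin/lsevtchn", str(domid)],
                         capture_output=True, text=True, check=True).stdout
    return count_channels(out)


def looks_like_leak(samples: Sequence[int], window: int = 10) -> bool:
    """True if the count rose on every one of the last `window` iterations."""
    recent = list(samples)[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough history yet
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

Usage: after each migration, append `read_channels(0)` and `read_channels(domid)` to two lists and stop the test loop as soon as `looks_like_leak` returns True for either one. A steadily growing count fits a leak; a stable one points elsewhere.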
>
> Are you able to test with an up to date xen-3.4-testing or even better
> the xen-4.0-testing tree?

Retesting with Xen 4 would be a bit tricky. Oracle provides an SDK domU
that has all the dom0 sources, but it would still take a day of work,
I'm afraid.

I'd hope some other people can do the testing on other versions; that's
what I asked for, and why I didn't send this to xen-devel in the first place.

I fixed LAN management access (serial console/reboot/reset...) to one
of the hosts, so on that one I could try re-testing with
3.4-testing.

If the issue doesn't show up in your tests, then I agree it's probably
specific to this version. In that case I can just inform Oracle
and they can look into it on their own.

>> > is it just the gratuitous ARP?
>
> The Grat. ARP doesn't get sent by current PVops kernels (I don't know if
> you are using this since you haven't provided any details about your
> system configuration). A fix is pending in the network subsystem

I know I didn't, because I was only asking for someone else to run the
script and retest ;p

> maintainers tree which I hope will be backported to 2.6.32.x when it
> goes into mainline during the next merge window.
> See 06c4648d46d1b757d6b9591a86810be79818b60c and
> 592970675c9522bde588b945388c7995c8b51328 in net-next-2.6.git. You will
> also need to configure sysctl to enable the arp_notify option for the
> devices; setting net.ipv4.conf.all.arp_notify = 1 is likely sufficient.
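The sysctl setting mentioned above maps onto a file under /proc/sys (dots become path separators), so it can be applied as in this small sketch, which is equivalent to `sysctl -w net.ipv4.conf.all.arp_notify=1`. The helper names are hypothetical; writing the file needs root, only works on kernels that provide the arp_notify option, and does not survive a reboot.

```python
from pathlib import Path


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file, e.g. net.ipv4... -> /proc/sys/net/ipv4/..."""
    return Path(root, *key.split("."))


def enable(key: str, root: str = "/proc/sys") -> None:
    """Set a boolean sysctl to 1 by writing its /proc/sys file (needs root)."""
    sysctl_path(key, root).write_text("1\n")
```

Usage: call `enable("net.ipv4.conf.all.arp_notify")` as root inside the domU. To make the setting persistent, add the line `net.ipv4.conf.all.arp_notify = 1` to /etc/sysctl.conf instead.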

It's a classic (non-pvops) domU kernel.

I'll try to get a newer dom0 kernel working, but I'll be on vacation
for a week now.
Considering that you successfully migrate a few thousand times, I'd
suggest you forget about the issue until then.


Greetings,
Flo


-- 
'Sie brauchen sich um Ihre Zukunft keine Gedanken zu machen'

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

