[Xen-devel] slow live migration / xc_restore on xen4 pvops
Hi,

in preparation for our soon-to-arrive central storage array, I wanted to test live migration and Remus replication and stumbled upon a problem. When migrating a test VM (512 MB RAM, idle) between my 3 servers, two of them are extremely slow in "receiving" the VM. There is little to no CPU utilization from xc_restore until shortly before the migration is complete. The same goes for xm restore.

When receiving a VM via live migration finally finishes, xend.log contains:

[2010-06-01 21:16:27 5211] DEBUG (XendCheckpoint:286) restore:shadow=0x0, _static_max=0x20000000, _static_min=0x0,
[2010-06-01 21:16:27 5211] DEBUG (XendCheckpoint:305) [xc_restore]: /usr/lib/xen/bin/xc_restore 48 43 1 2 0 0 0 0
[2010-06-01 21:16:27 5211] INFO (XendCheckpoint:423) xc_domain_restore start: p2m_size = 20000
[2010-06-01 21:16:27 5211] INFO (XendCheckpoint:423) Reloading memory pages: 0%
[2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal error: Error when reading batch size
[2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal error: error when buffering batch, finishing

You can see the large gap in the timestamps (21:16:27 to 21:20:57, four and a half minutes). The VM is perfectly fine after that; it just takes way too long.

First off, let me explain my server setup; detailed information on my attempts to narrow down the error follows.

I have 3 servers running Xen 4 with 2.6.31.13-pvops as the dom0 kernel; it's the current kernel from Jeremy's xen/master git branch. The guests are running vanilla 2.6.32.11 kernels.

The 3 servers differ slightly in hardware: two are Dell PE 2950s and one is a Dell R710. The 2950s each have 2 quad-core Xeon CPUs (L5335 in one, L5410 in the other); the R710 has 2 quad-core Xeon E5520s. All machines have 24 GB of RAM.

They are called "tarballerina" (E5520), "xenturio1" (L5335) and "xenturio2" (L5410). Currently I use tarballerina for testing purposes, but I don't consider anything in my setup "stable".
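To put a number on that gap, here is a minimal Python sketch (not part of any Xen tooling) that scans lines shaped like the xend.log excerpt above and reports the largest jump between consecutive timestamps. The "[YYYY-MM-DD HH:MM:SS PID]" prefix is inferred from that excerpt only, and the function name largest_gap is hypothetical.

```python
import re
from datetime import datetime

# Matches the "[YYYY-MM-DD HH:MM:SS PID]" prefix seen in the xend.log excerpt.
# This format is an assumption based on that excerpt, not on xend documentation.
PREFIX = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+\]")


def largest_gap(lines):
    """Return (seconds, earlier_line, later_line) for the biggest jump between
    consecutive timestamped lines, or None if fewer than two lines match."""
    stamped = []
    for line in lines:
        m = PREFIX.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            stamped.append((ts, line))
    best = None
    # Compare each timestamped line with the next one and keep the widest gap.
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


if __name__ == "__main__":
    excerpt = [
        "[2010-06-01 21:16:27 5211] INFO (XendCheckpoint:423) Reloading memory pages: 0%",
        "[2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal error: "
        "Error when reading batch size",
    ]
    gap, before, after = largest_gap(excerpt)
    print(f"{gap:.0f} s gap")
```

On the two excerpt lines this reports the 270-second gap. Run over a whole log (assuming the default location /var/log/xen/xend.log), it would point at the stretch where xc_restore sits waiting for pages.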
xenturio1 has 27 guests running, xenturio2 25. No guest does anything that would even put a dent in the systems' performance (LDAP servers, RADIUS, department web servers, etc.).

I created a test VM on my current central iSCSI storage, called "hatest", that idles around and has 2 VCPUs and 512 MB of RAM.

First I tested xm save/restore:

tarballerina:~# time xm restore /var/saverestore-t.mem
real    0m13.227s
user    0m0.090s
sys     0m0.023s

xenturio1:~# time xm restore /var/saverestore-x1.mem
real    4m15.173s
user    0m0.138s
sys     0m0.029s

When migrating it to xenturio1 or 2, the migration takes 181 to 278 seconds; when migrating it to tarballerina, it takes roughly 30 seconds:

tarballerina:~# time xm migrate --live hatest 10.0.1.98
real    3m57.971s
user    0m0.086s
sys     0m0.029s

xenturio1:~# time xm migrate --live hatest 10.0.1.100
real    0m43.588s
user    0m0.123s
sys     0m0.034s

--- attempt of narrowing it down ----

My first guess was that, since tarballerina had almost no guests running that did anything, it could be an issue of memory usage by the tapdisk2 processes (each dom0 has been mem-set to 4096M). I then started almost all VMs that I have on tarballerina:

tarballerina:~# time xm save saverestore-t /var/saverestore-t.mem
real    0m2.884s

tarballerina:~# time xm restore /var/saverestore-t.mem
real    0m15.594s

I tried this several times; sometimes it took 30+ seconds.

Then I started 2 VMs that run load- and I/O-generating processes (stress, dd, openssl encryption, md5sum). But this didn't affect xm restore performance; it was still quite fast:

tarballerina:~# time xm save saverestore-t /var/saverestore-t.mem
real    0m7.476s
user    0m0.101s
sys     0m0.022s

tarballerina:~# time xm restore /var/saverestore-t.mem
real    0m45.544s
user    0m0.094s
sys     0m0.022s

I tried several times again; restore took 17 to 45 seconds.

Then I tried migrating the test VM to tarballerina again. It was still fast, in spite of several VMs running, including the load- and I/O-generating ones; together they ate almost all available RAM.
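To compare these runs side by side, a small Python helper can turn the "real" figures into seconds. This is a sketch assuming bash's default `time` output ("real" followed by a value like 4m15.173s); the name real_seconds is hypothetical.

```python
import re

# Matches the "real XmY.YYYs" field printed by bash's `time` keyword
# with its default TIMEFORMAT (assumed to be what produced the figures above).
REAL = re.compile(r"real\s+(\d+)m(\d+(?:\.\d+)?)s")


def real_seconds(text):
    """Convert the first 'real XmY.YYYs' field in text to seconds."""
    m = REAL.search(text)
    if m is None:
        raise ValueError("no 'real' field found")
    minutes, seconds = m.groups()
    return int(minutes) * 60 + float(seconds)


if __name__ == "__main__":
    fast = real_seconds("real    0m13.227s")  # xm restore on tarballerina
    slow = real_seconds("real    4m15.173s")  # xm restore on xenturio1
    print(f"{slow:.3f} s vs {fast:.3f} s, {slow / fast:.1f}x slower")
```

For the two xm restore runs this gives 255.173 s against 13.227 s, so the restore on xenturio1 is roughly 19 times slower than on tarballerina.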
CPU times for xc_restore according to the target machine's "top":

tarballerina -> xenturio1: 0:05:xx, CPU 2-4%, near the end 40%
xenturio1 -> tarballerina: 0:04:xx, CPU 4-8%, near the end 54%

tarballerina:~# time xm migrate --live hatest 10.0.1.98
real    3m29.779s
user    0m0.102s
sys     0m0.017s

xenturio1:~# time xm migrate --live hatest 10.0.1.100
real    0m28.386s
user    0m0.154s
sys     0m0.032s

So my attempt at narrowing the problem down failed: it's neither the free memory of the dom0 nor the load, the I/O, or the memory the other domUs use.

---end attempt---

More info (xm list, meminfo, a table with migration times, etc.) on my setup can be found here: http://andiolsi.rz.uni-lueneburg.de/node/37

There was another guy who has the same error in his log file; it might or might not be related: http://lists.xensource.com/archives/html/xen-users/2010-05/msg00318.html

Further information can be given, should the demand for it arise.

With best regards
---
Andreas Olsowski <andreas.olsowski@xxxxxxxxxxxxxxx>
Leuphana Universität Lüneburg
System- und Netzwerktechnik
Rechenzentrum, Geb 7, Raum 15
Scharnhorststr. 1
21335 Lüneburg

Tel: ++49 4131 / 6771309

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel