[Xen-devel] Bug: Problematic DomU Duplication on reboot
Hi,

OK, I did some more experiments and can now reproduce the duplication of a domain on its reboot. It seems to be a race condition somewhere, as I can trigger it by putting high load on xend. The really bad thing: all instances of the domain are then actively running on the same block devices, which almost certainly causes massive data corruption :-( And it can also happen in normal operation: I have seen it at least twice in a "normal" environment without much load on xend, possibly just a libvirt request arriving at the wrong time during a DomU reboot. If this is already known: sorry for the long mail then... Is there a fix for 3.0.4-testing? :)

If not, I more or less see two bugs here:

1) Why is the domain duplicated during the reboot?
2) Why is it possible at all for it to be started twice, using the same devices? Could a check be added to prevent duplicate read-write use of the same device, or is there already one that is failing in this case?

Reproduction:

I was able to reproduce this quite reliably using the sample program dump-info.pl from the perl-Sys-virt libvirt interface. As root I just run

  while true; do ./dump-info.pl; done

in the examples dir to stress the system/xend. Building the loop inside dump-info.pl and removing all "print"s makes it work a bit "better" and really mess things up, so try that if the plain loop doesn't work. I tested it on a P4 3 GHz and a dual-core A64 2.2 GHz; it is easier to trigger with nosmp on the Xen kernel on the A64, but it also works in the SMP case. While this is running I simply issue

  xm reboot DomU1

and most of the time it results in two or more DomU1s running afterwards... Sometimes it instead causes DomU1 to disappear, with a log entry saying it was rebooting too fast (of course I waited long enough before the reboot). If it "works", it looks like this afterwards:

  DomU1    97   256   1   -b----   12.5
  DomU1    98   256   1   -b----   12.9

DomU1 is just a normal paravirtualized Linux guest. Dom0 is a CentOS 4, in case that matters.

Observations:

During the reboot, sometimes multiple duplicates were created, the load on Dom0 went up to about 30, and I saw lots of xen-backend hotplug agents:

  10613 ?  S<  0:00  \_ /bin/sh /sbin/hotplug xen-backend
  10617 ?  S<  0:01  |   \_ /bin/sh /etc/hotplug/xen-backend.agent
  15018 ?  S<  0:00  \_ /bin/sh /sbin/hotplug xen-backend
  15248 ?  S<  0:01  |   \_ /bin/sh /etc/hotplug/xen-backend.agent
  14698 ?  S<  0:00  \_ /bin/sh /sbin/hotplug xen-backend
  14702 ?  S<  0:00  |   \_ /bin/sh /etc/hotplug/xen-backend.agent
  15091 ?  S<  0:00  \_ /bin/sh /sbin/hotplug xen-backend

(about 60 more lines like this - and I had just one DomU). After everything settled, the result was:

  VM100    38   256   1   -b----   13.3
  VM100    10   256   1   -b----   14.1

Note the large gap between IDs 10 and 38: the 27 domains in between were partially created and then died (the domain I rebooted had ID 9). Oh, and one more thing: when using "stress" to put load on the Dom0 system instead of the perl-Sys-virt tool, it usually causes the DomU to disappear on reboot, but I couldn't reproduce the duplication that way.

All of this was done with the released 3.0.4.1-1; I will try xen-unstable next, but possibly someone already has an idea what could be wrong here?

(:ul8er, r@y
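P.S.: In case it helps with reproducing this, the loop I built into dump-info.pl is roughly the following (only a sketch from memory, not the exact script; the Sys::Virt constructor arguments and the "xen:///" address may differ between perl-Sys-virt versions, so adjust to whatever the stock dump-info.pl in your examples dir uses):

  #!/usr/bin/perl
  # Stress loop derived from the perl-Sys-virt dump-info.pl example:
  # query all running domains as fast as possible, with no output,
  # purely to keep xend busy handling libvirt requests.
  use strict;
  use warnings;
  use Sys::Virt;

  # Connect to the local Xen hypervisor through libvirt
  # (older perl-Sys-virt takes 'address', newer versions 'uri').
  my $con = Sys::Virt->new(address => "xen:///", readonly => 0);

  while (1) {
      foreach my $dom ($con->list_domains()) {
          my $name = $dom->get_name();
          my $info = $dom->get_info();
          # all prints removed on purpose - the point is only the load on xend
      }
  }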