Re: [Xen-users] Dom0 crashed when rebooting whilst DomU are running
On Sep 10, 2012, at 5:10 PM, Ian Campbell wrote:

> On Mon, 2012-09-10 at 16:00 +0100, Maik Brauer wrote:
>> On Sep 10, 2012, at 10:39 AM, Ian Campbell wrote:
>>>
>>> On Sat, 2012-09-08 at 15:50 +0100, Maik Brauer wrote:
>>>> On Sep 4, 2012, at 10:11 AM, Ian Campbell wrote:
>>>>>
>>>>> Could you not top-post please, it makes it rather hard to follow the
>>>>> flow of the conversation.
>>>>>
>>>>> On Mon, 2012-09-03 at 18:10 +0100, Casey DeLorme wrote:
>>>>>> As stated, you can alias shutdown to do exactly what you need; it
>>>>>> can be anything from a simple series of hard-coded operations to a
>>>>>> complex custom shell script that parses your domains and closes
>>>>>> each with feedback.
>>>>>
>>>>> Xen ships the "xendomains" initscript, which can halt guests on
>>>>> shutdown as well as automatically start specific guests on boot. It
>>>>> can also be configured to suspend/resume them or (I think) migrate
>>>>> them away.
>>>>>
>>>>> For diagnosing the crash itself, more details will be required than
>>>>> were provided in the original post. Please see
>>>>> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for some guidance.
>>>>> At a minimum we would need a capture (serial console or photo) of the
>>>>> crash backtrace.
>>>>>
>>>>> Ian.
>>>>
>>>> I found out that it hangs during reboot of dom0 when more network
>>>> interfaces are involved, like:
>>>>
>>>> vif = [ 'mac=06:46:AB:CC:11:01, ip=<myIPadress>', '', '',
>>>>         'mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge', '',
>>>>         'mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge' ]
>>>
>>> 6 interfaces total, 3 of which have a random MAC on each reboot and all
>>> get put on the default bridge?
>>
>> No, not really. The bridge is different for each interface.
>
> You have three lots of '' which will all go onto the same bridge AFAICT
> (whichever one is determined to be the default).

That is right.
As long as I put nothing inside an entry to say that a different script should be executed, it will use the default for ''.

>>> If it is a hang then you might have some luck using the magic SysRq
>>> keys to print lists of blocked tasks. I'm not sure about Squeeze, but
>>> you might need to enable this as described in Documentation/sysrq.txt
>>> in the Linux source.
>>>
>>> Blocked tasks are listed with SysRq-'w'. If you have a serial console
>>> then 't' will list all tasks, but that list can be quite long, so it is
>>> useless without a serial console.
>>
>> The list is empty. SysRq-w and SysRq-t show nothing at all.
>
> You might need to increase the log verbosity with SysRq-9 first?

I did, and now I get more information. But because of the amount of data that scrolls past on the console screen, I am not able to record it properly. Can you advise what to do here?

>> There is nothing running anymore.
>> It shows periodically: INFO: task xenwatch:12 blocked for more than 120
>> seconds
>
> What is the very last thing printed before this?

There is nothing before it; just that message popping up periodically.

>> It seems that xenwatch is blocking the reboot here; is that assumption
>> correct? Strangely enough, I can't see any process anymore with SysRq-t
>> or SysRq-w.
>
> The xenwatch thread ought to count as a process for at least the
> purposes of SysRq-t if not -w.

Could be, but the output scrolls past so quickly that I am not able to read it line by line. Please advise on a procedure to record it.
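[For the record, one way to capture SysRq output that scrolls off the screen is to read it back from the kernel ring buffer instead of the console. A hedged sketch of commands that may help here, to be run as root on the affected dom0; the procfs paths are the standard Linux ones, and the ring buffer may still be too small to hold all of SysRq-t, in which case a serial console remains the reliable route:

```shell
# Enable the magic SysRq functions (see Documentation/sysrq.txt).
echo 1 > /proc/sys/kernel/sysrq

# Equivalent of SysRq-9 and SysRq-w from a shell, without the keyboard:
echo 9 > /proc/sysrq-trigger   # raise the console log level to 9
echo w > /proc/sysrq-trigger   # dump blocked (uninterruptible) tasks

# The same text also lands in the kernel ring buffer, so it can be
# saved to a file rather than read off the scrolling console:
dmesg > /root/sysrq-w.txt
```

This only works while the system still accepts shell input; once the hang is under way, a serial console capture is the fallback.]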
>>>> In the logfile /var/log/messages you can find this as the last lines:
>>>>
>>>> Sep 8 15:44:28 rootsrv01 shutdown[2445]: shutting down for system reboot
>>>> Sep 8 15:44:31 rootsrv01 kernel: [ 73.716246] VLAN20: port 1(vif2.3) entering forwarding state
>>>> Sep 8 15:44:31 rootsrv01 kernel: [ 74.500111] VLAN40: port 1(vif2.5) entering forwarding state
>>>> Sep 8 15:44:34 rootsrv01 kernel: [ 77.317431] VLAN20: port 1(vif2.3) entering disabled state
>>>> Sep 8 15:44:34 rootsrv01 kernel: [ 77.317490] VLAN20: port 1(vif2.3) entering disabled state
>>>> Sep 8 15:44:36 rootsrv01 kernel: [ 79.368685] VLAN40: port 1(vif2.5) entering disabled state
>>>> Sep 8 15:44:36 rootsrv01 kernel: [ 79.369156] VLAN40: port 1(vif2.5) entering disabled state
>>>> Sep 8 15:44:37 rootsrv01 kernel: Kernel logging (proc) stopped.
>>>> Sep 8 15:44:37 rootsrv01 rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="890" x-info="http://www.rsyslog.com"] exiting on signal 15.
>>>>
>>>> In /var/log/daemon.log you can find this message:
>>>>
>>>> Sep 8 15:44:37 rootsrv01 acpid: exiting
>>>> Sep 8 15:44:37 rootsrv01 rpc.statd[750]: Caught signal 15, un-registering and exiting
>>>
>>> All the above (both messages and daemon.log) look like normal parts of
>>> shutting down to me.
>>>
>>>> Sep 8 15:44:37 rootsrv01 udevd-work[2276]:
>>>> '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with
>>>> status 0x000f
>>>
>>> This might be worth following up on.
>>
>> When putting a "sleep 5" in the stop section of /etc/init.d/xendomains:
>>
>> case "$1" in
>>   start)
>>     start
>>     rc_status
>>     if test -f $LOCKFILE; then rc_status -v; fi
>>     ;;
>>
>>   stop)
>>     stop
>>     rc_status -v
>>     sleep 5
>>     ;;
>>
>> then the system shuts down as expected and reboots properly.
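[Rather than a fixed "sleep 5", the stop path could poll until the guests are actually gone. A minimal sketch, assuming the xl toolstack ("xm list" would be the equivalent on older toolstacks); the helper name and timeout are hypothetical, not part of the shipped script:

```shell
#!/bin/sh
# Hypothetical helper: wait until only Domain-0 is left, or give up
# after a timeout. "xl list" prints one header line plus one line per
# domain, so 2 lines or fewer means no DomU is running any more.
wait_for_domus() {
    timeout=${1:-60}                     # seconds before giving up
    while [ "$timeout" -gt 0 ]; do
        count=$(xl list 2>/dev/null | wc -l)
        if [ "${count:-0}" -le 2 ]; then
            return 0                     # only Domain-0 remains
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1                             # a guest never went away
}
```

Called from the stop) branch (e.g. "stop; wait_for_domus 60; rc_status -v"), this would return as soon as the last DomU disappears instead of always pausing, and would flag the case where a guest never shuts down.]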
>> In the daemon.log file I couldn't find the error
>>
>> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup
>> offline type_if=vif' unexpected exit with status 0x000f
>>
>> anymore. It seems to have disappeared after putting the delay in. Could
>> it be a race condition here during shutdown, with the udev daemon?
>
> It could be a race between the guests actually shutting down and the
> rest of the initscripts running.
>
> Really the initscript ought to wait; the default, at least with the
> script shipped with Xen, is to do so by using "shutdown --wait". Can you
> confirm whether or not this is happening for you?

At least I can see that "shutdown --wait" is in the scripts, so it seems that the init script is waiting. But independent of that, something must still be in use which blocks the reboot process.

> Possibly someone is trying to talk to xenstore after xenstored has
> exited -- I expect that would cause the sorts of "blocked for 120
> seconds" messages you are seeing.

Could be, but we need to find out what is blocking the shutdown. I do not know what else I can do in order to measure and collect data for the investigation; let me know what else I can do. You can easily reproduce this issue when using more than 3 network devices. I have now installed this on several machines at home, and all of them show the same issue when using more than 2-3 network interfaces.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users