Xen project Mailing List

Re: [Xen-devel] xenbus and the message of doom

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

From: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>

Date: Fri, 16 Dec 2011 10:18:33 +0100

Cc: Olaf Hering <olaf@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

Delivery-date: Fri, 16 Dec 2011 09:19:17 +0000

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 15.12.2011 21:53, Ian Campbell wrote:
> On Thu, 2011-12-15 at 19:20 +0000, Stefan Bader wrote:
>> I was investigating a bug report[1] about newer kernels (>3.1) not booting as
>> HVM guests on Amazon EC2. For some reason git bisect gave me some pain, but
>> it led me at least close, and with some crash dump data I think I figured
>> out the problem.
>>
>> commit ddacf5ef684a655abe2bb50c4b2a5b72ae0d5e05
>> Author: Olaf Hering <olaf@xxxxxxxxx>
>> Date: Thu Sep 22 16:14:49 2011 +0200
>>
>>     xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old
>>     kernel
>>
>> This change introduced a xs_reset_watches() call. The problem seems to be
>> that there is at least some version of Xen (I was able to reproduce with a
>> 3.4.3 version which I admit to deliberately not having updated) for which
>> xenstore will not return any reply.
>>
>> At least the backtraces in crash showed that xs_init had been calling
>> xs_reset_watches() and that was happily idling in read_reply(). Effectively
>> nothing was going on and the boot just hung.
>> By just not doing that xs_reset_watches() call, I was able to boot under the
>> same host. And for what it is worth there has not been an issue with Xen
>> 4.1.1 and a 3.0 dom0 kernel. Just this "older" release is trouble.
>
> I sent a patch to fix exactly this issue in oxenstored (the ocaml
> xenstore) just this week. Is there any chance that you are running C
> xenstored with Xen 4.1.1 and oxenstored with Xen 3.4.3?

Thanks for the pointer, I missed that thread. Now a dumb question: would
oxenstored be named that way? Or, in other words, how do I quickly find out
what is running? The binary running in 3.4.3 is xenstored, which is a linked
executable (same in 4.1.1). But I guess, whatever version is running, any
oxenstored would not have the bugfix, because things take longer to reach any
packaged versions. I would rather suspect that in 4.1.1 the reset watches
message is probably just known, which avoids the problem.
Unfortunately it is nearly impossible to tell for sure what exactly EC2 is
running. The major point here is probably this: when upstream kernels send
that message and there are versions of xenstored in production that just
ignore it while the kernel blocks waiting, this is a painful path. Production
systems tend to update slowly and the symptoms are not that obvious.

Having a timeout could be useful, and not only for this case, but clearly it
is nothing that should be rushed. So reverting the patch that introduced the
call (at least in the distro kernel) may be the best thing to do, knowing that
this comes at the cost of losing the fix for kexec boots of crash kernels.

-Stefan

>> Now the big question is, should this never happen and the host needs urgent
>> updating. Or, should xs_talkv() set up a time limit and assume failure when
>> not receiving a message after that? I could imagine the latter might lead at
>> least to a more helpful "there is something wrong here, dude" than just
>> hanging around without any response. ;)
>>
>> -Stefan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
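
For what it is worth, the time-limit idea for xs_talkv() can be illustrated in
userspace. The following is only a sketch under assumptions:
read_reply_timeout() is a hypothetical helper, not the kernel's read_reply(),
and it uses poll() on a plain file descriptor as a stand-in for the xenbus
reply queue. It only shows the caller getting -ETIMEDOUT back instead of
blocking forever when the other side never answers.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the kernel's read_reply()): wait at most
 * timeout_ms milliseconds for a reply on fd, then read up to len bytes
 * into buf.
 *
 * Returns the number of bytes read, -ETIMEDOUT if no reply arrived in
 * time, or another negative errno value on failure.
 */
static ssize_t read_reply_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t n;
    int ready;

    /* A negative timeout would make poll() wait forever, which is
     * exactly the hang this sketch is meant to avoid. */
    assert(buf != NULL && timeout_ms >= 0);

    /* Simplification: an interrupted wait restarts with the full timeout. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -errno;
    if (ready == 0)
        return -ETIMEDOUT;  /* peer never answered: report it, do not hang */

    n = read(fd, buf, len);
    return n < 0 ? -errno : n;
}
```

A caller would treat -ETIMEDOUT as "xenstored did not understand or ignored
the request" and carry on booting with a warning. Inside the kernel this would
presumably be a timed wait on the reply list rather than poll(), and what a
safe limit looks like is exactly the part that should not be rushed.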
