
Re: [Xen-users] Debugging sudden hangs



On Mon, 20 Aug 2018 at 17:03, Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>
> On Sun, Aug 19, 2018 at 08:46:44PM +0800, Liwei wrote:
> > Hi list,
> >     We recently updated our system and started experiencing random
> > hangs. It happens, on average, once every 1.5 days (sometimes taking 2
> > days to occur, other times happening multiple times a day, somewhat
> > proportional to IO load).
> >
> >     Before troubling the developers too much, I'd like to collect more
> > information, however, the problem is the hangs occur without any
> > symptoms/crashes/panics. I've booted xen and dom0 with:
> > "loglvl=all guest_loglvl=all" and "loglevel=10 debug initcall_debug"
> > respectively.
>
> You should add iommu=debug to the command line.
>
> >
> >     When the hang occurs, all domUs and dom0 just stop responding to
> > key presses, networking and there is no IO activity. Nothing gets
> > generated in the console/logs (no symptoms either, no logs out of the
> > ordinary). Even hitting ctrl+a multiple times in the console does
> > nothing (indicating xen is dead too). On the video console, we just
> > have a blinking cursor after the last console log (though my
> > understanding is that the cursor blink might be generated by the video
> > card rather than any indication that at least something is still
> > running). If the hardware WDT is on, the watchdog eventually bites and
> > reboots the system.
>
> It would be interesting to get the crash trace printed by the watchdog.
> And to use a debug build of the hypervisor, that might trigger some
> assertions inside of Xen that could lead to the cause of the issue.
>
> Roger.

Hi Roger, list,

I've been trying to find time to look into this, but other work has
been keeping me away ever since I found an ugly (and definitely
unsafe) workaround.

Downgrading all the way to 4.2.5 actually fixed the problem, or perhaps
it just stops exercising the offending hardware (if this is a hardware
issue). Newer versions of Xen might also work, but we've been okay
with this for now since the server is not world-facing.

However, 4.2.5 is obviously far behind on fixes for the headline (and
other) security issues of the past few years. I'll get around to
isolating the cause of the sudden hangs in December, probably using
your suggestions and a bisect run.
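
For anyone wanting to try the same thing, here is a rough sketch of the
bisect logic in Python. It is only an illustration: the release list and
the is_bad() check are hypothetical stand-ins (I have not yet determined
which release introduces the hang), and a real run would more likely use
git bisect on the Xen tree. Because the hang is intermittent, a version
should only be called good after it has survived a soak period well
beyond the usual 1.5 days.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    that every version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Hypothetical example: pretend the hang first appears in "4.8".
# In practice is_bad() means "boot this version and wait for a hang".
candidates = ["4.2.5", "4.4", "4.6", "4.8", "4.10"]
hanging = {"4.8", "4.10"}
print(first_bad(candidates, lambda v: v in hanging))
```

With five candidates this needs only two test boots, which matters when
each "good" verdict costs several days of waiting.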

Just sending this email to let the list know of a dangerous workaround
in case anyone has to use it.
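
In case it helps whoever picks up the debugging, this is a small sketch
of how the Xen command line from the quoted messages could be assembled
so that Roger's iommu=debug suggestion is added without duplicating
options already set. The add_options() helper is my own illustration,
not part of any Xen tooling, and how the resulting string reaches the
bootloader depends on your distribution.

```python
from typing import Iterable


def add_options(cmdline: str, extra: Iterable[str]) -> str:
    """Append each option in `extra` unless its key is already set."""
    options = cmdline.split()
    seen = {opt.split("=", 1)[0] for opt in options}
    for opt in extra:
        key = opt.split("=", 1)[0]
        if key not in seen:
            options.append(opt)
            seen.add(key)
    return " ".join(options)


# Options taken from this thread: the existing log levels plus iommu=debug.
xen_cmdline = add_options("loglvl=all guest_loglvl=all", ["iommu=debug"])
print(xen_cmdline)  # loglvl=all guest_loglvl=all iommu=debug
```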

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users

 

