Re: [Xen-devel] Host freezing after "fixing" recursive fault starting in multicalls.c
> Right, but the bad news is that there are no helpful hypervisor
> messages at all. Sadly this is partly my fault, because I should
> have asked you to do this log collection with a debug hypervisor.
> Most of the possibly interesting messages would appear only there.
> In any event, problems start quite a bit earlier, and typically
> it's the first instance of a problem that is the most helpful to
> analyze, as later ones may be cascade issues. The first sign of
> problems is an overlapping

To be honest, I was already wondering why there were so few logs. I had already found the CMDLINE_XEN options for debug logging, but so far I haven't found any documentation on how to build a debug hypervisor. It also took me some time to work around the fact that I don't have physical access to the server to attach an actual serial cable, and so on.

I will try to compile Xen with debug enabled and collect more logs afterwards. Anything to be aware of?

From: Jan Beulich <jbeulich@xxxxxxxx>
Sent: Wednesday, 29 January 2020 09:59
To: Kurfer, Peter
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: Host freezing after "fixing" recursive fault starting in multicalls.c

On 29.01.2020 09:29, Peter.Kurfer@xxxxxxxx wrote:
> As requested I configured one host with:
>
>> loglvl=all guest_loglvl=all
>
> and collected one day of logs via serial interface:
>
> https://drive.google.com/drive/folders/1sQvyNH0Sz28tUeVRZl9mowhB0Htd8ZpO?usp=sharing
>
> searching for "error" or "multicalls.c" leads to some stacktraces that might
> be interesting.

Right, but the bad news is that there are no helpful hypervisor messages at all. Sadly this is partly my fault, because I should have asked you to do this log collection with a debug hypervisor. Most of the possibly interesting messages would appear only there.

In any event, problems start quite a bit earlier, and typically it's the first instance of a problem that is the most helpful to analyze, as later ones may be cascade issues. The first sign of problems is an overlapping

[14991.827762] BUG: unable to handle page fault for address: ffff888ae2eb6bd8

and

[14991.828172] WARNING: CPU: 5 PID: 2585 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x194/0x1c0

on CPUs 8 and 5.

> As far as I know the ACPI errors in the context of IPMI can be ignored.

Looks like so, yes, at least for the purposes here. What I wouldn't rule out as a possible cause of problems is the significant number of temperature-related messages.

What I also find at least curious (but possibly just because I know too little of the respective aspects of modern kernels) are the recurring __text_poke() instances in the stack traces. Assuming these are to be expected in the first place, there might be a race here which is either Xen-specific or simply has a much better chance of hitting (larger window?) when running on Xen. But I'm afraid this will need looking into (or at least commenting on) by a kernel person.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
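
Since the advice above is to look at the first instance of a problem and not at later cascade failures, a small helper can pull the earliest BUG/WARNING lines out of a long serial log. The following is only a rough sketch: the function name `first_problems`, the marker list and the 10 ms "overlap" window are assumptions made for illustration, not anything taken from the thread or from Xen/Linux tooling.

```python
import re

# Matches a Linux kernel log line such as
# "[14991.827762] BUG: unable to handle page fault for address: ..."
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")

# Strings treated as "a problem". This list is an assumption; extend as needed.
MARKERS = ("BUG:", "WARNING:", "Oops", "general protection fault")


def first_problems(lines, window=0.01):
    """Return the earliest problem line and any others within `window` seconds of it.

    Each result is a (timestamp, message) tuple, sorted by timestamp.
    """
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and any(mark in m.group(2) for mark in MARKERS):
            hits.append((float(m.group(1)), m.group(2).strip()))
    if not hits:
        return []
    hits.sort(key=lambda h: h[0])
    start = hits[0][0]
    # Keep only events that overlap with the very first one.
    return [h for h in hits if h[0] - start <= window]


if __name__ == "__main__":
    sample = [
        # The two lines quoted in the message above.
        "[14991.827762] BUG: unable to handle page fault for address: ffff888ae2eb6bd8",
        "[14991.828172] WARNING: CPU: 5 PID: 2585 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x194/0x1c0",
        # Made-up later line, only here to show that cascade events are dropped.
        "[20000.000000] WARNING: hypothetical later event",
    ]
    for ts, msg in first_problems(sample):
        print(f"{ts:.6f} {msg}")
```

To run it against a real capture, pass the lines of the log file (for example `open("serial.log", errors="replace")`, where the file name is hypothetical) instead of `sample`. Lines without a `[seconds.micros]` timestamp, which likely includes hypervisor messages, are skipped by this sketch.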