
Re: Problems with APIC on versions 4.9 and later (4.8 works)



Hi,

I do not have serial output on this setup, so I recorded a video of the boot with boot_delay=50 to capture all the kernel messages: https://youtu.be/y95h6vqoF7Y

This is running Xen 4.14 from Debian bullseye (testing).

I'm also attaching the dmesg output from booting Xen 4.8 with the same kernel version and the same parameters.

I visually compared all the messages, and the only difference I noticed was that under 4.14 the kernel used tsc as its clocksource, while under 4.8 it used xen. I tried booting the kernel with "clocksource=xen", and the problem still occurs.
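
If a text log of the failing boot can be captured at some point, the visual comparison could be automated. The following is only a minimal sketch, not something that was run on this system: it strips the dmesg timestamps and prints a unified diff of the remaining message text. The name "xen-4.14.log" is hypothetical (only xen-4.8.log is attached here), and the sample lines in the demo are made up for illustration.

```python
import difflib
import re

# Matches a leading dmesg timestamp such as "[    0.123456] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(lines):
    """Strip timestamps and trailing whitespace so only message text is compared."""
    return [TIMESTAMP.sub("", line.rstrip()) for line in lines]


def diff_logs(old_lines, new_lines,
              old_name="xen-4.8.log", new_name="xen-4.14.log"):
    """Return unified-diff lines between two boot logs, ignoring timestamps.

    new_name is a hypothetical file name; no such log exists in this thread.
    """
    return list(difflib.unified_diff(
        normalize(old_lines), normalize(new_lines),
        fromfile=old_name, tofile=new_name,
        lineterm="", n=0))


if __name__ == "__main__":
    # Made-up sample lines, only to show the shape of the output.
    old = [
        "[    0.100000] clocksource: Switched to clocksource xen",
        "[    1.200000] Freeing unused kernel image (initmem) memory: 2380K",
    ]
    new = [
        "[    0.150000] clocksource: Switched to clocksource tsc",
        "[    1.900000] Freeing unused kernel image (initmem) memory: 2380K",
    ]
    for line in diff_logs(old, new):
        print(line)
```

With real files, the two lists would come from something like open("xen-4.8.log").read().splitlines() for each log. Lines that differ only in their timestamp are treated as equal, so only genuine message differences (such as the clocksource line) show up.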

The problem starts when the kernel reaches the "Freeing unused kernel image (initmem) memory: 2380K" message: it hangs there for a while. After a few minutes it reports that a process (swapper) has been blocked for some time (image attached).

As for finding out what changed in the 4.8 -> 4.9 window, I may be able to build Xen from git to check. I will look for the build instructions and try this.

Best regards,
Claudemir





On Tue, 19 Jan 2021 at 06:07, Jan Beulich <jbeulich@xxxxxxxx> wrote:
On 18.01.2021 21:15, Claudemir Todo Bom wrote:
> Sorry for the simultaneous post on xen-users and xen-devel, but as I noted
> that the problem appears only for versions of xen that are >= 4.9, I think
> that developers may have a look at this.

Dropping xen-users.

> I recently bought a generic mainboard and a Xeon E5-2926v2 CPU; it is a
> 12-core, 24-thread CPU.
>
> My system was already running on another machine with xen 4.11 on a debian
> 10 system and after replacing the mainboard it didn't boot.
>
> After many tries I noticed that downgrading to the previous version of Xen
> (4.8, available on Debian 9) works well. I also tried a lot of variations
> for the dom0 kernel, all of them with the same results.
>
> All my tests were done with 4.11, but I checked with a live version of
> Alpine Linux (3.7.3, with Xen 4.9.4) that the system doesn't boot on that
> release also.
>
> With more research I noticed that if I limit dom0 to a single CPU
> (dom0_max_vcpus=1) the system boots. I didn't check whether guest VMs
> worked, but I suppose they would not be able to use the other 23 vcpus
> available. In any case, a system with one vcpu for dom0 would be very
> slow, I think.
>
> I've also noticed that if I let dom0 use more than one core but disable
> ACPI in the dom0 kernel, it boots. Unfortunately this is not sufficient,
> because I cannot use any device attached to the system (not even the USB
> keyboard). It only helps to show that the problem may be in the
> ACPI/APIC code.
>
> I tried many variations of the parameters related to ACPI and APIC. None
> of them was satisfactory; they always ended in a halted system or a
> self-rebooting one.
>
> Can anyone point me to a solution for this?

For this we first of all need details about your problem: A full
boot log ideally, or if this isn't available anything at least
allowing us to see what exactly goes wrong (and with this I mean
the first thing going wrong, not later possible follow-on issues
from earlier problems), like screen shots. And this again ideally
with master / staging Xen, or if that's not feasible with as new
a version as possible.

I don't suppose you'd be up for bisecting the 4.8 ... 4.9 window
to determine when exactly your issue was introduced?

Jan

Attachment: xen-4.8.log
Description: Text Data

Attachment: IMG_20210119_154753.jpg
Description: JPEG image


 

