Re: [Xen-users] Dom0 crashes without logging lately on Debian Stretch with Xen 4.8
- To: xen-users@xxxxxxxxxxxxxxxxxxxx
- From: Michael <delajamal@xxxxxx>
- Date: Tue, 6 Nov 2018 10:08:02 +0100
- Delivery-date: Tue, 06 Nov 2018 09:09:04 +0000
- List-id: Xen user discussion <xen-users.lists.xenproject.org>
- Openpgp: preference=signencrypt
Hello,
I had the same issues.
In my case I tried Ubuntu 18.04 with Xen 4.9, and kernel version 4.15.9
was the only one that would start the DomU.
Tested on AMD Ryzen 1800X and Intel 8700.
I got random system freezes, with uptimes between 7 and 30 days.
Older and newer kernels won't run.
This problem is still present; I am going to switch all services to
Docker...
Regards,
Michael
On 06.11.2018 at 09:37, Roalt Zijlstra | webpower wrote:
Hi John,
Yes, we are using PV only and we only run Debian Linux on the
servers. We still have some DomU Jessie servers running with the
stock kernel. We did update our Dells to the latest firmware, so that
does include more recent Intel microcode. But on Debian we have not
yet enabled the intel-firmware, since we had so much instability and
so many parameters that could be the culprit that we did not want to
add another.
If your server is very busy, I think the chance of a crash is
higher. We have seen crashes on our active MySQL databases, whereas
the slave MySQL database server did not crash as quickly. However,
after using the slave MySQL database as the primary database for a
while (because we were debugging the crashed master database), the
slave could very well crash too.
We have done tests with downgrading the Dell firmware (which also
means using older Intel microcode), but that did not help. So having
the latest firmware is okay.
We are now testing a few scenarios:
- one server with an older kernel (4.9.0-4-amd64), with DomU 3.16
kernel, which has been running for 16 days now
- one server with the updated kernel (4.9.0-8-amd64), with DomU 3.16
kernel, which surprisingly has been running for 28 days now
- one server with the updated kernel (4.9.0-8-amd64), and all DomUs
on the backported 4.9 kernel.
None of this really makes much sense. We do expect that the older
kernel will keep on running and that the 4.9 DomUs will help to keep
the servers alive.
We have tested with 4.14 and 4.16 kernels (from backports), but that
did not make a difference in stability.
It could be as you mention... are your domUs PV? I am using
paravirtualization exclusively, and this specific server has the
following CPU:
Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
Do you have the intel-microcode Debian package from the non-free
repo installed on your servers? I currently don't...
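A quick way to compare machines on this point is to look at the microcode revision each CPU reports. The following is only a minimal sketch, assuming the usual x86 Linux `/proc/cpuinfo` layout with a `microcode` field per processor; the function name and the sample revision `0x1f` are made up for illustration, and whether a PV Dom0 shows the real revision there is not something this thread confirms.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revision strings found in cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Each cpuinfo line looks like "key : value"; split on the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


# Illustrative sample only; the revision value is hypothetical.
SAMPLE = """processor  : 0
model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
microcode  : 0x1f
processor  : 1
model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
microcode  : 0x1f
"""

print(microcode_revisions(SAMPLE))
```

On a real host the text would come from reading `/proc/cpuinfo`; more than one revision in the result would mean the cores disagree.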
J.
Hi John,
It could very well be that it is also restricted to some CPUs, but I
am inclined to believe that the DomU kernels in use can influence
stability. We did have a pretty busy SSL offloader running on a 3.16
kernel, which might have caused the crashes.
Just for reference, we have the following two CPUs causing us
trouble, but I am not sure if it matters.
Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
Roalt
Hi,
Thanks for your feedback. I was wondering because I have just
upgraded a Debian 9 server to the latest kernel with the latest Xen
packages from the official Debian repo. The only difference is that
I have an older IBM server, already ~7 years old, patched with the
latest BIOS/UEFI, and so far so good: no crash. The uptime is 6 days
for now. Here are the details about my kernel and Xen packages.
ii  xen-hypervisor-4.8-amd64   4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10  amd64  Xen Hypervisor on AMD64
ii  linux-image-4.9.0-8-amd64  4.9.110-3+deb9u6                          amd64  Linux 4.9 for 64-bit PCs
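For comparing package versions across several hosts, rows like the two above can be split into fields. This is a rough sketch, assuming the rows come from `dpkg -l` with the usual status, name, version, architecture and description columns; the helper name `parse_dpkg_line` is made up for illustration.

```python
def parse_dpkg_line(line):
    """Split one `dpkg -l` package row into its five columns."""
    # The description may contain spaces, so split at most four times.
    parts = line.split(None, 4)
    if len(parts) < 5:
        raise ValueError("not a dpkg -l package row: %r" % line)
    status, name, version, arch, description = parts
    return {
        "status": status,
        "name": name,
        "version": version,
        "arch": arch,
        "description": description,
    }


# The two rows quoted in this message.
ROWS = [
    "ii xen-hypervisor-4.8-amd64 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10 amd64 Xen Hypervisor on AMD64",
    "ii linux-image-4.9.0-8-amd64 4.9.110-3+deb9u6 amd64 Linux 4.9 for 64-bit PCs",
]

for row in ROWS:
    pkg = parse_dpkg_line(row)
    print(pkg["name"], pkg["version"])
```

Collecting the name and version pairs per host makes it easier to see which Xen and kernel combinations are on the machines that crash and on those that do not.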
Regards,
J.
Hi John,
the problem is that I cannot provide any metrics or logfiles showing
an error. I can only tell that dom0 is rebooting for a reason that is
not logged. I have no physical access to the server. I got one other
report about this kind of issue.
My assumption that the backported patches are the cause is based on
the current 16-day uptime. Until 16 days ago the server rebooted
every 3-5 days. It won't be a useful bug report from my point of
view.
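Even without logs, the boot times themselves can be turned into numbers for a report. The sketch below is only an illustration of that reasoning: the boot dates are hypothetical (chosen to match "every 3-5 days"), and the function name `uptimes_in_days` is made up; real timestamps would have to come from whatever record of reboots is available.

```python
from datetime import datetime


def uptimes_in_days(boot_times):
    """Return the days elapsed between each pair of consecutive boots."""
    ordered = sorted(boot_times)
    return [
        (later - earlier).total_seconds() / 86400.0
        for earlier, later in zip(ordered, ordered[1:])
    ]


# Hypothetical boot times, not taken from any real server.
BOOTS = [
    datetime(2018, 9, 1),
    datetime(2018, 9, 5),
    datetime(2018, 9, 8),
    datetime(2018, 9, 13),
]

intervals = uptimes_in_days(BOOTS)
current_uptime_days = 16  # the uptime mentioned in this message
print(intervals, current_uptime_days > max(intervals))
```

A current uptime well above every earlier interval supports the assumption but does not prove it, since the crashes are described as random.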
The other thing is that my two servers are now running upstream Xen
and kernel, and I might not go back to the old versions of both in
Debian stretch. The other server had always been running upstream
versions and never had a problem; that's why I updated the other
one, too.
Best regards
I was wondering if any of you guys reported this bug/issue/problem
back to the Debian community? For example on their bugs.debian.org
website?
Hi,
I had these crash problems with the Xen version in Debian stretch,
too. After 3 to 7 days the Xen server rebooted without a log entry
or anything else to observe. The problems started when the first
patches were applied by Debian. Some updates made it better, the
last one made it worse again. I checked hard drives and RAM, and
closely monitored metrics for what might be the cause.
My solution, once I no longer suspected a hardware fault: build
upstream Xen 4.11 for Debian stretch. I am currently running this
setup with my own build of kernel 4.19. The machines are now running
stably again.
Hi there,
Ever since all the Meltdown
and Spectre kernel updates
and possibly also Xen 4.8
updates, we experience
crashes of the Dom0 just out
of the blue. Sometimes after
1 day, sometimes after a few
days or even 14 days,
completely random.
We have two Dell P730 servers and two Dell P720 servers with this
behaviour. One thing is that we updated these machines to the latest
available firmware, because that is the most secure way. Then we
installed Debian Stretch with Xen 4.8 support.
We have done several installs, and 4 servers seem to crash pretty
fast while others don't. In the end we think that we can trace it
back to the xen-4.8.4-pre version being stable and the xen-4.8.5-pre
version being unstable. This was kinda independent of the kernel
that we were using, 4.14 or 4.9.0-8-amd64. This is of course all
Debian package numbering.
As a last resort, on one server we updated all DomU kernels of our
Jessie servers on this Dom0 to 4.9.0 from backports instead of the
3.16 kernel. For now that seems to work, but the crashes are random
so it could happen again at any time. The idea is that the 3.16
kernels are completely Spectre & Meltdown unaware and might cause
trouble in Xen kernel support. I am not sure if this is true at all,
but we are pretty lost as to what the actual cause is.
We also tested with CentOS, and we also had these crashes there with
certain combinations of kernel/Xen. The most recent updates seem to
be more stable, though. The most frustrating part is that there are
absolutely no logs to be found. No kernel oops or anything; the
server just resets and boots again.
Are there others
experiencing problems like
this? Do you see more
frequent server/kernel
crashes on production
servers?
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users