Re: System hangs when NVMe is under load



On 7/16/20 6:34 AM, Stanislav wrote:
Hello,

I apologize in advance if I am sending this to the wrong 
folks.

We have a strange situation going on here with a couple of our servers. We've 
been experiencing issues with the combination of Debian+XEN+Samsung NVMe.

Problem:

It all began with 
https://serverfault.com/questions/1006366/samsung-nvme-disappears-when-server-on-average-to-high-load

Our situation is close to the one described there, with some differences. *Now 
it can be reproduced.*

  * OS: 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1
  * CPUS: Intel(R) Xeon(R) CPU E5-1650 v4
  * NVMe: Samsung MZ1LB1T9HALS-00007
  * xen_version            : 4.11.4-pre
  * Server: Supermicro Super Server/X10SRW-F, BIOS 3.2

We've gathered some more information: it happens only when Xen is loaded.

The following command breaks everything, and it does so fast: in this situation it needs only about 20 seconds to hang the whole system. I am attaching the call trace that occurs during the hang.

Does the system still respond to sysrq (https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html)? The dom0 is a PV system, so if you're on the serial console you'll need to use ctrl-o to send a break. You could use that to dump a backtrace for all the CPUs.
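If a root shell in dom0 still responds during the hang (which is an assumption, it may well not), the same SysRq requests can be issued by writing the key to /proc/sysrq-trigger instead of typing it on the console. Below is a minimal sketch of that; the helper name send_sysrq and the small key table are made up for illustration, and the key meanings are taken from the kernel SysRq documentation linked above. The output lands in the kernel log (dmesg or the serial console).

```python
import sys
from pathlib import Path

# Standard procfs location of the SysRq trigger on Linux (needs root to write).
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")

# A deliberately small subset of keys that only dump diagnostics.
# Destructive keys such as 'b' (reboot) or 'c' (crash) are left out on purpose.
SYSRQ_KEYS = {
    "l": "backtrace of all active CPUs",
    "w": "tasks in uninterruptible (blocked) state",
    "t": "all tasks and their state",
}


def send_sysrq(key, trigger=SYSRQ_TRIGGER):
    """Write one SysRq key to the trigger file and return its description.

    `trigger` is a parameter only so the helper can be pointed at a scratch
    file for a dry run; on a real system leave it at the default.
    """
    if key not in SYSRQ_KEYS:
        raise ValueError(f"unsupported SysRq key: {key!r}")
    with open(trigger, "w") as fh:
        fh.write(key)
    return SYSRQ_KEYS[key]


if __name__ == "__main__":
    # Example: python3 sysrq.py l w
    for key in sys.argv[1:]:
        print(f"requested: {send_sysrq(key)}")
```

Running it with `l` and then `w` shortly before the system stops responding would be a reasonable first attempt, since those two show what the CPUs are executing and which tasks are stuck.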

You can also try sending a command to Xen itself. Xen has a debug handler, for which I unfortunately can't find good documentation (this seems like something really basic to be missing, oops). In any case, pressing 'ctrl-a' three times on your console should switch between Xen and the dom0. From there, 'h' shows the available commands. I don't have useful advice on what to collect there.
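If dom0 is still usable at that point (again an assumption), the debug keys can also be sent from a dom0 shell with the xl toolstack rather than through the serial console: `xl debug-keys` injects the keys and `xl dmesg` reads back the hypervisor console buffer where the output ends up. The sketch below wraps those two commands; the function names are hypothetical, and only the 'h' key mentioned above is used in the example.

```python
import subprocess


def xen_debug_key_commands(keys):
    """Return the xl command lines that send `keys` and read back the output."""
    if not keys:
        raise ValueError("at least one debug key is required")
    return [["xl", "debug-keys", keys], ["xl", "dmesg"]]


def collect_xen_debug(keys, run=subprocess.run):
    """Send debug keys to Xen and return the hypervisor console buffer.

    `run` is injectable so the helper can be exercised without Xen installed.
    Must be executed as root in dom0.
    """
    output = ""
    for cmd in xen_debug_key_commands(keys):
        result = run(cmd, capture_output=True, text=True, check=True)
        output = result.stdout  # keep the output of the last command (xl dmesg)
    return output


if __name__ == "__main__":
    # 'h' asks Xen to print the list of available debug keys.
    print(collect_xen_debug("h"))
```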

You may also be interested in trying a debug kernel build. We already did one 
for Debian for other reasons, so we may be able to help you with that.

--Sarah



 

