Re: [Xen-devel] Sporadic PV guest malloc.c assertion failures and segfaults unless pv-l1tf=false is set

Hi Andrew,

On Sun, Nov 25, 2018 at 02:48:48PM +0000, Andrew Cooper wrote:
> Which are your two types of Intel server?

Seven of them have Xeon D-1540 and two have Xeon E5-1680v4. I've
seen this issue on guests running on both kinds, and my reproducer
guest still showed the problem after being moved from a production
D-1540 server to a test E5-1680v4.

My only available test host at the moment is E5-1680v4.

> You say that you only see this with 64bit Debian kernels?

Yes, but this seems quite subtle. I've got one Debian stretch guest
where php-fpm crashes every time, and another Debian stretch guest
(unknown kernel) where a particular perl script hits an assertion
failure in malloc.c every time. Apart from those two, across several
hundred other guests it has only been observed a handful of times in
a week, and all of those were on 64-bit Debian jessie or stretch.
So with such limited data this could still be coincidence.

I have one guest administrator with a 64-bit Gentoo guest who says
they might have seen it once, when gcc crashed during a
compilation, but I am still waiting for clarification on that one.

> Could you experiment with disabling PCID (`pcid=0` on the xen command
> line) and seeing if that affects the reproducibility.

I am unable to reproduce the problem with pcid=0. That is with
staging-4.10 and a 64-bit Debian PV guest running the kernel revision
just before the L1TF fixes (linux-image-4.9.0-7-amd64
4.9.110-3+deb9u2). Xen dmesg does say shadow paging is in effect.
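
For anyone wanting to repeat the test, here is a rough sketch of how
the option can be set on a Debian dom0 that boots Xen through GRUB 2.
The file and variable are the stock Debian ones and are an assumption
about a typical setup, not a record of the exact config on the test
host; any options already on the Xen command line should be kept
alongside it.

```shell
# Assumed location: /etc/default/grub on a Debian dom0 (GRUB 2).
# Adds pcid=0 to the Xen hypervisor command line, not the dom0 kernel's.
GRUB_CMDLINE_XEN_DEFAULT="pcid=0"

# After editing, regenerate the GRUB config and reboot the host:
#   update-grub
# Once rebooted, the command line Xen actually booted with can be checked:
#   xl info | grep xen_commandline
```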


