
Re: [Xen-users] xen domU segfaults with xpti on intel based systems

On Fri, Jan 11, 2019 at 2:36 PM Juergen Gross <jgross@xxxxxxxx> wrote:
On 11/01/2019 14:05, Tomas Mozes wrote:
> On Fri, Jan 11, 2019 at 9:21 AM Juergen Gross <jgross@xxxxxxxx> wrote:
>     On 11/01/2019 07:05, Tomas Mozes wrote:
>     > Hello,
>     > we are observing random PV domU segfaults on Intel based systems with
>     > XPTI enabled. These segfaults were not present in Xen 4.9.2 and can be
>     > reproduced on 4.9.3/4.10.2/4.11.1. The problem can be
>     > mitigated by adding xpti=false to xen command line options.
>     >
>     > Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
>     > seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
>     > on Debian, but on NetBSD it's almost instant.
>     Hmm, as we haven't received any similar reports, I suspect there is
>     something special on your side.
>     Can you please be more specific regarding:
>     - hardware (machine type(s), processor model(s), ...)
>     - other config options (hypervisor command line, hypervisor .config)
>     A hypervisor log (output of "xl dmesg") would help, too. Please add
>     "loglvl=all guest_loglvl=all" to the hypervisor command line for that
>     purpose. If possible use a debug hypervisor for this test, as that
>     will produce more diagnostic output.
>     Juergen
> These segfaults were actually spotted by the gmp project maintainer, and
> only later were they reproduced locally on another machine (also Intel).
> A machine on which it can be reproduced: Intel DH87MC with Intel Core
> i7-4770 CPU @ 3.40GHz on Linux Gentoo (Haswell)
> But, for example, I cannot reproduce it on my desktop machine: Intel DH77EB
> with Intel Core i5-3570 CPU @ 3.40GHz (Ivy Bridge)

Okay, those two cpus differ in a critical feature: on Ivy Bridge XPTI
can't make use of the processor's PCID feature due to a lack of the
INVPCID instruction.

Can you test whether adding "pcid=false" to the hypervisor command line
on the Haswell machine makes any difference?

Setting "pcid=false" makes the segfault go away too.
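
For anyone comparing machines the same way, here is a minimal sketch for checking whether a CPU advertises both PCID and INVPCID. It assumes Linux-style /proc/cpuinfo text, where these features show up as the flags "pcid" and "invpcid"; the function names are made up for illustration. Note that under Xen the flags visible to dom0 or a PV guest may be masked by the hypervisor, so the result is most reliable on bare metal.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text.

    Only the first "flags" line is used; all cores normally report
    the same set.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pcid_support(cpuinfo_text):
    """Report whether the pcid and invpcid flags are both advertised."""
    flags = cpu_flags(cpuinfo_text)
    return {"pcid": "pcid" in flags, "invpcid": "invpcid" in flags}
```

Passing the contents of /proc/cpuinfo to pcid_support() should report invpcid as present on the Haswell box and absent on the Ivy Bridge one, matching the difference described above.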

And one other question: could it be the problem occurred at the same time when

(XEN) [2019-01-11 12:41:06] d1 L1TF-vulnerable L4e 000000070cb93004 -

was issued?

It's printed shortly after the domU is started, about 10 seconds before the segfault. It's printed in both cases (with/without pcid=false).
