
Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL



Hi Stefano,

On 6/5/19 9:29 PM, Stefano Stabellini wrote:
On Wed, 5 Jun 2019, Julien Grall wrote:
Hi Stefano,

On 05/06/2019 00:11, Stefano Stabellini wrote:
On Tue, 4 Jun 2019, Julien Grall wrote:
On 6/4/19 6:39 PM, Stefano Stabellini wrote:
On Tue, 4 Jun 2019, Julien Grall wrote:
No, this patch introduces another source of TLB conflict if the processor
is caching intermediate translations (this is implementation defined).

By "another source of TLB conflict" are you referring to something new
that wasn't there before? Or are you referring to the fact that we are
still not following the proper sequence to update the Xen pagetable? If
you are referring to the latter, wouldn't it be reasonable to say that
such a problem could also have happened before 00c96d7742?

It already exists, but in a different form. I can't tell whether this is bad
or not because the re-ordering of the code (and therefore memory accesses)
will affect how TLBs are used. So it is a bit of a gamble here.

If I read this right, this is the same underlying issue but due to the
re-ordering of the code, it could manifest differently. For instance the
impact on cache lines could be different.

I am sorry, but how did you come up with a cache line difference here? This has nothing to do with cache lines; it only has to do with how the TLBs are filled at a given point. If you re-order memory accesses, then you may well have a different state of the TLBs at a given point.


Is this the case? If so, I think this is a tolerable risk, as other
things could affect it too, such as CONFIG options being
enabled/disabled, as we have just seen with CONFIG_LIVEPATCH. It is
almost "random".

See above. But yes it is almost random.

The bug reported by osstest actually taught me that even if Xen may boot
today on a given platform, this may not be the case tomorrow because of a
slight change in the code ordering (and therefore memory access).

/!\ Below is my interpretation and does not imply I am correct ;)

However, such Arm Arm violations are mostly gathered around boot and
shouldn't affect runtime. IOW, Xen would stop booting on those platforms
rather than becoming unreliable. So it would not be too bad.

/!\ End

We just have to be aware of the risk we are taking with backporting the
patch.

What you wrote here seems to make sense, but I would like to understand
the problem mentioned earlier a bit better.


What about the other older staging branches?

The only one we could consider is 4.10, but AFAICT Jan already cut the
last release for it.

So I wouldn't consider any backport unless we begin to see the branch
failing.

If Jan already made the last release for 4.10, then there is little point
in backporting to it. However, it is not ideal to have something like
00c96d7742 in some still-maintained staging branches but not all.

Jan pointed out it is not yet released. However, we have not received any
report of a problem (aside from the Arm Arm violation) with Xen 4.10 so far.
So I would rather avoid such a backport in a final point release, as we risk
making it more broken than it is today.

I find this acceptable for Xen 4.11 because it has been proven to help. We
also still have point releases afterwards if this goes wrong.

If we do the backport, I would prefer to backport it to both trees, for
consistency, and because there might be machines out there where 4.10
doesn't boot with the wrong kconfig. This patch should decrease the risk
of breakage.

The counterpoint here is that Xen 4.10 is going to be out of support in a few weeks. If you are about to use Xen 4.10 for your new product, then you have already made the wrong choice. Why would you use an out-of-support release?

If you already use Xen 4.10, then you are probably fine running this release on your platform. Why would you take the risk of breaking it?

Note that osstest does not test Xen 4.10 (or earlier) on Thunder-X, so this does not need to be factored into the decision.


However, I see your point too. This is a judgement call -- we do not have
enough data but we have to make a decision anyway. No way to tell which
way is best "scientifically".

I also understand your point. However, it is a bit worrying that not having enough data means we are happy to backport a patch into a final point release. I would have thought more caution would be exercised during backports.


My vote is to backport to both. Jan/others please express your opinion.

To follow the vote convention:

4.11: -1
4.10: -1 (I was tempted by a -2, but if the others feel it should be backported then I will not push back).

Cheers,

--
Julien Grall


 

