
Re: [Xen-devel] Problem with Xen 4.5 failing XTF tests on old AMD cpus ?



Ian Jackson writes ("Re: Problem with Xen 4.5 failing XTF tests on old AMD cpus 
?"):
> Andrew Cooper writes ("Re: Problem with Xen 4.5 failing XTF tests on old AMD 
> cpus ?"):
> > It will be because of Gen1 SVM which doesn't have NRIP support. This 
> > case requires emulation of the invlpg instruction, rather than just 
> > using the information provided by the intercept.
> 
> So it seems that the xtf test is not effective at detecting the Xen
> bug except on old hardware ?  Is there some way it could be improved ?
> 
> It's obviously not desirable that we should have tests which pass in
> the production colo and fail in the ancient Citrix Cambridge instance.

Andrew and I discussed this IRL.  I thought it worth writing down what
was said so that we can refer to it later.

This test failure is due to genuine bug(s) in Xen 4.5, in that it
doesn't have various fixes (see the rest of the thread).

The bugs are only exposed on old hardware, which uses different
codepaths in Xen.  On new hardware Xen takes a different approach.
This is why the test failure appears in the Citrix (Cambridge)
osstest but not in the Xen Project (Massachusetts) instance.

Xen decides which approach to take based on hardware features.  There
is not currently any way to tell Xen not to use these hardware
features (at least, not in this case - the AMD SVM NextRIP feature) if
they are available.  Andrew has a long-term plan to add more of such a
facility - but that is not going to be available any time soon.

In this particular case, the old hardware uses the Xen instruction
emulator where newer hardware uses hardware support.  (Andrew tells me
that without NextRIP support, Xen must use the instruction emulator
when handling `invlpg` instructions on behalf of the guest, to
calculate how many bytes to move the instruction pointer forward by.
And it is the emulator which has the bug here.)
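
To make that decision concrete, here is a rough sketch of the logic as I
understand it.  It is not Xen's code.  The function names and the
`emulate_insn_len` callback are made up for illustration; the one
hardware detail is that AMD reports NextRIP support in CPUID leaf
0x8000000A, EDX bit 3.

```python
# Illustrative model only; all names here are hypothetical, not Xen's.
SVM_FEATURE_LEAF = 0x8000000A  # CPUID leaf that reports SVM features
NRIPS_BIT = 3                  # EDX bit 3: NextRIP saved on #VMEXIT


def has_nrip(svm_edx):
    """Return True if the SVM feature word (EDX) advertises NextRIP."""
    return bool((svm_edx >> NRIPS_BIT) & 1)


def next_guest_rip(rip, svm_edx, vmcb_next_rip, emulate_insn_len):
    """Pick the guest instruction pointer to resume at after an intercept.

    With NextRIP the hardware has already recorded the address of the
    following instruction.  Without it, the instruction has to be decoded
    in software (the emulator path) to learn its length.
    """
    if has_nrip(svm_edx):
        return vmcb_next_rip
    return rip + emulate_insn_len()
```

On a Gen1 SVM box `has_nrip()` is false, so every intercepted `invlpg`
goes through the second branch, which is where the 4.5 bug lives.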

So FEP (the Forced Emulation Prefix, which makes Xen run the prefixed
instruction through its emulator) could be used to cause the bug to
manifest even on new hardware; indeed, where FEP is available, XTF
does use it to run exactly the same set of tests.  However, FEP is not
available in Xen 4.5 and there are good reasons for not backporting it
there.

It would be possible to backport the bugfixes to Xen 4.5.  However,
the fixes address only very rare problems.  Andrew thinks the bugs
are, insofar as they are bugs which might cause lossage, more likely
to be roughly "crashes obscure or very oddly-behaved guests" than
"crashes commonly used guests but only with very low probability".
The latter kind of bug would be worth a backport; the former much
less so (especially in a very old stable release, and especially
when the fixes involve behavioural changes).

The fixes would also provide an unquantified performance improvement
on AMD hardware, due to avoiding extraneous TLB flushes, but Andrew
says he doubts that's worth caring about.

We discussed host stickiness, host-specific bug detection, and
regression detection, in osstest.  I reassured Andrew that I think
the current osstest algorithms will deal with this situation
tolerably well (if not perfectly).

The conclusion is that there is nothing to be done, at least in the
short term.  There are good reasons for the bug to persist in 4.5 and
good reasons for it being hard to detect on newer hardware.

Ian.

(Thanks to Andrew for the IRL explanation and for review of this
email.)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

