Re: Revert of the 4.17 hypercall handler changes Re: [PATCH-for-4.17] xen: fix generated code for calling hypercall handlers



On 04.11.22 06:01, Andrew Cooper wrote:
On 03/11/2022 16:36, Juergen Gross wrote:
The code generated for the call_handlers_*() macros needs to avoid
undefined behavior when multiple handlers share the same priority.
The issue is that the hypercall number is fed unverified into the macros
and then used to set a mask via "mask = 1ULL << <hypercall-number>".

Avoid a shift amount of more than 63 by setting mask to zero in case
the hypercall number is too large.

Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
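
[Editorial illustration, not part of the patch: a minimal sketch of the
guard described above, assuming a 64-bit mask. The helper name
hypercall_mask() is hypothetical; the actual fix lives inside the
generated call_handlers_*() macros and may look different.]

```c
#include <stdint.h>

/*
 * Hypothetical helper (not Xen code): build the bit mask for a hypercall
 * number without ever shifting by 64 or more, which would be undefined
 * behavior for a 64-bit operand in C.
 */
static inline uint64_t hypercall_mask(unsigned long num)
{
    /* Out-of-range numbers yield an empty mask, so no handler matches. */
    return num < 64 ? UINT64_C(1) << num : 0;
}
```

With this sketch, hypercall_mask(3) has only bit 3 set, while any
number of 64 or above gives a mask of zero.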

This is not a suitable fix.  There being a security issue is just the
tip of the iceberg.

The changes broke the kexec_op() ABI and this is a blocking regression
vs 4.16.

It would really be beneficial if you would just tell me what the issues
are instead of voicing some vague concerns and then dropping off to
silence again when asked (in part multiple times) what the real
problems are.

In lieu of having time to do
https://gitlab.com/xen-project/xen/-/issues/93, here's the abridged list
of errors

The series claims "This is beneficial to performance and avoids
speculation issues.", c/s 8523851dbc4.

That half sentence is literally the sum total of justification given for
this being related to speculation.

The other half of the sentence claims performance.  But no performance
testing was done; the cover letter talks about one test with specifics,
but it describes a scenario where the delta was a handful of cycles
difference, as one part in multi-millions, probably billions.  There is
no plausible way that whatever raw data led to the "<1% improvement"
claim was statistically significant.

Yes, and you told me to do some more performance testing with XenServer
and you didn't even respond to queries regarding the state of that
testing.

The reason a performance improvement cannot be measured is that a big
out-of-order core can easily absorb the hit in the shadow of other
operations.  Smaller cores cannot, and I'm confident that adequate
performance testing would have demonstrated this.

Unaddressed is the code bloat from the change; relevant because it is
the negative half of the tradeoff on what is allegedly a net improvement
on a fastpath.  Actually trying to reason about the code bloat would
have highlighted why it's rather important that the logic be implemented
as a real function rather than a macro.

You had several weeks to bring up that concern, yet you didn't.

Also unaddressed is whether the multi-nesting even has any utility, and
if it does, what it does to the other kinds of workloads.

Unaddressed too is the impact from XSAs 398 and 407 which, as members of
the security team, you had substantially more exposure to than most.


Taking a step back from low level issues.

This series introduces a NIH domain-specific language for describing
hypercalls, but lacking in any documentation.  As an exercise to others,
time how long it takes you to compile a hypervisor with a new
hypercall that takes e.g. one integer and one pointer parameter.  There
should be a whole lot more acks on that patch for it to be considered to
have an adequate review.

Somewhere (I can't recall where, but it's 4 in the morning so I'm not
looking for it now), a statement was made that if issues were found they
could be addressed going forwards.  But the series was committed without
any possibility for anyone to perform the testing requested of the
original submission.

Funny statement.

The series was pending commit for several months. I pinged multiple
times for any feedback (especially from you) and you didn't even
respond with an "I'll come back to it later". You just behaved like
/dev/null. That was even discussed in the community call, where the
decision was taken to finally apply the series, with you not reacting
even in a minimal way.

There was one redeeming property of the series, and yet there was no
discussion anywhere about function pointer casts.  But given that the
premise was disputed to begin with, and the performance testing that
stood an outside chance of countering the dispute was ignored, and
/then/ that my objections were disregarded and the series committed
without calling a vote, I have to say that I'm very displeased with how
this went.

Yes, me too.

Being asked for specific concerns multiple times, not reacting, and then
coming back after months claiming that you have been ignored is just disgusting.


Juergen

