[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Revert of the 4.17 hypercall handler changes Re: [PATCH-for-4.17] xen: fix generated code for calling hypercall handlers
On 04.11.22 06:01, Andrew Cooper wrote: On 03/11/2022 16:36, Juergen Gross wrote:The code generated for the call_handlers_*() macros needs to avoid undefined behavior when multiple handlers share the same priority. The issue is the hypercall number being unverified fed into the macros and then used to set a mask via "mask = 1ULL << <hypercall-number>". Avoid a shift amount of more than 63 by setting mask to zero in case the hypercall number is too large. Fixes: eca1f00d0227 ("xen: generate hypercall interface related code") Signed-off-by: Juergen Gross <jgross@xxxxxxxx>This is not a suitable fix. There being a security issue is just the tip of the iceberg. The changes broke the kexec_op() ABI and this is a blocking regression vs 4.16. I t would really be beneficial if you would just tell what the issues are instead of voicing some vague concerns and then dropping off to silence again when asked (partially multiple times) what the real problems are. In lieu of having time to do https://gitlab.com/xen-project/xen/-/issues/93, here's the abridged list of errors The series claims "This is beneficial to performance and avoids speculation issues.", c/s 8523851dbc4. That half sentence is literally the sum total of justification given for this being related to speculation. The other half of the sentence claims performance. But no performance testing was done; the cover letter talks about one test with specifics, but it describes a scenario where the delta was a handful of cycles difference, as one part in multi-millions, probably billions. There is no plausible way that whatever raw data lead to the "<1% improvement" claim was statistically significant. Yes, and you told me to do some more performance testing with XenServer and you even didn't respond to queries regarding the state of that testing. The reason a performance improvement cannot be measured is that a big out-of-order core can easily absorb the hit in the shadow of other operations. Smaller cores cannot, and I'm confident that adequate performance testing would have demonstrated this. Unaddressed is the code bloat from the change; relevant because it is the negative half of the tradeoff on what is allegedly a net improvement on a fastpath. Actually trying to reason about the code bloat would have highlighted why it's rather important that the logic be implemented as a real function rather than a macro. You had several weeks to bring up that concern, yet you didn't. Also unaddressed is whether the multi-nesting even has any utility, and if it does, what it does to the other kinds of workloads. Unaddressed too is the impact from XSAs 398 and 407 which, as members of the security team, you had substantially more exposure to than most. Taking a step back from low level issues. This series introduces a NIH domain-specific language for describing hypercalls, but lacking in any documentation. As an exercise to others, time how long it takes you to get compile a hypervisor with a new hypercall that takes e.g. one integer and one pointer parameter. There should be a whole lot more acks on that patch for it to be considered to have an adequate review. Somewhere (I can't recall where, but it's 4 in the morning so I'm not looking for it now), a statement was made that if issues were found they could be addressed going forwards. But the series was committed without any possibility for anyone to perform the testing requested of the original submission. Funny statement. The series was pending for being committed for several months, I did ping multiple times for any feedback (especially you) and you didn't even respond with a "I'll come back to it later". You just behaved like /dev/null. That was discussed even in the community call, where the decision was taken to finally apply the series with you not even reacting in a minimal way. There was one redeeming property of the series, and yet there was no discussion anywhere about function pointer casts. But given that the premise was disputed to begin with, and the performance testing that stood an outside chance of countering the dispute was ignored, and /then/ that my objections were disregarded and the series committed without calling a vote, I have to say that I'm very displeased with how this went. Yes, me too. Being asked for specific concerns multiple times, not reacting, and then coming back after months that you have been ignored is just disgusting. Juergen Attachment:
OpenPGP_0xB0DE9DD628BF132F.asc Attachment:
OpenPGP_signature
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |