[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Revert "evtchn: refuse EVTCHNOP_status for Xen-bound event channels"


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxx>
  • From: "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>
  • Date: Thu, 16 May 2024 21:22:02 -0400
  • Arc-authentication-results: i=1; mx.zohomail.com; dkim=pass header.i=apertussolutions.com; spf=pass smtp.mailfrom=dpsmith@xxxxxxxxxxxxxxxxxxxx; dmarc=pass header.from=<dpsmith@xxxxxxxxxxxxxxxxxxxx>
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1715908931; h=Content-Type:Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:Subject:To:To:Message-Id:Reply-To; bh=/ZP9OcrdM117h/0SaihIOKMG/tlbPUthFhAXt6FuLOg=; b=g2QIPX+6q4W+QRdsbd8hX4eGprDPRgk0rz/VRYhdB8tQo4Ffhdl/cOhlPnyb45Z1Oues/TINDgzwtelHhxQ1kPoNJlHniL+lyEeIdIpTIsS0uaHSDv4MAroKQXGXV8cXe5bWCXEWm5XYCYWBEcEf4BZqmMYO3m0zwfJTx9V9Qy8=
  • Arc-seal: i=1; a=rsa-sha256; t=1715908931; cv=none; d=zohomail.com; s=zohoarc; b=g9hOY0sUkPWdFO0zo052wg8MMKbsy7k+eOdgd20H6Ty+wa0fW5d7kSFSu/5ohtiGVJyJiP62+Aw5Mxrnn/a0QHNu7yF42ckWpq0IhAP8O2jB9GDH2McQ+m4xWKtJ2BmPT800B0KFzuu1UWWZR26FAxNKH5tUymSAJ1xL7kfVRFo=
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 17 May 2024 01:22:18 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 5/16/24 02:41, Jan Beulich wrote:
On 14.05.2024 11:25, Jan Beulich wrote:
On 03.04.2024 08:16, Jan Beulich wrote:
On 02.04.2024 19:06, Andrew Cooper wrote:
The commit makes a claim without any kind of justification.

Well, what does "have no business" leave open?

The claim is false, and the commit broke lsevtchn in dom0.

Or alternatively lsevtchn was doing something that was never meant to work
(from Xen's perspective).

  It is also quite
obvious from XSM_TARGET that it has broken device model stubdoms too.

Why would that be "obvious"? What business would a stubdom have to look at
Xen's side of an evtchn?

Whether to return information about a xen-owned evtchn is a matter of policy,
and it's not acceptable to short circuit the XSM on the matter.

I can certainly accept this as one possible view point. As in so many cases
I'm afraid I dislike you putting it as if it was the only possible one.

In summary: The supposed justification you claim is missing in the original
change is imo also missing here then: What business would any entity in the
system have to look at Xen's side of an event channel? Back at the time, 3
people agreed that it's "none".

You've never responded to this reply of mine, or its follow-up. You also
didn't chime in on the discussion Daniel and I were having. I consider my
objections unaddressed, and in fact I continue to consider the change to
be wrong. Therefore it was inappropriate for you to commit it; it needs
reverting asap. If you're not going to do so, I will.

For the record: With Andrew having clarified that in his revert's
description:

"The claim is false; it broke lsevtchn in dom0, a debugging utility which
  absolutely does care about all of the domain's event channels."

"all of the domain's event channels" means to include event channels Xen
uses for its own, and merely puts in the domain's table for ease of
handling, I've agreed that - in the absence of any better debugging
means - the revert may stay in place. For context, my prior understanding
of the issue was that lsevtchn simply stops probing further channels when
getting back any kind of error from EVTCHNOP_status (which I continue to
think wants addressing there, by a future version of a patch that was
already sent). However, there are follow-on concerns I have:

1) In the discussion George claimed that exposing status information in
an uncontrolled manner is okay. I'm afraid I have to disagree, seeing
how a similar assumption by CPU designers has led to a flood of
vulnerabilities over the last 6+ years. Information exposure imo is never
okay, unless it can be _proven_ that absolutely nothing "useful" can be
inferred from it. (I'm having difficulty seeing how such a proof might
look like.)

Jan, I would agree with you that any resources exposed to the guest, even if that resource is status information, should be behind an XSM check. I would, however, have to disagree the position over proving absolutely nothing "useful" can be inferred. Prior to spectre becoming commonly aware, no one considered proving there needed to be protections against out-of-order instruction execution being used to bypass branch conditions. Predicting the future will more often than not fail, but with an XSM check in place, the dummy/default policy can just be updated with the general safe decision and others can use FLASK to provide fine grain access.

2) Me pointing out that the XSM hook might similarly get in the way of
debugging, Andrew suggested that this is not an issue because any sensible
XSM policy used in such an environment would grant sufficient privilege to
Dom0. Yet that then still doesn't cover why DomU-s also can obtain status
for Xen-internal event channels. The debugging argument then becomes weak,
as in that case the XSM hook is possibly going to get in the way.

And this is where we have a fundamental difference. Event channels are XSM labeled objects that are always connected to a guest. If they were truly Xen-internal, then a guest would have no way to get access/reference them. Again, I never disagreed with whether the guest should or should not be able to access them. I did ask for a better explanation than, "has no business", which is a statement of opinion not of fact or reason. The point is these event channels are a resource attached to the guest and access control decisions to them should be made under XSM. Injecting a hard-coded access control in front of the XSM check subverted/invalidated rules in the existing FLASK policy.

3) In the discussion Andrew further gave the impression that evtchn_send()
had no XSM check. Yet it has; the difference to evtchn_status() is that
the latter uses XSM_TARGET while the former uses XSM_HOOK. (Much like
evtchn_status() may indeed be useful for debugging, evtchn_send() may be
similarly useful to allow getting a stuck channel unstuck.)

A more appropriate default might be XSM_OTHER with conditions with in the hook as to whether the policy should be XSM_HOOK, XSM_TARGET or flat denial.

In summary I continue to think that an outright revert was inappropriate.
DomU-s should continue to be denied status information on Xen-internal
event channels, unconditionally and independent of whether dummy, silo, or
Flask is in use.

Any guest facing resources, regardless if the backing end is the hypervisor, should be protected with XSM. This allows the maintainers to codify what they believe is a sane policy in the dummy policy and the end user can use FLASK to enforce what they believe is acceptable.

Plus, as stated before, evtchn_send() would better remain in sync in this
regard with evtchn_status(). The situation is less clear for evtchn_close()
and evtchn_bind_vcpu(): Those indeed have no XSM checks while they do deny
operation on Xen-internal channels. It is likely more far fetched to
assume permitting them for debugging purposes might in fact help in rare
situations. Yet it may still be a matter of consistency to keep them in
sync with the other two. (Note that there's also evtchn_usable(), which
might then also need tweaking itself or at the use sites.)

Just because that is how it was, doesn't mean it was correct. I had a discussion with one of the original authors of FLASK before taking up the maintainership and he felt there were likely more XSM checks that should have been in place originally. He considered it a first order approximation of what should be protected.

v/r,
dps



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.