[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 1/1] xsm: allows system domains to allocate evtchn


  • To: "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 30 Mar 2022 17:00:15 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=YRDBXnP9C9M2vsmyTw79C19QJ56/HRm2OQRPejgum9U=; b=O2x51jWgT/FOfAZEnKQr1hpaNGeRgDIkSU3TyP3HwQc8jxs6gC1yrwX1lV3bzFOArgXpchrucJvMjfJsoJF4zT+hFIXw4Duq34hgBRN8HWNCJzQ2Z6qhKGCMiz9cjQK07qG5Hcut/34EmZgHjZPGZIqCAgYL7dqu8zQ7cSwzU34bfNcooP95sc3tj68pAxqbG8o4SxJD4n1lvCla9zMmUTvyPVG/sOSSB6Z2ZOyDmGwrqE1U8VVimtWjWkpcECXDIPoRSBAnxuHsJvw/Zcthi07ZWF5L+toeAX5+b3NCpa+wEw6NlQHn30ruV992tbnPxz1VL1O7E/Sl1tfZ9yK5+w==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=MIPQkW84LtCinkFGhmlfg3Bz4DMzLZrgopellglRG1JCCL2WdoQ6VojUj8bcITIvIcb4ntecAxS6D758F05zs1sZ/NrC7soQfkU9YkxQJACUp2gEm8CfuxhuQ0r8VIjJpjUF3zNEBCfhme+lIDOLrkx7Q/EzHInD+G1iz7RjDHsfmL8cKHaky+1NZxNZ7KqQGeAjXBqZO1uktmlxh9L6+/bTFVJhhbjN37A6S61zf8l3P31kqpJ/igJn7+FUvnJg9ACcdGdv7F3c/O32H8K2Yoou3g0HKq0ZD2wStxggODCySqwzGaUN+UyZvIEU72y5uloT4dLXb9aWMQJy4hPe3g==
  • Authentication-results: esa1.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, <scott.davis@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Daniel De Graaf <dgdegra@xxxxxxxxxxxxx>, Jason Andryuk <jandryuk@xxxxxxxxx>
  • Delivery-date: Wed, 30 Mar 2022 15:00:33 +0000
  • Ironport-data: A9a23:kXSrya8mUmGhN7LoOSjSDrUDH36TJUtcMsCJ2f8bNWPcYEJGY0x3z moXUWyFb6uKYzP0f90nYIi3p0MPscPWz9RlGwVrqCk8E34SpcT7XtnIdU2Y0wF+jyHgoOCLy +1EN7Es+ehtFie0Si+Fa+Sn9T8mvU2xbuKU5NTsY0idfic5DnZ54f5fs7Rh2NQw2oPhW1nlV e7a+KUzBnf0g1aYDUpMg06zgEsHUCPa4W5wUvQWPJinjXeG/5UnJMt3yZKZdhMUdrJ8DO+iL 9sv+Znilo/vE7XBPfv++lrzWhVirrc/pmFigFIOM0SpqkAqSiDfTs/XnRfTAKtao2zhojx/9 DlCnZ73Fwd5GLHHpP9eeRcFPxkkHJFY47CSdBBTseTLp6HHW37lwvEoB0AqJ4wIvO1wBAmi9 9RBdmpLNErawbvrnvTrEYGAhex6RCXvFJkYtXx6iynQEN4tQIzZQrWM7thdtNs1rp4RTKiFP JtDAdZpRDeYeBNfMHILMcwB38qSuWvjdC0AuHvA8MLb5ECMlVcsgdABKuH9ZdiiVchT2EGCq Qru5H/lCxsXMNiezzut8X+2gOLL2yThV+o6FrS++uR7nV67yWkaCRlQXly+ydG1hUKzVMhWA 1AF8Sop664p/QqkSceVdx+lpH+JuDYMVtwWFPc1gCmdx6yR7wuHC2wsSj9adMdgpMIwXSYt1 FKCg5XuHzMHmLGNUnec6re8pCu/IzQINnQFYTIYTAwD+J/op4RbphDFQ8tnEaW1psboAjy2y DePxAAljLIPkYgH3ru65njcnzu2opHDCA8yjjg7RUr8sFk/PtT8IdX1tx6Ltp6sMbp1UHGD+ yYEy+6C59wVTruDyQC1GsQxAqqmsqPt3CLnvXZjGJwo9jKI8nGlfJxN7DwWGHqFIvroaheyP haN5Fo5CIt7eSLzMPQpO97Z59ECl/CIKDjzahzDgjOiiLBVfRTPwixhbFX4M4vFwBl1yvFX1 Xt2nK+R4Zcm5UZPkWLeqwQ1i+ZDKsUCKYX7H8qTI/OPi+f2WZJtYe1ZWGZil8hghE9+nC3b8 sxEK+yBwAhFXev1b0H/qNBPfQ1adyFhXcqu96S7k9JvxCI8QgnN7NeLnNscl3FNxfwJxo8kA FnjMqOn9LYPrSKecljbApySQLjuQYx+vRoG0d8EZj6VN4wYSd/3ts83LsJvFZF+rbAL5aMkH pEtJpTbatwSG2uvxtjoRcSkxGCUXE/w3lzm0uvMSGVXQqOMsCSUo4e9JVG+rXNm4+jenZJWn oBMHzjzGPIrbw9jENzXeLSoyVawtmIag+V8Qw3DJdw7Rakm2NICx/DZ5hPvH/wxFA==
  • Ironport-hdrordr: A9a23:Q9WeDqspAJ5rMHWtt02ILcQG7skClIMji2hC6mlwRA09TyXGra +TdaUguSMc1gx9ZJhBo7G90KnpewK6yXdQ2/hqAV7EZniahILIFvAY0WKG+VPd8kLFh4xgPM tbAs1D4ZjLfCRHZKXBkXiF+rQbsaC6GcmT7I+0pRcdLj2CKZsQlzuRYjzrbHGeLzM2Y6bReq Dsgvau8FGbCAsqh4mAdzE4dtmGg+eOuIPtYBYACRJiwA6SjQmw4Lq/NxSDxB8RXx5G3L9nqA H+4kbEz5Tml8v+5g7X1mfV4ZgTsNz9yuFbDMjJrsQOMD3jhiuheYwkcbyfuzIepv2p9T8R4Z LxiiZlG/42x2Laf2mzrxeo8w780Aw243un8lOciWuLm72PeBsKT+56wa5JeBrQ7EQt+Ptm1r hQ4m6fv51LSTvdgSXU/bHzJl5Xv3vxhUBnvf8YjnRZX4dbQqRWt5Yj8ERcF4pFND7m6bogDP JlAKjnlbprmGuhHjHkV1RUsZyRtixZJGbEfqFCgL3Z79FupgE286NCr/Zv3Evp9/oGOu15Dq r/Q+FVfYp1P7wrhJJGdZc8qPSMex7wqDL3QRSvyAfcZeg600ykke+D3Fxy3pDvRKA1
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Mar 30, 2022 at 09:42:18AM -0400, Daniel P. Smith wrote:
> On 3/30/22 05:40, Roger Pau Monné wrote:
> > On Tue, Mar 29, 2022 at 07:12:56PM -0400, Daniel P. Smith wrote:
> >> On 3/29/22 03:29, Roger Pau Monné wrote:
> >>> On Mon, Mar 28, 2022 at 04:36:22PM -0400, Daniel P. Smith wrote:
> >>>> During domain construction under dom0less and hyperlaunch it is 
> >>>> necessary to
> >>>> allocate at least the event channel for xenstore and potentially the 
> >>>> event
> >>>> channel for the core console. When dom0less and hyperlaunch are doing 
> >>>> their
> >>>> construction logic they are executing under the idle domain context. The 
> >>>> idle
> >>>> domain is not a privileged domain, it is not the target domain, and as a 
> >>>> result
> >>>> under the current default XSM policy is not allowed to allocate the event
> >>>> channel.
> >>>
> >>> I've not been following the discussion around this patch, but I would
> >>> assume such privileges are only required for init code when no other
> >>> domains are running?
> >>
> >> At this time, correct.
> >>
> >>> Since it's only at that point where the idle domain context needs to
> >>> allocate event channels would it make sense to temporary elevate it's
> >>> privileges by setting d->is_privileged while doing the domain creation?
> >>
> >> This is initially what I did but it seemed like there was some
> >> reluctance. As I was looking to formalize/abstract this in XSM instead
> >> of doing direct manipulations, I realized I could achieve it in the hook
> >> which would allow the hyperlaunch and dom0less code work without having
> >> to ensure priv escalation is properly handled.
> >>
> >>> That way we wouldn't need to grant those permissions for the lifetime
> >>> of the host when they are only needed for initialization code.
> >>
> >> Correct, which is why I adjusted the effective default policy only on
> >> the check instead of in xsm_default_action() as Jan has suggested.
> >> Outside of a code fault, all other times that evtchn_alloc_unbound() is
> >> called `current->domain` should be pointing at the caller of the hypercall.
> >>
> >> This works as an interim solution with minimal impact as it is all
> >> internal to XSM and can easily be evolved. My concern is that exposing a
> >> function call to provide priv escalation for the idle domain as an
> >> interim solution for dom0less and hyperlaunch will have more impactful
> >> code churn in both of these when a longer term approach is adopted.
> >>
> >>> Another option would be switching to the initial vCPU of the domain
> >>> being created, but that's likely to be more complex, or even create a
> >>> short lived system domain with is_privileged set just for the purpose
> >>> of building other domains.
> >>
> >> Longer term I would like to explore doing this in general. Some initial
> >> thinking is the fact that hypervisor has a few contexts, relative to
> >> external entities, under which it is executing. When it is handling
> >> internal house keeping (e.g. scheduler and security server), when it is
> >> interacting with guest domains, when it is interacting with hardware
> >> (e.g. vpci), and now when it is processing boot material to construct
> >> domains. It  has been mentioned that today in Xen if one of these
> >> contexts acting with external entities is corrupted, it can interfere
> >> with operations occurring in the other contexts. In the past the have
> >> advocated and been working to split these contexts using hard L0/L1
> >> separation. As noted in other discussions, some architectures are
> >> gaining hardware features that can be used in hard L0/L1 partitioning
> >> but also could be used in a more "soft" partitioning more a kin to
> >> Nested Kernel[1] and Dune[2]. Again just some initial thoughts.
> >>
> >>> Overall I'm not sure it's worth giving those extra privileges to the
> >>> idle domain when they are just need for a known and bounded period of
> >>> time.
> >>
> >> IMHO that is a slight over simplification. Setting is_privileged to the
> >> idle domain while it is processing domain construction data from outside
> >> the hypervisor means that during that bounded period the idle domain is
> >> complete unrestricted and may invoke any XSM protected call.
> > 
> > The domain builder code executed in the idle domain context can make
> > direct calls to any functions that are otherwise protected behind an
> > XSM check on the hypercall paths, so I don't see much difference.  The
> > code executed by the idle domain in order to do domain creation is
> > already part of the trusted code base (ie: it's hypervisor code)
> > likewise with the data used as input.
> 
> I am only referring to what legit code paths exist, not what a full
> control exploit could achieve at this time. My comments below was
> discussing what might want to be considered down the road.

Maybe I wasn't very clear on that paragraph, what I meant to say is
that the domain builder code does already bypass XSM checks that would
otherwise fail, just by calling functions that are behind the XSM
checks.

For example the domain builder will likely call
alloc_domheap_pages(target,...); which would otherwise be protected by
xsm_memory_adjust_reservation(XSM_TARGET, current, target) when
populating the domain physmap, so here you are basically bypassing an
XSM check already.

> >> Contrast
> >> this with only granting the idle domain the ability to allocate event
> >> channels between domains at any time with the only codified usage is
> >> during init/setup. While I am unsure how, theoretically malformed
> >> construction data could expose a logic flaw to do some very unsavory
> >> allocations without any guards.
> > 
> > It's kind of like that already, it's just that in other instances the
> > calls done by the domain builder in idle domain context bypass the
> > hypercall XSM checks.
> 
> Not on legitimate code paths, which is what I am focused on since the
> primary vector that I was considering here is data attacks which are
> about steering execution down legitimate paths as opposed to attacks
> like ROP that allows runtime construction of execution paths.

The internal domain builder does bypass XSM checks by calling
functions that on hypercall paths are otherwise protected by an XSM
check, see my comment above about alloc_domheap_pages() for example.

> > This might be giving you a false sense of security, but what's done in
> > the idle domain context in order to do domain creation would strictly
> > speaking require the idle domain to be a fully privileged entity, it's
> > just that we mostly bypass the XSM checks by calling functions
> > directly instead of using the hypercall entry paths.
> 
> Not at all as long as it is understood this is just about
> least-privilege with the concern being around data attacks.
> 
> To be clear, I am not saying either solution is wrong and standing
> firmly for one or the other. I am just trying to ensure that we consider
> the applicable security aspects and thus make an informed decision.
> 
> >> Whereas during runtime if the idle
> >> domain was tricked into establishing an event channel between two
> >> domains, it would only serve to provide a covert channel between the two
> >> domains. Neither is desirable but IMHO I find the former a little more
> >> concerning than the latter.
> >>
> >> With that said, I am not completely against doing the priv escalation if
> >> overall this is the direction that is preferred. If so, I would prefer
> >> to provide a pair of static inlines under XSM name space to provide a
> >> consistent implementation and be able to easily locate the places where
> >> it is applied if/when a longer term approach is implemented.
> > 
> > I think those helpers must be __init, and we need to assert that by
> > the time domains are started the idle domain is no longer
> > privileged.
> 
> That is fine, I am not sure if there is a difference between static
> inline functions that are only invoked from __init code (and I would
> think would thus not exist anywhere else in the binary) over__init
> functions.

__init functions are freed after boot, so any attempt to call them
after boot will result in a page fault. This is what we want here, as
after boot the permissions of the idle domain shouldn't ever be
increased.

> Although I have not been keeping Live Update in mind but I
> think this is where the controlled privileged builder context is really
> needed that dom0less, hyperlaunch, and Live Update can all utilize.
> 
> Good point regarding the assert. This was a concern I forgot to mention
> with using priv escalation. Once a legitimate path exists to do such an
> action, what locations should there be checks/assertions it is has not
> occurred.
> 
> > From my PoV increasing the privileges of the idle domain just for the
> > time it acts as a domain builder is merely formalizing what is already
> > a fact: if the domain building was executed outside of Xen it would
> > require the context domain to be privileged.
> 
> Except that it is now in guest space and can only reach resources
> through the controlled hypercall interface. Any data that is sent is
> considered trustworthy because the domain builder is trusted.

So if it is trusted then there's no issue in formalizing this by
setting its context to is_privileged = true.  Ideally we might want to
do domain build on a separate system domain just for the purpose of
executing that code, but given the system is not live yet (no
domains are scheduled in parallel with the idle domain) I don't see an
issue with re-using the idle domain for that purpose.

> I will send out my original priv escalation patch later today.

Thanks! Will give it a look (likely tomorrow).

Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.