Xen project Mailing List

Re: [Xen-devel] [PATCH v2 4/6] tools/dm_restrict: Unshare mount and IPC namespaces on Linux

To: Ian Jackson <ian.jackson@xxxxxxxxxx>

From: George Dunlap <george.dunlap@xxxxxxxxxx>

Date: Mon, 24 Sep 2018 15:22:15 +0100

Autocrypt: addr=george.dunlap@xxxxxxxxxx; prefer-encrypt=mutual; keydata= xsFNBFPqG+MBEACwPYTQpHepyshcufo0dVmqxDo917iWPslB8lauFxVf4WZtGvQSsKStHJSj 92Qkxp4CH2DwudI8qpVbnWCXsZxodDWac9c3PordLwz5/XL41LevEoM3NWRm5TNgJ3ckPA+J K5OfSK04QtmwSHFP3G/SXDJpGs+oDJgASta2AOl9vPV+t3xG6xyfa2NMGn9wmEvvVMD44Z7R W3RhZPn/NEZ5gaJhIUMgTChGwwWDOX0YPY19vcy5fT4bTIxvoZsLOkLSGoZb/jHIzkAAznug Q7PPeZJ1kXpbW9EHHaUHiCD9C87dMyty0N3TmWfp0VvBCaw32yFtM9jUgB7UVneoZUMUKeHA fgIXhJ7I7JFmw3J0PjGLxCLHf2Q5JOD8jeEXpdxugqF7B/fWYYmyIgwKutiGZeoPhl9c/7RE Bf6f9Qv4AtQoJwtLw6+5pDXsTD5q/GwhPjt7ohF7aQZTMMHhZuS52/izKhDzIufl6uiqUBge 0lqG+/ViLKwCkxHDREuSUTtfjRc9/AoAt2V2HOfgKORSCjFC1eI0+8UMxlfdq2z1AAchinU0 eSkRpX2An3CPEjgGFmu2Je4a/R/Kd6nGU8AFaE8ta0oq5BSFDRYdcKchw4TSxetkG6iUtqOO ZFS7VAdF00eqFJNQpi6IUQryhnrOByw+zSobqlOPUO7XC5fjnwARAQABzSRHZW9yZ2UgVy4g RHVubGFwIDxkdW5sYXBnQHVtaWNoLmVkdT7CwYAEEwEKACoCGwMFCwkIBwMFFQoJCAsFFgID AQACHgECF4ACGQEFAlpk2IEFCQo9I54ACgkQpjY8MQWQtG1A1BAAnc0oX3+M/jyv4j/ESJTO U2JhuWUWV6NFuzU10pUmMqpgQtiVEVU2QbCvTcZS1U/S6bqAUoiWQreDMSSgGH3a3BmRNi8n HKtarJqyK81aERM2HrjYkC1ZlRYG+jS8oWzzQrCQiTwn3eFLJrHjqowTbwahoiMw/nJ+OrZO /VXLfNeaxA5GF6emwgbpshwaUtESQ/MC5hFAFmUBZKAxp9CXG2ZhTP6ROV4fwhpnHaz8z+BT NQz8YwA4gkmFJbDUA9I0Cm9D/EZscrCGMeaVvcyldbMhWS+aH8nbqv6brhgbJEQS22eKCZDD J/ng5ea25QnS0fqu3bMrH39tDqeh7rVnt8Yu/YgOwc3XmgzmAhIDyzSinYEWJ1FkOVpIbGl9 uR6seRsfJmUK84KCScjkBhMKTOixWgNEQ/zTcLUsfTh6KQdLTn083Q5aFxWOIal2hiy9UyqR VQydowXy4Xx58rqvZjuYzdGDdAUlZ+D2O3Jp28ez5SikA/ZaaoGI9S1VWvQsQdzNfD2D+xfL qfd9yv7gko9eTJzv5zFr2MedtRb/nCrMTnvLkwNX4abB5+19JGneeRU4jy7yDYAhUXcI/waS /hHioT9MOjMh+DoLCgeZJYaOcgQdORY/IclLiLq4yFnG+4Ocft8igp79dbYYHkAkmC9te/2x Kq9nEd0Hg288EO/OwE0EVFq6vQEIAO2idItaUEplEemV2Q9mBA8YmtgckdLmaE0uzdDWL9To 1PL+qdNe7tBXKOfkKI7v32fe0nB4aecRlQJOZMWQRQ0+KLyXdJyHkq9221sHzcxsdcGs7X3c 17ep9zASq+wIYqAdZvr7pN9a3nVHZ4W7bzezuNDAvn4EpOf/o0RsWNyDlT6KECs1DuzOdRqD oOMJfYmtx9hMzqBoTdr6U20/KgnC/dmWWcJAUZXaAFp+3NYRCkk7k939VaUpoY519CeLrymd Vdke66KCiWBQXMkgtMGvGk5gLQLy4H3KXvpXoDrYKgysy7jeOccxI8owoiOdtbfM8TTDyWPR Ygjzb9LApA8AEQEAAcLBZQQYAQoADwIbDAUCWmTXMwUJB+tP9gAKCRCmNjwxBZC0bb+2D/9h jn1k5WcRHlu19WGuH6q0Kgm1LRT7PnnSz904igHNElMB5a7wRjw5kdNwU3sRm2nnmHeOJH8k Yj2Hn1QgX5SqQsysWTHWOEseGeoXydx9zZZkt3oQJM+9NV1VjK0bOXwqhiQyEUWz5/9l467F S/k4FJ5CHNRumvhLa0l2HEEu5pxq463HQZHDt4YE/9Y74eXOnYCB4nrYxQD/GSXEZvWryEWr eDoaFqzq1TKtzHhFgQG7yFUEepxLRUUtYsEpT6Rks2l4LCqG3hVD0URFIiTyuxJx3VC2Ta4L H3hxQtiaIpuXqq2D4z63h6vCx2wxfZc/WRHGbr4NAlB81l35Q/UHyMocVuYLj0llF0rwU4Aj iKZ5qWNSEdvEpL43fTvZYxQhDCjQTKbb38omu5P4kOf1HT7s+kmQKRtiLBlqHzK17D4K/180 ADw7a3gnmr5RumcZP3NGSSZA6jP5vNqQpNu4gqrPFWNQKQcW8HBiYFgq6SoLQQWbRxJDHvTR YJ2ms7oCe870gh4D1wFFqTLeyXiVqjddENGNaP8ZlCDw6EU82N8Bn5LXKjR1GWo2UK3CjrkH pTt3YYZvrhS2MO2EYEcWjyu6LALF/lS6z6LKeQZ+t9AdQUcILlrx9IxqXv6GvAoBLJY1jjGB q+/kRPrWXpoaQn7FXWGfMqU+NkY9enyrlw==

Cc: Anthony Perard <anthony.perard@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Wei Liu <wei.liu2@xxxxxxxxxx>

Delivery-date: Mon, 24 Sep 2018 14:22:29 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Openpgp: preference=signencrypt

On 09/24/2018 11:40 AM, Ian Jackson wrote: > George Dunlap writes ("[PATCH v2 4/6] tools/dm_restrict: Unshare mount and > IPC namespaces on Linux"): >> QEMU running under Xen doesn't need mount or IPC functionality. >> Create and enter separate namespaces for each of these before >> executing QEMU, so that in the event that other restrictions fail, the >> process won't be able to even name system mount points or exsting >> non-file-based IPC descriptors to attempt to attack them. > >> Unsharing is something a process can only do to itself (it would >> seem); so add an os-specific "dm_preexec_restrict()" hook just before >> we exec() the device model. > ... >> + >> +void libxl__local_dm_preexec_restrict(libxl__gc *gc, int stderrfd) >> +{ >> + return; >> +} > > This patch is the point of divergence between Linux dom0's and FreeBSD > dom0's. > > IMO it's not good enough to simply talk about some restrictions as > being Linux-only. Rather, the restrictions are supposed to work > together as a whole, and the whole syscall API available to qemu needs > to be considered. That would need to be done separately for FreeBSD. > > I think that this means we should explicitly write down that the qemu > depriv implementation is incomplete on FreeBSD. I think theoretically, an unprivileged user locked in a chroot owned by root, and with the xenstore fds properly restricted, should be enough to prevent a guest from gaining control or reading data that it shouldn't be able to read. All of the other restrictions are to add extra layers, in case that first layer should have a bug. That is, our unprivileged process shouldn't be *able* to call mount; but, *just in case*, we'll put it in a separate mount space, so that it can't mount proc or sysfs or whatever. Our unprivileged process shouldn't be able to use non-file-based IPC to break into other processes on the system; but just in case, we'll put it in its own IPC namespace. Our unpriveleged process shouldn't be *able* to cause any mischief with the reboot or readdir system calls, but let's restrict those with `-sandbox` anyway, just in case. `-runas` and `-chroot` should work just fine on FreeBSD; and if the restrict IOCTL is implemented, then that should be enough to say, "At a basic level this works". We probably do want someone familiar with FreeBSD to go through and implement whatever can be implemented there as far as additional restrictions. > >> +/* >> + * Called after forking but before executing the local devicemodel. >> + * Must call _exit(-1) on failure, printing an error message to >> + * stderrfd if available. > > IMO it would be better if the single call site handled the errors. I > think that can be done in a standard way. So this function should > probably be expected to call a LOG macro and then return ERROR_FAIL, > on failure. Ack >> +void libxl__local_dm_preexec_restrict(libxl__gc *gc, int stderrfd) >> +{ >> + int rc; >> + >> + /* Unshare mount and IPC namespaces. These are unused by QEMU. */ >> + rc = unshare(CLONE_NEWNS | CLONE_NEWIPC); > > Variable name should be r. rc is used for libxl error codes. > See tools/libxl/CODING_STYLE. Yes, sorry I forgot about this. Ack (to this and all similar reminders). >> +# TEST: Namespace unsharing >> +# >> +# Read /proc/<dmpid>/ns/<namespace> and make sure it's not equal to >> +# the current processes' value >> +for nsname in ipc mnt; do >> + echo -n "Unshare namespace $nsname: " >> + dmns=$(readlink /proc/$dmpid/ns/$nsname) >> + myns=$(readlink /proc/self/ns/$nsname) >> + >> + if [[ "$dmns" == "$myns" ]] ; then >> + echo "FAILED" >> + failed="true" >> + else >> + echo "PASSED" >> + fi >> +done > > I wonder if maybe this output should be in subunit v1 format. We could do that. It doesn't look like the "official" subunit code [1] accumulates the failure and returns it as the status code of the script as a whole. Is that something that's valuable, do you think? Or should we have `status = OK` mean, "The script itself ran without errors, check the output for any test failures"? -George [1] https://github.com/testing-cabal/subunit/blob/master/shell/share/subunit.sh _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.