[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of quarantined devices optional
On 16.12.19 06:58, Tian, Kevin wrote: From: Jürgen Groß <jgross@xxxxxxxx> Sent: Friday, December 13, 2019 11:36 PM On 13.12.19 15:45, Jan Beulich wrote:On 13.12.2019 15:24, Jürgen Groß wrote:On 13.12.19 15:11, Jan Beulich wrote:On 13.12.2019 14:46, Jürgen Groß wrote:On 13.12.19 14:38, Jan Beulich wrote:On 13.12.2019 14:31, Jürgen Groß wrote:Maybe I have misunderstood the current state, but I thought that it would just silently hide quirky devices without imposing a security risk. We would not learn which devices are quirky, but OTOH I doubt we'd get many reports about those in case your patch goes in.We don't want or need such reports, that's not the point. The security risk comes from the quirkiness of the devices - admins may wrongly think all is well and expose quirky devices to not sufficiently trusted guests. (I say this fully realizing that exposing devices to untrusted guests is almost always a certain level of risk.)Do we _know_ those devices are problematic from security standpoint? Normally the IOMMU should do the isolation just fine. If it doesn't then its not the quirky device which is problematic, but the IOMMU. I thought the problem was that the quirky devices would not stop all (read) DMA even when being unassigned from the guest resulting in fatal IOMMU faults. The dummy page should stop those faults tohappenresulting in a more stable system.IOMMU faults by themselves are not impacting stability (they will add processing overhead, yes). The problem, according to Paul's description, is that the occurrence of at least some forms of IOMMU faults (not present ones as it seems, as opposed to permission violation ones) is fatal to certain systems. Irrespective of the sink page used after de-assignment a guest can arrange for IOMMU faults to occur even while it still has the device assigned. Hence it is important for the admin to know that their system (not the the particular device) behaves in this undesirable way.So how does the admin learn this? Its not as if your patch would result in a system crash or hang all the time, right? This would be the case only if there either is a malicious (on purpose or due to a bug) guest which gets the device assigned, or if there happens to be a pending DMA operation when the device gets unassigned.I didn't claim the change would cover all cases. All I am claiming is that it increases the chances of admins becoming aware of reasons not to pass through devices to certain guests.So combined with your answer this means to me: With your patch (or the original one reverted) a DoS will occur either due to a malicious guest or in case a DMA is still pending. As a result the admin will no longer pass this device to any untrusted guest. With the current 4.13-staging a DoS will occur only due to a malicious guest. The admin will then no longer pass this device to any untrusted guest. So right now without any untrusted guest no DoS, while possibly DoS with your patch. How is that better?I'd suggest separating run-time DoS from original quarantine purpose of this patch. For quarantine, I'm with Jan that giving admin the chance of knowing whether quarantine is required is important. Say an admin just gets a sample device from a new vendor and needs to decide whether his employer should put such device in their production system. It's essential to have a default configuration which can warn on any possible violation of the expectations on a good assignable device. Then the admin can look at Xen user guide to find out what the warning information means and then does whatever required (usually means more scrutinization than the warning itself) to figure out whether identified problems are safe (e.g. by enabling quarantine) or are real indicators of bogus implementation (then should not use it). Having quarantine default on means that every admin should remember that Xen already enables some band-aids on benign warnings so he should explicitly turn off those options to do evaluation which, to me is not realistic. This implies the admin is aware of the necessity to do that testing. And for the tests to be conclusive he probably needs to do more than just a basic "does it work" test, as the pending DMA might occur in some cases only. And that is basically my problem with Jan's default: an admin not doing enough testing (or non at all) will end up with a DoS situation in production, while an admin knowing that he needs to test properly is probably more aware of the needed command line option for evaluation of the device's security relevance. Its a complex problem and I think the decision for the default should not be rushed. So I think it is best to discuss it on xen-devel and leave the patch out of the initial 4.13 release (this patch is the last pending one for 4.13 now). After a decision is made the patch can easily by backported to 4.13 in case it is regarded to be important. Juergen _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |