[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of quarantined devices optional
On 16/12/2019 08:24, Jürgen Groß wrote: > On 16.12.19 06:58, Tian, Kevin wrote: >>> From: Jürgen Groß <jgross@xxxxxxxx> >>> Sent: Friday, December 13, 2019 11:36 PM >>> >>> On 13.12.19 15:45, Jan Beulich wrote: >>>> On 13.12.2019 15:24, Jürgen Groß wrote: >>>>> On 13.12.19 15:11, Jan Beulich wrote: >>>>>> On 13.12.2019 14:46, Jürgen Groß wrote: >>>>>>> On 13.12.19 14:38, Jan Beulich wrote: >>>>>>>> On 13.12.2019 14:31, Jürgen Groß wrote: >>>>>>>>> Maybe I have misunderstood the current state, but I thought that it >>>>>>>>> would just silently hide quirky devices without imposing a security >>>>>>>>> risk. We would not learn which devices are quirky, but OTOH I doubt >>>>>>>>> we'd get many reports about those in case your patch goes in. >>>>>>>> >>>>>>>> We don't want or need such reports, that's not the point. The >>>>>>>> security risk comes from the quirkiness of the devices - admins >>>>>>>> may wrongly think all is well and expose quirky devices to not >>>>>>>> sufficiently trusted guests. (I say this fully realizing that >>>>>>>> exposing devices to untrusted guests is almost always a certain >>>>>>>> level of risk.) >>>>>>> >>>>>>> Do we _know_ those devices are problematic from security standpoint? >>>>>>> Normally the IOMMU should do the isolation just fine. If it doesn't >>>>>>> then its not the quirky device which is problematic, but the IOMMU. >>>>>>> >>>>>>> I thought the problem was that the quirky devices would not stop all >>>>>>> (read) DMA even when being unassigned from the guest resulting in >>>>>>> fatal IOMMU faults. The dummy page should stop those faults to >>> happen >>>>>>> resulting in a more stable system. >>>>>> >>>>>> IOMMU faults by themselves are not impacting stability (they will >>>>>> add processing overhead, yes). The problem, according to Paul's >>>>>> description, is that the occurrence of at least some forms of IOMMU >>>>>> faults (not present ones as it seems, as opposed to permission >>>>>> violation ones) is fatal to certain systems. Irrespective of the >>>>>> sink page used after de-assignment a guest can arrange for IOMMU >>>>>> faults to occur even while it still has the device assigned. Hence >>>>>> it is important for the admin to know that their system (not the >>>>>> the particular device) behaves in this undesirable way. >>>>> >>>>> So how does the admin learn this? Its not as if your patch would result >>>>> in a system crash or hang all the time, right? This would be the case >>>>> only if there either is a malicious (on purpose or due to a bug) guest >>>>> which gets the device assigned, or if there happens to be a pending DMA >>>>> operation when the device gets unassigned. >>>> >>>> I didn't claim the change would cover all cases. All I am claiming >>>> is that it increases the chances of admins becoming aware of reasons >>>> not to pass through devices to certain guests. >>> >>> So combined with your answer this means to me: >>> >>> With your patch (or the original one reverted) a DoS will occur either >>> due to a malicious guest or in case a DMA is still pending. As a result >>> the admin will no longer pass this device to any untrusted guest. >>> >>> With the current 4.13-staging a DoS will occur only due to a malicious >>> guest. The admin will then no longer pass this device to any untrusted >>> guest. >>> >>> So right now without any untrusted guest no DoS, while possibly DoS with >>> your patch. How is that better? >>> >> >> I'd suggest separating run-time DoS from original quarantine purpose >> of this patch. >> >> For quarantine, I'm with Jan that giving admin the chance of knowing >> whether quarantine is required is important. Say an admin just gets >> a sample device from a new vendor and needs to decide whether his >> employer should put such device in their production system. It's >> essential to have a default configuration which can warn on any >> possible violation of the expectations on a good assignable device. >> Then the admin can look at Xen user guide to find out what the warning >> information means and then does whatever required (usually means >> more scrutinization than the warning itself) to figure out whether >> identified problems are safe (e.g. by enabling quarantine) or are >> real indicators of bogus implementation (then should not use it). >> Having quarantine default on means that every admin should remember >> that Xen already enables some band-aids on benign warnings so he >> should explicitly turn off those options to do evaluation which, to me >> is not realistic. > > This implies the admin is aware of the necessity to do that testing. > And for the tests to be conclusive he probably needs to do more than > just a basic "does it work" test, as the pending DMA might occur in > some cases only. And that is basically my problem with Jan's default: > an admin not doing enough testing (or non at all) will end up with a > DoS situation in production, while an admin knowing that he needs to > test properly is probably more aware of the needed command line option > for evaluation of the device's security relevance. > > Its a complex problem and I think the decision for the default should > not be rushed. So I think it is best to discuss it on xen-devel and > leave the patch out of the initial 4.13 release (this patch is the last > pending one for 4.13 now). After a decision is made the patch can easily > by backported to 4.13 in case it is regarded to be important. > > > Juergen > > Isn't it possible to quarantine by default, but still detect if there is any (DMA) activity on that device and (perhaps after a certain threshold or short time) print a big fat (once or rate limited) warning in Xen logs that the device could be rogue and should be checked upon by an admin ? -- Sander _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |