[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of quarantined devices optional



On 16/12/2019 08:24, Jürgen Groß wrote:
> On 16.12.19 06:58, Tian, Kevin wrote:
>>> From: Jürgen Groß <jgross@xxxxxxxx>
>>> Sent: Friday, December 13, 2019 11:36 PM
>>>
>>> On 13.12.19 15:45, Jan Beulich wrote:
>>>> On 13.12.2019 15:24, Jürgen Groß wrote:
>>>>> On 13.12.19 15:11, Jan Beulich wrote:
>>>>>> On 13.12.2019 14:46, Jürgen Groß wrote:
>>>>>>> On 13.12.19 14:38, Jan Beulich wrote:
>>>>>>>> On 13.12.2019 14:31, Jürgen Groß wrote:
>>>>>>>>> Maybe I have misunderstood the current state, but I thought that it
>>>>>>>>> would just silently hide quirky devices without imposing a security
>>>>>>>>> risk. We would not learn which devices are quirky, but OTOH I doubt
>>>>>>>>> we'd get many reports about those in case your patch goes in.
>>>>>>>>
>>>>>>>> We don't want or need such reports, that's not the point. The
>>>>>>>> security risk comes from the quirkiness of the devices - admins
>>>>>>>> may wrongly think all is well and expose quirky devices to not
>>>>>>>> sufficiently trusted guests. (I say this fully realizing that
>>>>>>>> exposing devices to untrusted guests is almost always a certain
>>>>>>>> level of risk.)
>>>>>>>
>>>>>>> Do we _know_ those devices are problematic from security standpoint?
>>>>>>> Normally the IOMMU should do the isolation just fine. If it doesn't
>>>>>>> then its not the quirky device which is problematic, but the IOMMU.
>>>>>>>
>>>>>>> I thought the problem was that the quirky devices would not stop all
>>>>>>> (read) DMA even when being unassigned from the guest resulting in
>>>>>>> fatal IOMMU faults. The dummy page should stop those faults to
>>> happen
>>>>>>> resulting in a more stable system.
>>>>>>
>>>>>> IOMMU faults by themselves are not impacting stability (they will
>>>>>> add processing overhead, yes). The problem, according to Paul's
>>>>>> description, is that the occurrence of at least some forms of IOMMU
>>>>>> faults (not present ones as it seems, as opposed to permission
>>>>>> violation ones) is fatal to certain systems. Irrespective of the
>>>>>> sink page used after de-assignment a guest can arrange for IOMMU
>>>>>> faults to occur even while it still has the device assigned. Hence
>>>>>> it is important for the admin to know that their system (not the
>>>>>> the particular device) behaves in this undesirable way.
>>>>>
>>>>> So how does the admin learn this? Its not as if your patch would result
>>>>> in a system crash or hang all the time, right? This would be the case
>>>>> only if there either is a malicious (on purpose or due to a bug) guest
>>>>> which gets the device assigned, or if there happens to be a pending DMA
>>>>> operation when the device gets unassigned.
>>>>
>>>> I didn't claim the change would cover all cases. All I am claiming
>>>> is that it increases the chances of admins becoming aware of reasons
>>>> not to pass through devices to certain guests.
>>>
>>> So combined with your answer this means to me:
>>>
>>> With your patch (or the original one reverted) a DoS will occur either
>>> due to a malicious guest or in case a DMA is still pending. As a result
>>> the admin will no longer pass this device to any untrusted guest.
>>>
>>> With the current 4.13-staging a DoS will occur only due to a malicious
>>> guest. The admin will then no longer pass this device to any untrusted
>>> guest.
>>>
>>> So right now without any untrusted guest no DoS, while possibly DoS with
>>> your patch. How is that better?
>>>
>>
>> I'd suggest separating run-time DoS from original quarantine purpose
>> of this patch.
>>
>> For quarantine, I'm with Jan that giving admin the chance of knowing
>> whether quarantine is required is important. Say an admin just gets
>> a sample device from a new vendor and needs to decide whether his
>> employer should put such device in their production system. It's
>> essential to have a default configuration which can warn on any
>> possible violation of the expectations on a good assignable device.
>> Then the admin can look at Xen user guide to find out what the warning
>> information means and then does whatever required (usually means
>> more scrutinization than the warning itself) to figure out whether
>> identified problems are safe (e.g. by enabling quarantine) or are
>> real indicators of bogus implementation (then should not use it).
>> Having quarantine default on means that every admin should remember
>> that Xen already enables some band-aids on benign warnings so he
>> should explicitly turn off those options to do evaluation which, to me
>> is not realistic.
> 
> This implies the admin is aware of the necessity to do that testing.
> And for the tests to be conclusive he probably needs to do more than
> just a basic "does it work" test, as the pending DMA might occur in
> some cases only. And that is basically my problem with Jan's default:
> an admin not doing enough testing (or non at all) will end up with a
> DoS situation in production, while an admin knowing that he needs to
> test properly is probably more aware of the needed command line option
> for evaluation of the device's security relevance.
> 
> Its a complex problem and I think the decision for the default should
> not be rushed. So I think it is best to discuss it on xen-devel and
> leave the patch out of the initial 4.13 release (this patch is the last
> pending one for 4.13 now). After a decision is made the patch can easily
> by backported to 4.13 in case it is regarded to be important.
> 
> 
> Juergen
> 
> 
Isn't it possible to quarantine by default, but still detect if there is
any (DMA) activity on that device and (perhaps after a certain threshold
or short time) print a big fat (once or rate limited) warning in Xen
logs that the device could be rogue and should be checked upon by an admin ?

--
Sander

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.