[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of quarantined devices optional



On 16.12.19 06:58, Tian, Kevin wrote:
From: Jürgen Groß <jgross@xxxxxxxx>
Sent: Friday, December 13, 2019 11:36 PM

On 13.12.19 15:45, Jan Beulich wrote:
On 13.12.2019 15:24, Jürgen Groß wrote:
On 13.12.19 15:11, Jan Beulich wrote:
On 13.12.2019 14:46, Jürgen Groß wrote:
On 13.12.19 14:38, Jan Beulich wrote:
On 13.12.2019 14:31, Jürgen Groß wrote:
Maybe I have misunderstood the current state, but I thought that it
would just silently hide quirky devices without imposing a security
risk. We would not learn which devices are quirky, but OTOH I doubt
we'd get many reports about those in case your patch goes in.

We don't want or need such reports, that's not the point. The
security risk comes from the quirkiness of the devices - admins
may wrongly think all is well and expose quirky devices to not
sufficiently trusted guests. (I say this fully realizing that
exposing devices to untrusted guests is almost always a certain
level of risk.)

Do we _know_ those devices are problematic from security standpoint?
Normally the IOMMU should do the isolation just fine. If it doesn't
then its not the quirky device which is problematic, but the IOMMU.

I thought the problem was that the quirky devices would not stop all
(read) DMA even when being unassigned from the guest resulting in
fatal IOMMU faults. The dummy page should stop those faults to
happen
resulting in a more stable system.

IOMMU faults by themselves are not impacting stability (they will
add processing overhead, yes). The problem, according to Paul's
description, is that the occurrence of at least some forms of IOMMU
faults (not present ones as it seems, as opposed to permission
violation ones) is fatal to certain systems. Irrespective of the
sink page used after de-assignment a guest can arrange for IOMMU
faults to occur even while it still has the device assigned. Hence
it is important for the admin to know that their system (not the
the particular device) behaves in this undesirable way.

So how does the admin learn this? Its not as if your patch would result
in a system crash or hang all the time, right? This would be the case
only if there either is a malicious (on purpose or due to a bug) guest
which gets the device assigned, or if there happens to be a pending DMA
operation when the device gets unassigned.

I didn't claim the change would cover all cases. All I am claiming
is that it increases the chances of admins becoming aware of reasons
not to pass through devices to certain guests.

So combined with your answer this means to me:

With your patch (or the original one reverted) a DoS will occur either
due to a malicious guest or in case a DMA is still pending. As a result
the admin will no longer pass this device to any untrusted guest.

With the current 4.13-staging a DoS will occur only due to a malicious
guest. The admin will then no longer pass this device to any untrusted
guest.

So right now without any untrusted guest no DoS, while possibly DoS with
your patch. How is that better?


I'd suggest separating run-time DoS from original quarantine purpose
of this patch.

For quarantine, I'm with Jan that giving admin the chance of knowing
whether quarantine is required is important. Say an admin just gets
a sample device from a new vendor and needs to decide whether his
employer should put such device in their production system. It's
essential to have a default configuration which can warn on any
possible violation of the expectations on a good assignable device.
Then the admin can look at Xen user guide to find out what the warning
information means and then does whatever required (usually means
more scrutinization than the warning itself) to figure out whether
identified problems are safe (e.g. by enabling quarantine) or are
real indicators of bogus implementation (then should not use it).
Having quarantine default on means that every admin should remember
that Xen already enables some band-aids on benign warnings so he
should explicitly turn off those options to do evaluation which, to me
is not realistic.

This implies the admin is aware of the necessity to do that testing.
And for the tests to be conclusive he probably needs to do more than
just a basic "does it work" test, as the pending DMA might occur in
some cases only. And that is basically my problem with Jan's default:
an admin not doing enough testing (or non at all) will end up with a
DoS situation in production, while an admin knowing that he needs to
test properly is probably more aware of the needed command line option
for evaluation of the device's security relevance.

Its a complex problem and I think the decision for the default should
not be rushed. So I think it is best to discuss it on xen-devel and
leave the patch out of the initial 4.13 release (this patch is the last
pending one for 4.13 now). After a decision is made the patch can easily
by backported to 4.13 in case it is regarded to be important.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.