[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of quarantined devices optional

> From: Jürgen Groß <jgross@xxxxxxxx>
> Sent: Friday, December 13, 2019 11:36 PM
> On 13.12.19 15:45, Jan Beulich wrote:
> > On 13.12.2019 15:24, Jürgen Groß wrote:
> >> On 13.12.19 15:11, Jan Beulich wrote:
> >>> On 13.12.2019 14:46, Jürgen Groß wrote:
> >>>> On 13.12.19 14:38, Jan Beulich wrote:
> >>>>> On 13.12.2019 14:31, Jürgen Groß wrote:
> >>>>>> Maybe I have misunderstood the current state, but I thought that it
> >>>>>> would just silently hide quirky devices without imposing a security
> >>>>>> risk. We would not learn which devices are quirky, but OTOH I doubt
> >>>>>> we'd get many reports about those in case your patch goes in.
> >>>>>
> >>>>> We don't want or need such reports, that's not the point. The
> >>>>> security risk comes from the quirkiness of the devices - admins
> >>>>> may wrongly think all is well and expose quirky devices to not
> >>>>> sufficiently trusted guests. (I say this fully realizing that
> >>>>> exposing devices to untrusted guests is almost always a certain
> >>>>> level of risk.)
> >>>>
> >>>> Do we _know_ those devices are problematic from security standpoint?
> >>>> Normally the IOMMU should do the isolation just fine. If it doesn't
> >>>> then its not the quirky device which is problematic, but the IOMMU.
> >>>>
> >>>> I thought the problem was that the quirky devices would not stop all
> >>>> (read) DMA even when being unassigned from the guest resulting in
> >>>> fatal IOMMU faults. The dummy page should stop those faults to
> happen
> >>>> resulting in a more stable system.
> >>>
> >>> IOMMU faults by themselves are not impacting stability (they will
> >>> add processing overhead, yes). The problem, according to Paul's
> >>> description, is that the occurrence of at least some forms of IOMMU
> >>> faults (not present ones as it seems, as opposed to permission
> >>> violation ones) is fatal to certain systems. Irrespective of the
> >>> sink page used after de-assignment a guest can arrange for IOMMU
> >>> faults to occur even while it still has the device assigned. Hence
> >>> it is important for the admin to know that their system (not the
> >>> the particular device) behaves in this undesirable way.
> >>
> >> So how does the admin learn this? Its not as if your patch would result
> >> in a system crash or hang all the time, right? This would be the case
> >> only if there either is a malicious (on purpose or due to a bug) guest
> >> which gets the device assigned, or if there happens to be a pending DMA
> >> operation when the device gets unassigned.
> >
> > I didn't claim the change would cover all cases. All I am claiming
> > is that it increases the chances of admins becoming aware of reasons
> > not to pass through devices to certain guests.
> So combined with your answer this means to me:
> With your patch (or the original one reverted) a DoS will occur either
> due to a malicious guest or in case a DMA is still pending. As a result
> the admin will no longer pass this device to any untrusted guest.
> With the current 4.13-staging a DoS will occur only due to a malicious
> guest. The admin will then no longer pass this device to any untrusted
> guest.
> So right now without any untrusted guest no DoS, while possibly DoS with
> your patch. How is that better?

I'd suggest separating run-time DoS from original quarantine purpose
of this patch.

For quarantine, I'm with Jan that giving admin the chance of knowing
whether quarantine is required is important. Say an admin just gets
a sample device from a new vendor and needs to decide whether his
employer should put such device in their production system. It's
essential to have a default configuration which can warn on any 
possible violation of the expectations on a good assignable device. 
Then the admin can look at Xen user guide to find out what the warning
information means and then does whatever required (usually means
more scrutinization than the warning itself) to figure out whether 
identified problems are safe (e.g. by enabling quarantine) or are
real indicators of bogus implementation (then should not use it).
Having quarantine default on means that every admin should remember
that Xen already enables some band-aids on benign warnings so he
should explicitly turn off those options to do evaluation which, to me
is not realistic.

On the other hand, although a sink page can help mitigate run-time DoS,
how to handle it should be an admin policy. If DoS is caused by 
malicious guest, it makes more sense to warn the admin and then switch
to sink page after meeting a DoS detection condition (e.g. number of
faults within a timing window exceeds a threshold). Lots of such warnings
from multiple VMs may imply a massive attack then the admin may adopt
more comprehensive analysis or defensive means. Then it's also possible
that DoS is caused by a vulnerability within the guest. In such case, the
guest may want to contain the DoS attack itself through a virtual IOMMU.
Then Xen needs to know the fault and then forward to the guest through
the vIOMMU. In either case I don't think current patch is sufficient. So it's
better to leave DoS improved separately (of course the sink logic can be 
leveraged), given the timeline of 4.13. 


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.