[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of quarantined devices optional

> From: Tian, Kevin
> Sent: Monday, December 16, 2019 1:58 PM
> > From: Jürgen Groß <jgross@xxxxxxxx>
> > Sent: Friday, December 13, 2019 11:36 PM
> >
> > On 13.12.19 15:45, Jan Beulich wrote:
> > > On 13.12.2019 15:24, Jürgen Groß wrote:
> > >> On 13.12.19 15:11, Jan Beulich wrote:
> > >>> On 13.12.2019 14:46, Jürgen Groß wrote:
> > >>>> On 13.12.19 14:38, Jan Beulich wrote:
> > >>>>> On 13.12.2019 14:31, Jürgen Groß wrote:
> > >>>>>> Maybe I have misunderstood the current state, but I thought that it
> > >>>>>> would just silently hide quirky devices without imposing a security
> > >>>>>> risk. We would not learn which devices are quirky, but OTOH I
> doubt
> > >>>>>> we'd get many reports about those in case your patch goes in.
> > >>>>>
> > >>>>> We don't want or need such reports, that's not the point. The
> > >>>>> security risk comes from the quirkiness of the devices - admins
> > >>>>> may wrongly think all is well and expose quirky devices to not
> > >>>>> sufficiently trusted guests. (I say this fully realizing that
> > >>>>> exposing devices to untrusted guests is almost always a certain
> > >>>>> level of risk.)
> > >>>>
> > >>>> Do we _know_ those devices are problematic from security
> standpoint?
> > >>>> Normally the IOMMU should do the isolation just fine. If it doesn't
> > >>>> then its not the quirky device which is problematic, but the IOMMU.
> > >>>>
> > >>>> I thought the problem was that the quirky devices would not stop all
> > >>>> (read) DMA even when being unassigned from the guest resulting in
> > >>>> fatal IOMMU faults. The dummy page should stop those faults to
> > happen
> > >>>> resulting in a more stable system.
> > >>>
> > >>> IOMMU faults by themselves are not impacting stability (they will
> > >>> add processing overhead, yes). The problem, according to Paul's
> > >>> description, is that the occurrence of at least some forms of IOMMU
> > >>> faults (not present ones as it seems, as opposed to permission
> > >>> violation ones) is fatal to certain systems. Irrespective of the
> > >>> sink page used after de-assignment a guest can arrange for IOMMU
> > >>> faults to occur even while it still has the device assigned. Hence
> > >>> it is important for the admin to know that their system (not the
> > >>> the particular device) behaves in this undesirable way.
> > >>
> > >> So how does the admin learn this? Its not as if your patch would result
> > >> in a system crash or hang all the time, right? This would be the case
> > >> only if there either is a malicious (on purpose or due to a bug) guest
> > >> which gets the device assigned, or if there happens to be a pending DMA
> > >> operation when the device gets unassigned.
> > >
> > > I didn't claim the change would cover all cases. All I am claiming
> > > is that it increases the chances of admins becoming aware of reasons
> > > not to pass through devices to certain guests.
> >
> > So combined with your answer this means to me:
> >
> > With your patch (or the original one reverted) a DoS will occur either
> > due to a malicious guest or in case a DMA is still pending. As a result
> > the admin will no longer pass this device to any untrusted guest.
> >
> > With the current 4.13-staging a DoS will occur only due to a malicious
> > guest. The admin will then no longer pass this device to any untrusted
> > guest.
> >
> > So right now without any untrusted guest no DoS, while possibly DoS with
> > your patch. How is that better?
> >
> I'd suggest separating run-time DoS from original quarantine purpose
> of this patch.
> For quarantine, I'm with Jan that giving admin the chance of knowing
> whether quarantine is required is important. Say an admin just gets
> a sample device from a new vendor and needs to decide whether his
> employer should put such device in their production system. It's
> essential to have a default configuration which can warn on any
> possible violation of the expectations on a good assignable device.
> Then the admin can look at Xen user guide to find out what the warning
> information means and then does whatever required (usually means
> more scrutinization than the warning itself) to figure out whether
> identified problems are safe (e.g. by enabling quarantine) or are
> real indicators of bogus implementation (then should not use it).
> Having quarantine default on means that every admin should remember
> that Xen already enables some band-aids on benign warnings so he
> should explicitly turn off those options to do evaluation which, to me
> is not realistic.
> On the other hand, although a sink page can help mitigate run-time DoS,
> how to handle it should be an admin policy. If DoS is caused by
> malicious guest, it makes more sense to warn the admin and then switch
> to sink page after meeting a DoS detection condition (e.g. number of
> faults within a timing window exceeds a threshold). Lots of such warnings
> from multiple VMs may imply a massive attack then the admin may adopt
> more comprehensive analysis or defensive means. Then it's also possible
> that DoS is caused by a vulnerability within the guest. In such case, the
> guest may want to contain the DoS attack itself through a virtual IOMMU.
> Then Xen needs to know the fault and then forward to the guest through
> the vIOMMU. In either case I don't think current patch is sufficient. So it's
> better to leave DoS improved separately (of course the sink logic can be
> leveraged), given the timeline of 4.13.

btw I forgot that today we already have some-level of DoS detection. 
You can check pci_check_disable_device, which exactly does above
threshold check and then disable the PCI_COMMAND register. The 
missing part might be, the device still issues DMAs even after the
disabling. In such case, at least VT-d allows the software to disable fault
reporting on a given device. Such method betters suits warn-and-disable
model upon DoS attacking than blindly setting up a sink page.

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.