[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] xen/arm: fix gnttab_need_iommu_mapping



On Mon, 8 Feb 2021, Julien Grall wrote:
> On Mon, 8 Feb 2021 at 20:24, Stefano Stabellini <sstabellini@xxxxxxxxxx> 
> wrote:
> > > @Ian, I think this wants to go in 4.15. Without it, Xen may receive an 
> > > IOMMU
> > > fault for DMA transaction using granted page.
> > >
> > > > Backport: 4.12+
> > > >
> > > > ---
> > > >
> > > > Given the severity of the bug, I would like to request this patch to be
> > > > backported to 4.12 too, even if 4.12 is security-fixes only since Oct
> > > > 2020.
> > >
> > > I would agree that the bug is bad, but it is not clear to me why this 
> > > would be
> > > warrant for an exception for backporting. Can you outline what's the worse
> > > that can happen?
> > >
> > > Correct me if I am wrong, if one can hit this error, then it should be 
> > > pretty
> > > reliable. Therefore, anyone wanted to use 4.12 in production should have 
> > > seen
> > > if the error on there setup by now (4.12 has been out for nearly two 
> > > years).
> > > If not, then they are most likely not affected.
> > >
> > > Any new users of Xen should use the latest stable rather than starting 
> > > with an
> > > old version.
> >
> > Yes, the bug reproduces reliably but it takes more than a smoke test to
> > find it. That's why it wasn't found by OSSTest and also our internal
> > CI-loop at Xilinx.
> 
> Ok. So a user should be able to catch it during testing, is that correct?

Yes, probably. The failure is that PV drivers do not work (they trigger
the IOMMU fault), specifically PV network and block, maybe others too.

I think it is unlikely but possible that an hardware update would also
trigger the bug. For instance, a change of the network card might
trigger the bug, if the previous network card driver was always bouncing
requests on bounce buffers, while the new drivers uses the provided
memory pages directly. I don't know how realistic this scenario is.


> > Users can be very slow at upgrading, so I am worried that 4.12 might still
> > be picked by somebody, especially given that it is still security
> > supported for a while.
> 
> Don't tell me about upgrading Xen... ;) But I am a bit confused, are
> you worried about existing users or new users?

I am mostly worried about people that start using 4.12.

If a user was already on 4.12 and not seeing any errors, they are
unlikely to see this error. It would only happen if:
- they didn't use PV drivers before, and they want to start using PV
  drivers now
- they are upgrading hardware (not sure how likely to happen, see above)


> > > Other than the seriousness of the bug, I think there is also a fairness
> > > concern.
> > >
> > > So far our rules says there is only security support backport allowed. If 
> > > we
> > > start granting exception, then we need a way to prevent abuse of it. To 
> > > take
> > > an extreme example, why one couldn't ask backport for 4.2?
> > >
> > > That said, I vaguely remember this topic was brought up a few time on
> > > security@. So maybe it is time to have a new discussion about stable tree.
> >
> > I wouldn't consider a backport for a tree that is closed even for
> > security backports. So in your example, I'd say no to a backport to 4.2
> > or 4.10.
> >
> > I think there is a valid question for trees that are still open to
> > security fixes but not general backports.
> >
> > For these cases, I would just follow a simple rule of thumb:
> 
> Aren't those rules already used for stable trees?

No, I don't think so. Backports are done by Jan and me, not by the
submitter (in this case I am the submitter but it is a coincidence :-)
If a commit is fixing a genuine bug and the backport doesn't cause
issues, then it is typically done. Here the bar should be certainly
higher, both in terms of low-risk, and importance of the bug to fix.



> > - is the submitter willing to provide the backport?
> > - is the backport low-risk?
> > - is the underlying bug important?
> 
> You wrote multiple times that this is serious but it is still not
> clear what's the worse that can happen...

PV drivers don't work: each data transfer involving granted pages causes
an IOMMU fault.


> > If the answer to all is "yes" then I'd go with it.
> >
> >
> > In this case, given that the fix is a one-liner, and obviously correct,
> 
> I have seen one-liners that introduced XSA in the past ;).
 
Sure but this is a revert to the pre-4.12 implementation.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.