[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] x86 / iommu: set up a scratch page in the quarantine domain



On 10.12.19 09:05, Jan Beulich wrote:
On 10.12.2019 08:16, Tian, Kevin wrote:
From: Jan Beulich <jbeulich@xxxxxxxx>
Sent: Tuesday, December 3, 2019 5:36 PM

On 28.11.2019 12:32, Jürgen Groß wrote:
On 28.11.19 12:17, Jan Beulich wrote:
On 27.11.2019 18:11, Paul Durrant wrote:
This patch introduces a new iommu_op to facilitate a per-
implementation
quarantine set up, and then further code for x86 implementations
(amd and vtd) to set up a read-only scratch page to serve as the source
for DMA reads whilst a device is assigned to dom_io. DMA writes will
continue to fault as before.

The reason for doing this is that some hardware may continue to re-try
DMA (despite FLR) in the event of an error, or even BME being cleared,
and
will fail to deal with DMA read faults gracefully. Having a scratch page
mapped will allow pending DMA reads to complete and thus such buggy
hardware will eventually be quiesced.

NOTE: These modifications are restricted to x86 implementations only as
        the buggy h/w I am aware of is only used with Xen in an x86
        environment. ARM may require similar code but, since I am not
        aware of the need, this patch does not modify any ARM
implementation.

Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>

Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

There is still the open question of whether use of a scratch page ought
to be gated on something, either are run-time or compile-time.

I have no clear opinion either way here. The workaround seems low
overhead enough that there may not be a need to have an admin (or
build time) control for this.

As to 4.13: The quarantining as a whole is pretty fresh. While it
has been backported to security maintained trees, I'd still consider
it a new feature in 4.13, and hence this workaround at least eligible
for consideration.

I agree.

Release-acked-by: Juergen Gross <jgross@xxxxxxxx>

I notice this has been committed meanwhile. I had specifically not
done so due to the still missing VT-d ack, seeing that this wasn't
an entirely "trivial" change.


While the quarantine idea sounds good overall, I'm still not convinced
to have it the only way in place just for handling some known-buggy
device. It kills the possibility of identifying a new buggy device and then
deciding not to use it in the first space... I thought about whether it
will get better when future IOMMU implements A/D bit - by checking
access bit being set then we'll know some buggy device exists, but,
the scratch page is shared by all devices then we cannot rely on this
feature to find out the actual buggy one.

Thinking about it - yes, I think I agree. This (as with so many
workarounds) would better be an off-by-default one. The main issue
I understand this would have is that buggy systems then might hang
without even having managed to get a log message out - Paul?

Jürgen - would you be amenable to an almost last minute refinement
here (would then also need to still be backported to 4.12.2, or
the original backport reverted, to avoid giving the impression of
a regression)?

So what is your suggestion here? To have a boot option (defaulting to
off) for enabling the scratch page?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.