
Re: [Xen-devel] [PATCH v2] VTD/Intremap: Disable Intremap on Chipset 5500/5520/X58 due to errata



>>> On 05.03.13 at 12:59, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 05/03/13 07:45, Jan Beulich wrote:
>> Just like for the first version of this patch (where you didn't really
>> follow up on all the comments made), and in line with the XSA-36
>> follow up that you also asked about some time last week, I don't
>> think we can take this as is. First and foremost we need to settle
>> on a policy:
>>
>> - fully disable IOMMU
>>   - stay with allowing PV passthrough in this mode
>>   - also suppress PV passthrough in this mode
>> - partially disable IOMMU in presence of platform errata
>>   (yielding an unexpectedly - to the user - insecure system)
>> - other options?
> 
> There is only one satisfactory option, and this is to provide a report
> to the toolstack of what hardware is available/missing/buggy.  The
> toolstack can then make an informed decision as to whether to use
> passthrough or not.

Fine with me, but I don't think I've seen a patch to this effect.

> The specific usecase we have in mind going forward is with driver
> domains.  Even without interrupt remapping, it is safe to pass devices
> through to trusted code base driver domains.  A binary switch on the Xen
> command line is not acceptable nor appropriate in this situation.

Makes sense, albeit I'd question whether "trusted code base" is
enough here (because of bugs that may still affect the whole
system).

>> This specifically needs to be distinguished from firmware
>> disabling/hiding some functionality - we ought to be permitted
>> to expect the operator to know the security level that the
>> platform provides.
> 
> Why? This patch has *exactly the same effect* as a BIOS upgrade
> which follows the erratum recommendations.

But the difference is - as said - that it's still a firmware decision
then, and my statement on the operator being expected to be
aware holds.

Plus the political aspect: We're going to be held responsible (with
"we" being xen.org or individual vendors) for the security breach
if we disable a security relevant function of the hardware.
Whereas when the BIOS does so, it's the BIOS vendor's liability.

>  The difference being that it
> is at least tweakable on the Xen command line.  The real problem with
> this patch is how common these chipsets are, and the fact
> that no BIOSes we are aware of have implemented the erratum workaround.
> 
> This bug is the root cause of at least 7 customer escalations of weird
> crashes, and suspected cause of 5 more.  Customers do not need to be
> actively using PCI passthrough to have this issue screw with dom0.

And I didn't say I don't want a workaround. What I said is that
the way it is being implemented is imo not suitable.

>> My personal opinion is that in the event where we need to disable
>> _any_ IOMMU functionality, we ought to _fully_ suppress
>> passthrough (and hence we can as well disable the IOMMU
>> altogether, as we did for XSA-36). We can _then_ allow the user
>> to re-enable the IOMMU (in the case here as much as for XSA-36
>> e.g. via "iommu=no-intremap", i.e. the operator explicitly
>> declaring to be willing to take the risk). That would require parts
>> of the XSA-36 follow up patch that I had posted (other parts of it
>> would need adjustment).
> 
> See the previously mentioned driver domain usecase.
> 
> Also, this is unacceptable for any business usecase of Xen.  I realize
> that this is our problem rather than upstream's, but Citrix is certainly
> not alone in this regard.  We cannot push a bugfix or security fix which
> regresses functionality, which is why XSA-36 is not currently in an
> acceptable state.  It is the same reasoning as your commit for c/s
> 25765:e6ca45ca03c2

Not really. The way that one is done is not sacrificing security
for functionality, at least not by default. It merely allows an
operator to decide to do so.

Of course, here we're talking about trading security for
stability, and one would generally expect stability to come
first, then security, then functionality. But then again that's
not the question here, as we're not discussing whether we
need a fix (workaround), but how to implement it. And there
clearly are ways to implement the workaround in a secure
fashion - see above.
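
To make the proposed default concrete, here is a minimal sketch of the policy described above. It is not Xen code: the struct, the function, and the parameter names are all hypothetical, and the only thing taken from this thread is the rule itself (an erratum disables the IOMMU altogether unless the operator explicitly passed something like "iommu=no-intremap").

```c
#include <stdbool.h>

/* Hypothetical model of the resulting platform state; not a Xen structure. */
struct iommu_policy {
    bool iommu_enabled;     /* is the IOMMU (and hence passthrough) usable? */
    bool intremap_enabled;  /* is interrupt remapping in use? */
};

/*
 * Hypothetical helper illustrating "secure by default, operator opt-in".
 *
 * erratum_present:      the chipset is known to have broken intremap
 * operator_no_intremap: the operator explicitly asked to run without
 *                       interrupt remapping, accepting the risk
 */
static struct iommu_policy decide_iommu_policy(bool erratum_present,
                                               bool operator_no_intremap)
{
    struct iommu_policy p = { .iommu_enabled = true, .intremap_enabled = true };

    if (operator_no_intremap) {
        /* Explicit operator decision: keep the IOMMU, drop only intremap. */
        p.intremap_enabled = false;
        return p;
    }

    if (erratum_present) {
        /* No opt-in: do not silently run a partially disabled IOMMU. */
        p.iommu_enabled = false;
        p.intremap_enabled = false;
    }

    return p;
}
```

The point of the ordering is that the insecure configuration is only ever reached through an explicit operator choice, never as a silent default.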

>> And - do you really need to iterate over all buses on segment 0?
>> The X58 data sheet says at the top of section 17.1: "All devices
>> on the IOH reside on bus 0". I wonder whether you wouldn't
>> instead need to do this over all segments, on each bus 0.
> 
> The chipsets do not support multi-segment systems, and we have
> multi-socket affected systems with multiple of these chipsets, with none
> of the IOHs on bus 0.

That's contrary to the spec then, and will need clarification.
Don, Xiantao?

Jan
