[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 21 Oct 2021 09:21:20 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=nXc74orlkXJHT433PyHuiu8At67cdLtHOKJoZZ21Jvs=; b=VqD+z2Usrlyun+nouRkC+x8ZgsImB2aJkasw0/XxLfva7/UYgtYBjJkcR3DhV9PHBzSkg6yTCPXGT/Zlzw/EuU5kPp2DQq5D/GBMBP5kKrdUI90Qu6pbgh0kvqCKPPYV0ox9q4JOT8mo1zB3xx3dTokYqp9GPx7syTDxDcMDPrz108ESgABFy60rBtaks5VOE+aeeYvNZVWLCmX8ZX7I9ztjYCQ5j6u8MRaAl9ucFWjMaW2EqA0tR02z3MZmJqSUd2zX3HnTZJFc8/r6yF8oMVqkrRMsEBmvfrfcKLsBURd90uJY0zUEIoXcgcZOtJe0eiZ98+Y0pDZIHdO1DB4YTA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Ksx1CBZLf5qM0mdHuGSec3fPG5M4rwIwgThy2sAuAGOl+/pjS42Y+J6kC81Xdd647qsy4GJwrLCIYFC6BiLS9+31fa8TuHgJKRMjQmwZLArUydXCCL+IPdpMoHLiz8sTskW/Mw0xfIrJSDfB5teRLAABYVXCIIlaB6NqUbj8Th2u2MXKiUieQnVhYrOb0uWEftZM+towYsv7oBLKdqCVKmDJIWYwjZDTj5tp8GehJnrhanAUmcBGILTi8Kw//0f0dz/q4APQrXYfFSTq3GmOP/RJtY/RERh9VON+DkSGzKNnpLWtr7iy4KFAyXGWWIJ9sCFjkROsp1NC5OT52FqBhA==
  • Authentication-results: lists.xenproject.org; dkim=none (message not signed) header.d=none;lists.xenproject.org; dmarc=none action=none header.from=suse.com;
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 21 Oct 2021 07:21:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 20.10.2021 22:01, Andrew Cooper wrote:
> On 20/10/2021 11:36, Jan Beulich wrote:
>> x2apic_bsp_setup() gets called ahead of iommu_setup(), and since x2APIC
>> mode (physical vs clustered) depends on iommu_intremap, that variable
>> needs to be set to off as soon as we know we can't / won't enable the
>> IOMMU, i.e. in particular when
>> - parsing of the respective ACPI tables failed,
>> - "iommu=off" is in effect, but not "iommu=no-intremap".
>> Move the turning off of iommu_intremap from AMD specific code into
>> acpi_iommu_init(), accompanying it by clearing of iommu_enable.
>>
>> Take the opportunity and also skip ACPI table parsing altogether when
>> "iommu=off" is in effect anyway.
>>
>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> I've deliberately not added a Fixes: tag here, as I'm of the opinion
>> that d8bd82327b0f ("AMD/IOMMU: obtain IVHD type to use earlier") only
>> uncovered a pre-existing anomaly.
> 
> I agree it uncovered a pre-existing issue, but that doesn't mean a Fixes
> tag should be omitted.  That change very concretely regressed booting on
> some systems in their pre-existing configuration.
> 
> The commit message needs to spell out a link to d8bd82327b0f, but it's
> fine to say "that commit broke it by violating an unexpected ordering
> dependency, but isn't really the source of the bug".
> 
>> This particularly applies to the "iommu=off" aspect.
> 
> There should be at least two Fixes tags, but I suspect trying to trace
> the history of this mess is not a productive use of time.
> 
>>  (This now allows me to remove an item from my TODO
>> list: I was meaning to figure out why one of my systems wouldn't come
>> up properly with "iommu=off", and I had never thought of combining this
>> with "no-intremap".)
>>
>> Arguably "iommu=off" should turn off subordinate features in common
>> IOMMU code, but doing so in parse_iommu_param() would be wrong (as
>> there might be multiple "iommu=" to parse). This could be placed in
>> iommu_supports_x2apic(), but see the next item.
> 
> I don't think we make any claim or implication that passing the same
> option twice works.  The problem here is the nested structure of
> options, and the variable doing double duty.
> 
>>
>> While the change here deals with apic_x2apic_probe() as called from
>> x2apic_bsp_setup(), the check_x2apic_preenabled() path looks to be
>> similarly affected. That call occurs before acpi_boot_init(), which is
>> what calls acpi_iommu_init(). The ordering in setup.c is in part
>> relatively fragile, which is why for the moment I'm still hesitant to
>> move the generic_apic_probe() call down. Plus I don't have easy access
>> to a suitable system to test this case. Thoughts?
> 
> I've written these thoughts before, but IOMMU handling it a catastrophic
> mess.  It needs burning to the ground and redoing from scratch.
> 
> We currently have two ways of turning on the IOMMU, depending on what
> firmware does, and plenty ways of crashing Xen with cmdline options
> which should work, and the legacy xAPIC startup routine is after
> interrupts have been enabled, leading to an attempt to rewrite
> interrupts in place to remap them.  This in particular has lead to XSAs
> due to trusting registers which can't be trusted, and the rewrite is
> impossible to do safely.
> 
> The correct order is:
> 1) Parse DMAR/IVRS (but do not configure anything), MADT, current APIC
> setting and cmdline arguments
> 2) Figure out whether we want to be in xAPIC or x2APIC mode, and whether
> we need intremap.  Change the LAPIC setting
> 3) Configure the IOMMUs.  In particular, their interrupt needs to be
> after step 2

Leaving aside check_x2apic_preenabled(), all of this is already the
case afaict, almost at least: We do acpi_boot_init(), later
x2apic_bsp_setup(), and yet later iommu_setup(). The only issue might
be inside x2apic_bsp_setup(), where we call iommu_enable_x2apic()
before switching to x2APIC mode. Yet we avoid setting up IOMMU
interrupts during this early stage.

Hence I think, as expressed, that the question really is whether we
can safely defer check_x2apic_preenabled() by a little bit.

> 4) Start up Xen's general IRQ infrastructure.
> 
> It's a fair chunk of work, but it will vastly simplify the boot logic
> and let us delete the infinite flushing loops out of the IOMMU logic,
> and we don't need any logic which has to second guess itself based on
> what happened earlier on boot.
> 
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -41,6 +41,23 @@ enum iommu_intremap __read_mostly iommu_
>>  bool __read_mostly iommu_intpost;
>>  #endif
>>  
>> +void __init acpi_iommu_init(void)
>> +{
>> +    if ( iommu_enable )
>> +    {
>> +        int ret = acpi_dmar_init();
>> +
>> +        if ( ret == -ENODEV )
>> +            ret = acpi_ivrs_init();
>> +
>> +        if ( ret )
>> +            iommu_enable = false;
>> +    }
>> +
>> +    if ( !iommu_enable )
>> +        iommu_intremap = iommu_intremap_off;
>> +}
> 
> This does fix my issue, so preferably with the Fixes tag reinstated,
> 
> Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Tested-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Thanks, but I think there will need to be a v2, as per below (plus
possibly the dealing with check_x2apic_preenabled()).

> However, I don't think skipping parsing is a sensible move.  Intremap is
> utterly mandatory if during boot, we discover that our APIC ID is >254,
> and iommu=no-intremap must be ignored in this case, or if the MADT says
> we have CPUs beyond that limit and the user hasn't specified nr_cpus=1
> or equivalent.

Reading this made me realize that the change breaks other behavior.
The conditional really needs to be iommu_enable || iommu_intremap -
at least AMD code added in support for x2APIC already treats the
latter to not be a sub-option of the former (iov_supports_xt(),
acpi_ivrs_init()), and e.g. intel_iommu_supports_eim() also checks
the latter alone.

Overriding "iommu=no-intremap" in case it's unavoidable could then
be a later change, not further affecting the function here.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.