[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 06/18] IOMMU/x86: restrict IO-APIC mappings for PV Dom0


  • To: Andrew Cooper <amc96@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 3 Dec 2021 07:41:40 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=YEC8C36ucthq1KVn9e+D1mZNV5qqS3WzoDe4M8yJiC4=; b=MoPaawlA5ne696cd1eMVvMDaRGWiyL35jhcVNPeDv0fZ+IO8ONf+2PNghcrBilxyMda5umtGTucPLWg5whTarabv1/+2tCGJBbtKdew5d6lUwajDJot6G8AU6AATMyz+oiQqHdfLyHX3FzBMSeCFiEzcHnbee4JyHXcTWs6VQ5ntVq8T1J8zjGhFT55eMh9kLNVUFs4JnoC7OETgz24RxXdUMZXitzbhYicJWiNkzQIyHNJ+WMpzS+yxJXjpzzNdvGl7D9sS4UnkSdczR16wjihP0CjD4913fWUS6Syw92ROQuTU0oI0bIKgubBy0eH65/pwdgW/tvdNYC2J+tjwWA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=T1mQHwNNG/SS6ugzI5Xda1cx+7tfMwQltxnMCN8QHRi/TvrA/p8hl2Mu166BY/rnMoKcvpVowSxOc6wYRoTNRmvNL62Fep5sp6zE8FXXJ1rlPsLYGm5HqfemnDxYQneNOFoW1zCv/vRryjbKLeyI4DvIsUlsOtcIkrN+qmCQmtlYzB/UNnYP62HhaM5sBNOblzDELAKsbhkkD1BjkqxgBeC9PseeJtT4XTghx165e6TrhUnd2Yv02Y+grWjhhCkgBMJHhESp19WF5qPnu6Rencc/HG9nP2O08U/B+Wbzq8nZGPA4+p5e4U3fbNz93ROSNwdIA9Z6WLfFFbPeCNOSHw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 03 Dec 2021 06:42:06 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02.12.2021 20:16, Andrew Cooper wrote:
> On 02/12/2021 15:28, Jan Beulich wrote:
>> On 02.12.2021 16:12, Roger Pau Monné wrote:
>>> On Wed, Dec 01, 2021 at 12:45:12PM +0100, Jan Beulich wrote:
>>>> On 01.12.2021 11:32, Roger Pau Monné wrote:
>>>>> On Wed, Dec 01, 2021 at 10:27:21AM +0100, Jan Beulich wrote:
>>>>>> On 01.12.2021 10:09, Roger Pau Monné wrote:
>>>>>>> On Fri, Sep 24, 2021 at 11:46:57AM +0200, Jan Beulich wrote:
>>>>>>>> @@ -267,44 +267,60 @@ static bool __hwdom_init hwdom_iommu_map
>>>>>>>>       * that fall in unusable ranges for PV Dom0.
>>>>>>>>       */
>>>>>>>>      if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
>>>>>>>> -        return false;
>>>>>>>> +        return 0;
>>>>>>>>  
>>>>>>>>      switch ( type = page_get_ram_type(mfn) )
>>>>>>>>      {
>>>>>>>>      case RAM_TYPE_UNUSABLE:
>>>>>>>> -        return false;
>>>>>>>> +        return 0;
>>>>>>>>  
>>>>>>>>      case RAM_TYPE_CONVENTIONAL:
>>>>>>>>          if ( iommu_hwdom_strict )
>>>>>>>> -            return false;
>>>>>>>> +            return 0;
>>>>>>>>          break;
>>>>>>>>  
>>>>>>>>      default:
>>>>>>>>          if ( type & RAM_TYPE_RESERVED )
>>>>>>>>          {
>>>>>>>>              if ( !iommu_hwdom_inclusive && !iommu_hwdom_reserved )
>>>>>>>> -                return false;
>>>>>>>> +                perms = 0;
>>>>>>>>          }
>>>>>>>> -        else if ( is_hvm_domain(d) || !iommu_hwdom_inclusive || pfn > 
>>>>>>>> max_pfn )
>>>>>>>> -            return false;
>>>>>>>> +        else if ( is_hvm_domain(d) )
>>>>>>>> +            return 0;
>>>>>>>> +        else if ( !iommu_hwdom_inclusive || pfn > max_pfn )
>>>>>>>> +            perms = 0;
>>>>>>> I'm confused about the reason to set perms = 0 instead of just
>>>>>>> returning here. AFAICT perms won't be set to any other value below,
>>>>>>> so you might as well just return 0.
>>>>>> This is so that ...
>>>>>>
>>>>>>>>      }
>>>>>>>>  
>>>>>>>>      /* Check that it doesn't overlap with the Interrupt Address 
>>>>>>>> Range. */
>>>>>>>>      if ( pfn >= 0xfee00 && pfn <= 0xfeeff )
>>>>>>>> -        return false;
>>>>>>>> +        return 0;
>>>>>>>>      /* ... or the IO-APIC */
>>>>>>>> -    for ( i = 0; has_vioapic(d) && i < d->arch.hvm.nr_vioapics; i++ )
>>>>>>>> -        if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
>>>>>>>> -            return false;
>>>>>>>> +    if ( has_vioapic(d) )
>>>>>>>> +    {
>>>>>>>> +        for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
>>>>>>>> +            if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
>>>>>>>> +                return 0;
>>>>>>>> +    }
>>>>>>>> +    else if ( is_pv_domain(d) )
>>>>>>>> +    {
>>>>>>>> +        /*
>>>>>>>> +         * Be consistent with CPU mappings: Dom0 is permitted to 
>>>>>>>> establish r/o
>>>>>>>> +         * ones there, so it should also have such established for 
>>>>>>>> IOMMUs.
>>>>>>>> +         */
>>>>>>>> +        for ( i = 0; i < nr_ioapics; i++ )
>>>>>>>> +            if ( pfn == PFN_DOWN(mp_ioapics[i].mpc_apicaddr) )
>>>>>>>> +                return rangeset_contains_singleton(mmio_ro_ranges, 
>>>>>>>> pfn)
>>>>>>>> +                       ? IOMMUF_readable : 0;
>>>>>>>> +    }
>>>>>> ... this return, as per the comment, takes precedence over returning
>>>>>> zero.
>>>>> I see. This is because you want to map those in the IOMMU page tables
>>>>> even if the IO-APIC ranges are outside of a reserved region.
>>>>>
>>>>> I have to admit this is kind of weird, because the purpose of this
>>>>> function is to add mappings for all memory below 4G, and/or for all
>>>>> reserved regions.
>>>> Well, that was what it started out as. The purpose here is to be consistent
>>>> about IO-APICs: Either have them all mapped, or none of them. Since we map
>>>> them in the CPU page tables and since Andrew asked for the two mappings to
>>>> be consistent, this is the only way to satisfy the requests. Personally I'd
>>>> be okay with not mapping IO-APICs here (but then regardless of whether they
>>>> are covered by a reserved region).
>>> I'm unsure of the best way to deal with this, it seems like both
>>> the CPU and the IOMMU page tables would never be equal for PV dom0,
>>> because we have no intention to map the MSI-X tables in RO mode in the
>>> IOMMU page tables.
>>>
>>> I'm not really opposed to having the IO-APIC mapped RO in the IOMMU
>>> page tables, but I also don't see much benefit of doing it unless we
>>> have a user-case for it. The IO-APIC handling in PV is already
>>> different from native, so I would be fine if we add a comment noting
>>> that while the IO-APIC is mappable to the CPU page tables as RO it's
>>> not present in the IOMMU page tables (and then adjust hwdom_iommu_map
>>> to prevent it's mapping).
>> Andrew, you did request both mappings to get in sync - thoughts?
> 
> Lets step back to first principles.
> 
> On real hardware, there is no such thing as read-only-ness of the
> physical address space.  Anything like that is a device which accepts
> and discards writes.
> 
> It's not clear what a real hardware platform would do in this scenario,
> but from reading some of the platform docs, I suspect the System Address
> Decoder would provide a symmetric view of the hardware address space,
> but this doesn't mean that UBOX would tolerate memory accesses uniformly
> from all sources.  Also, there's nothing to say that all platforms
> behave the same.
> 
> 
> For HVM with shared-pt, the CPU and IOMMU mappings really are
> identical.  The IOMMU really will get a read-only mapping of real MMCFG,
> and holes for fully-emulated devices, which would suffer a IOMMU fault
> if targetted.
> 
> For HVM without shared-pt, the translations are mostly kept in sync, but
> the permissions in the CPU mappings may be reduced for e.g. logdirty
> reasons.
> 
> For PV guests, things are mostly like the HVM shared-pt case, except
> we've got the real IO-APICs mapped read-only, and no fully-emulated devices.
> 
> 
> Putting the real IO-APICs in the IOMMU is about as short sighted as
> letting the PV guest see them to begin with, but there is nothing
> fundamentally wrong with letting a PV guest do a DMA read of the
> IO-APIC, seeing as we let it do a CPU read.  (And whether the platform
> will even allow it, is a different matter.)
> 
> 
> However, it is really important for there to not be a load of special
> casing (all undocumented, naturally) keeping the CPU and IOMMU views
> different.  It is an error that the views were ever different
> (translation wise), and the only legitimate permission difference I can
> think of is to support logdirty mode for migration.  (Introspection
> protection for device-enabled VMs will be left as an exercise to
> whomever first wants to use it.)
> 
> Making the guest physical address space view consistent between the CPU
> and device is a "because its obviously the correct thing to do" issue. 
> Deciding "well it makes no sense for you to have an IO mapping of $FOO"
> is a matter of policy that Xen has no legitimate right to be enforcing.

To summarize: You continue to think it's better to map the IO-APICs r/o
also in the IOMMU, despite there not being any practical need for these
mappings (the CPU ones get permitted as a workaround only, after all).
Please correct me if that's a wrong understanding of your reply. And I
take it that you're aware that CPU mappings get inserted only upon Dom0's
request, whereas IOMMU mappings get created once during boot (the
inconsistent form of which had been present prior to this patch).

Any decision here would then imo also want to apply to e.g. the HPET
region, which we have a mode for where Dom0 can map it r/o. And the
MSI-X tables and PBAs (which get dynamically entered into mmio_ro_ranges).

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.