[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 06/18] IOMMU/x86: restrict IO-APIC mappings for PV Dom0


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 2 Dec 2021 16:28:49 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=tb0t2KfkfAMzetSGxn1iGQUoLKqADfeLjajK4h8gn64=; b=c7ifWTaTC3PFqtc3Cm7JOowAIa9o2daIP5h+e1TKMaubJGFyFBX7stlnwU4gotiAdOFoimcnWJTgE9fZAmXEvcpX3C6GgbZWLwzrmnpJH24tsmQMP5ig4OfYR01cttk955txsgSCSQDDPyCohZl7optrNOj+BKq/hgdUqX2dmR4z+yj/Mm+hxxfs3hQKpJUNO7JjNFaZ5f/wc015lTM5I+pRM6Ig4jaTTR9+zqkjZqeLXT1gGefa5XMRyitOqD101oDgj7a5ficVKmCukWIcDvWd/iSCwAXZ6BIwyLCrveIVEbzwCb1/RGpRnwFQUbBJ1fmCL6XcKk1B1nKe8EIxHw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Afhz33SX5zvT4d6Uwi35vkZ2tc8PcdSzwYBgeMgkKFfnEk5NtTZCdRaoUKmYnFo4IVz6w240MTL4pdcU0K9/X5IQrLZYjpOI6BzXW/GEzC7oxwF0q+7SREBJG/c4VtPA/OkoNwSve44/LQX4UwH5YRy1auktl4akd4URSbeuPrspFnrjxAYugMq6aNVC1ksvB3CEcmOSkfK3oAN8uSJJYUJOTaeD8hb+rWENb89zHkiznEG4cY0SFRgSdGFfocpPqA2xu4QoC/pgB5oTsUSxRqECkFteisEZyMWWys8a+9uSo4IgRxwA3IW5JMqn9t2MqLeaF97myN4eKBKvHlN8xg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Thu, 02 Dec 2021 15:29:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02.12.2021 16:12, Roger Pau Monné wrote:
> On Wed, Dec 01, 2021 at 12:45:12PM +0100, Jan Beulich wrote:
>> On 01.12.2021 11:32, Roger Pau Monné wrote:
>>> On Wed, Dec 01, 2021 at 10:27:21AM +0100, Jan Beulich wrote:
>>>> On 01.12.2021 10:09, Roger Pau Monné wrote:
>>>>> On Fri, Sep 24, 2021 at 11:46:57AM +0200, Jan Beulich wrote:
>>>>>> @@ -267,44 +267,60 @@ static bool __hwdom_init hwdom_iommu_map
>>>>>>       * that fall in unusable ranges for PV Dom0.
>>>>>>       */
>>>>>>      if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
>>>>>> -        return false;
>>>>>> +        return 0;
>>>>>>  
>>>>>>      switch ( type = page_get_ram_type(mfn) )
>>>>>>      {
>>>>>>      case RAM_TYPE_UNUSABLE:
>>>>>> -        return false;
>>>>>> +        return 0;
>>>>>>  
>>>>>>      case RAM_TYPE_CONVENTIONAL:
>>>>>>          if ( iommu_hwdom_strict )
>>>>>> -            return false;
>>>>>> +            return 0;
>>>>>>          break;
>>>>>>  
>>>>>>      default:
>>>>>>          if ( type & RAM_TYPE_RESERVED )
>>>>>>          {
>>>>>>              if ( !iommu_hwdom_inclusive && !iommu_hwdom_reserved )
>>>>>> -                return false;
>>>>>> +                perms = 0;
>>>>>>          }
>>>>>> -        else if ( is_hvm_domain(d) || !iommu_hwdom_inclusive || pfn > 
>>>>>> max_pfn )
>>>>>> -            return false;
>>>>>> +        else if ( is_hvm_domain(d) )
>>>>>> +            return 0;
>>>>>> +        else if ( !iommu_hwdom_inclusive || pfn > max_pfn )
>>>>>> +            perms = 0;
>>>>>
>>>>> I'm confused about the reason to set perms = 0 instead of just
>>>>> returning here. AFAICT perms won't be set to any other value below,
>>>>> so you might as well just return 0.
>>>>
>>>> This is so that ...
>>>>
>>>>>>      }
>>>>>>  
>>>>>>      /* Check that it doesn't overlap with the Interrupt Address Range. 
>>>>>> */
>>>>>>      if ( pfn >= 0xfee00 && pfn <= 0xfeeff )
>>>>>> -        return false;
>>>>>> +        return 0;
>>>>>>      /* ... or the IO-APIC */
>>>>>> -    for ( i = 0; has_vioapic(d) && i < d->arch.hvm.nr_vioapics; i++ )
>>>>>> -        if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
>>>>>> -            return false;
>>>>>> +    if ( has_vioapic(d) )
>>>>>> +    {
>>>>>> +        for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
>>>>>> +            if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
>>>>>> +                return 0;
>>>>>> +    }
>>>>>> +    else if ( is_pv_domain(d) )
>>>>>> +    {
>>>>>> +        /*
>>>>>> +         * Be consistent with CPU mappings: Dom0 is permitted to 
>>>>>> establish r/o
>>>>>> +         * ones there, so it should also have such established for 
>>>>>> IOMMUs.
>>>>>> +         */
>>>>>> +        for ( i = 0; i < nr_ioapics; i++ )
>>>>>> +            if ( pfn == PFN_DOWN(mp_ioapics[i].mpc_apicaddr) )
>>>>>> +                return rangeset_contains_singleton(mmio_ro_ranges, pfn)
>>>>>> +                       ? IOMMUF_readable : 0;
>>>>>> +    }
>>>>
>>>> ... this return, as per the comment, takes precedence over returning
>>>> zero.
>>>
>>> I see. This is because you want to map those in the IOMMU page tables
>>> even if the IO-APIC ranges are outside of a reserved region.
>>>
>>> I have to admit this is kind of weird, because the purpose of this
>>> function is to add mappings for all memory below 4G, and/or for all
>>> reserved regions.
>>
>> Well, that was what it started out as. The purpose here is to be consistent
>> about IO-APICs: Either have them all mapped, or none of them. Since we map
>> them in the CPU page tables and since Andrew asked for the two mappings to
>> be consistent, this is the only way to satisfy the requests. Personally I'd
>> be okay with not mapping IO-APICs here (but then regardless of whether they
>> are covered by a reserved region).
> 
> I'm unsure of the best way to deal with this, it seems like both
> the CPU and the IOMMU page tables would never be equal for PV dom0,
> because we have no intention to map the MSI-X tables in RO mode in the
> IOMMU page tables.
> 
> I'm not really opposed to having the IO-APIC mapped RO in the IOMMU
> page tables, but I also don't see much benefit of doing it unless we
> have a user-case for it. The IO-APIC handling in PV is already
> different from native, so I would be fine if we add a comment noting
> that while the IO-APIC is mappable to the CPU page tables as RO it's
> not present in the IOMMU page tables (and then adjust hwdom_iommu_map
> to prevent it's mapping).

Andrew, you did request both mappings to get in sync - thoughts?

> I think we should also prevent mapping of the LAPIC, the HPET and the
> HyperTransport range in case they fall into a reserved region?

Probably.

> TBH looks like we should be using iomem_access_permitted in
> arch_iommu_hwdom_init? (not just for the IO-APIC, but for other device
> ranges)

In general - perhaps. Not sure though whether to switch to doing so
right here.

>>>>>> @@ -346,15 +362,19 @@ void __hwdom_init arch_iommu_hwdom_init(
>>>>>>      for ( ; i < top; i++ )
>>>>>>      {
>>>>>>          unsigned long pfn = pdx_to_pfn(i);
>>>>>> +        unsigned int perms = hwdom_iommu_map(d, pfn, max_pfn);
>>>>>>          int rc;
>>>>>>  
>>>>>> -        if ( !hwdom_iommu_map(d, pfn, max_pfn) )
>>>>>> +        if ( !perms )
>>>>>>              rc = 0;
>>>>>>          else if ( paging_mode_translate(d) )
>>>>>> -            rc = set_identity_p2m_entry(d, pfn, p2m_access_rw, 0);
>>>>>> +            rc = set_identity_p2m_entry(d, pfn,
>>>>>> +                                        perms & IOMMUF_writable ? 
>>>>>> p2m_access_rw
>>>>>> +                                                                : 
>>>>>> p2m_access_r,
>>>>>> +                                        0);
>>>>>>          else
>>>>>>              rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << 
>>>>>> PAGE_ORDER_4K,
>>>>>> -                           IOMMUF_readable | IOMMUF_writable, 
>>>>>> &flush_flags);
>>>>>> +                           perms, &flush_flags);
>>>>>
>>>>> You could just call set_identity_p2m_entry uniformly here. It will
>>>>> DTRT for non-translated guests also, and then hwdom_iommu_map could
>>>>> perhaps return a p2m_access_t?
>>>>
>>>> That's an orthogonal change imo, i.e. could be done as a prereq change,
>>>> but I'd prefer to leave it as is for now. Furthermore see "x86/mm: split
>>>> set_identity_p2m_entry() into PV and HVM parts": In v2 I'm now also
>>>> adjusting the code here 
>>>
>>> I would rather adjust the code here to just call
>>> set_identity_p2m_entry instead of differentiating between PV and
>>> HVM.
>>
>> I'm a little hesitant, in particular considering your suggestion to
>> then have hwdom_iommu_map() return p2m_access_t. Andrew has made quite
>> clear that the use of p2m_access_* here and in a number of other places
>> is actually an abuse.
>>
>> Plus - forgot about this in my earlier reply - there would also be a
>> conflict with the next patch in this series, where larger orders will
>> get passed to iommu_map(), while set_identity_p2m_entry() has no
>> respective parameter yet (and imo it isn't urgent for it to gain one).
> 
> I've seen now as the iommu_map path is modified to handle ranges
> instead of single pages. Long term we also want to expand this range
> handling to the HVM branch.

Long (or maybe better mid) term, yes.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.