Re: [Xen-devel] [PATCH v5 10/10] AMD/IOMMU: restrict interrupt remapping table sizes



On 18.09.2019 12:45, Andrew Cooper wrote:
> On 18/09/2019 09:11, Jan Beulich wrote:
>> On 17.09.2019 17:30, Andrew Cooper wrote:
>>> On 06/08/2019 14:11, Jan Beulich wrote:
>>>> There's no point setting up tables with more space than a PCI device can
>>>> use. For both MSI and MSI-X we can determine how many interrupts could
>>>> be set up at most. Tables allocated during ACPI table parsing, however,
>>>> will (for now at least) continue to be set up to have maximum size.
>>>>
>>>> Note that until we would want to use sub-page allocations here there's
>>>> no point checking whether MSI is supported by a device - 1 or up to 32
>>>> (or actually 128, due to the change effectively using a reserved
>>>> encoding) IRTEs always mean an order-0 allocation anyway.
>>> Devices which are not MSI-capable don't need an interrupt remapping
>>> table at all.
>> Oh, good point - pin based interrupts use the respective IO-APIC's
>> IRTE.
> 
> A lot of these devices have no interrupt capabilities at all.

Right, but this would be harder to tell than "is neither MSI-X nor
MSI capable".

>>>> --- a/xen/drivers/passthrough/amd/iommu_init.c
>>>> +++ b/xen/drivers/passthrough/amd/iommu_init.c
>>>> @@ -1315,11 +1317,8 @@ static int __init amd_iommu_setup_device
>>>>              }
>>>>  
>>>>              amd_iommu_set_intremap_table(
>>>> -                dte,
>>>> -                ivrs_mappings[bdf].intremap_table
>>>> -                ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
>>>> -                : 0,
>>>> -                iommu_intremap);
>>>> +                dte, ivrs_mappings[bdf].intremap_table,
>>>> +                ivrs_mappings[bdf].iommu, iommu_intremap);
>>> Ah - half of this looks like it wants to be in patch 6, rather than here.
>> Hmm, which half?
> 
> The dropping of the ternary expression.
> 
>> I don't see anything misplaced here. The signature
>> of amd_iommu_set_intremap_table() changes only in this patch, not in
>> patch 6.
> 
> If the code isn't misplaced, I can't spot why it is necessary before
> this patch.

As explained in the context of the patch introducing it - there
is a way for ivrs_mappings[bdf].intremap_table to be NULL here.
The handling of this case merely gets moved into
amd_iommu_set_intremap_table() by the patch here.
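
To spell it out (the callee fragment below is a sketch from memory,
not a quote of the actual code):

    /* Patch 6 call site - NULL special-cased by the caller: */
    amd_iommu_set_intremap_table(
        dte,
        ivrs_mappings[bdf].intremap_table
        ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
        : 0,
        iommu_intremap);

    /* This patch - the pointer is handed on as-is, and the callee
       deals with the NULL case itself, roughly along the lines of: */
    if ( ptr )
        dte->it_root = virt_to_maddr(ptr) >> 6;
    else
        dte->it_root = 0;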

>>>> @@ -80,17 +81,13 @@ unsigned int nr_ioapic_sbdf;
>>>>  
>>>>  static void dump_intremap_tables(unsigned char key);
>>>>  
>>>> -static unsigned int __init intremap_table_order(const struct amd_iommu *iommu)
>>>> -{
>>>> -    return iommu->ctrl.ga_en
>>>> -           ? get_order_from_bytes(INTREMAP_MAX_ENTRIES * sizeof(union irte128))
>>>> -           : get_order_from_bytes(INTREMAP_MAX_ENTRIES * sizeof(union irte32));
>>>> -}
>>>> +#define intremap_page_order(irt) PFN_ORDER(virt_to_page(irt))
>>> What makes the frameable order field safe to use?  It reaches into
>>> (pg)->v.free.order which fairly obviously isn't safe for allocated pages.
>> The same argument that allows xmalloc_whole_pages() and xfree() to
>> use this field: It is the owner of a page who controls how the
>> individual sub-fields of a union get used. As long as v.inuse and
>> v.sh don't get used, (ab)using v.free for an allocated page is
>> quite fine.
> 
> In which case I think v.free needs renaming and/or the comment for
> PFN_ORDER() needs rewriting.
> 
> The current code/documentation does not give the impression that the
> current uses of PFN_ORDER() are safe.
> 
> Judging by the other users (particularly the IOMMU code), it would be
> best in a field called opaque or similar.

Well, perhaps. But I don't fancy doing this in this series.
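
To make the pattern explicit for the archives, here's a stripped-down
illustration of the idiom xmalloc_whole_pages() / xfree() (and now
this patch) rely on - deliberately not the real struct page_info
layout, and the two helpers are made up:

    /* Illustration only - not the real struct page_info layout. */
    struct page_info {
        union {
            struct { unsigned long owner; } inuse; /* page in use */
            struct { unsigned int order; } free;   /* page on the free lists */
        } v;
    };

    #define PFN_ORDER(pg) ((pg)->v.free.order)

    /* The owner of an allocated page may re-purpose v.free.order, as
       long as it doesn't also use the competing union members: */
    static void *table_alloc(unsigned int order)
    {
        void *t = alloc_xenheap_pages(order, 0);

        if ( t )
            PFN_ORDER(virt_to_page(t)) = order; /* stash for the free path */
        return t;
    }

    static void table_free(void *t)
    {
        free_xenheap_pages(t, PFN_ORDER(virt_to_page(t)));
    }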

>>> virt_to_page() is a non-trivial calculation, which is now used in a
>>> large number of circumstances.  I don't have an easy judgement of
>>> whether they are hotpaths, but surely it would be easier to just store
>>> another unsigned int per device.
>> Except this would be a vendor specific field in a supposedly
>> vendor agnostic structure. I'm not very keen to add such a field.
>> Also I don't think interrupt setup/teardown paths would normally
>> be considered "hot".
>>
>> What I could be talked into (to limit code size in case the
>> compiler doesn't expand things inline overly aggressively) is to
>> make this a static function instead of a macro.
> 
> I considered asking for that, but I expected not to get very far, given
> that the use of PFN_ORDER() as an lvalue seems to be the prevailing idiom.

Oh, of course - that's why it's a macro in the first place. It's
been a long while since I put together this change ...
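
For the record, the lvalue aspect is indeed the crux - the macro
permits both

    intremap_page_order(tb) = order;   /* store */
    order = intremap_page_order(tb);   /* load  */

whereas a static (inline) function could only cover the load side; the
store side would then need a separate setter, e.g. (made-up name)

    /* Hypothetical setter, only needed if the macro went away: */
    static void set_intremap_page_order(void *tb, unsigned int order)
    {
        PFN_ORDER(virt_to_page(tb)) = order;
    }

hence the macro.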

>>> Furthermore, it would work around a preexisting issue where we can
>>> allocate beyond the number of interrupts for the device, up to the next
>>> order boundary.
>> Is this really an "issue", rather than just an irrelevant side
>> effect (which is never going to be hit as long as other layers
>> work correctly, in that they bound requests appropriately)?
> 
> It's not major, certainly (and I wouldn't object to the patch for this
> issue in isolation), but defensive coding is something to consider when
> the underlying algorithm is up for debate.

Understood. But since you didn't comment on my reasons for not
wanting to introduce a per-device field as you suggest, it's not
clear whether you accept that argument. But wait - I effectively have
such a field now (after introducing msi_maxvec, i.e. it's sort
of a split field). The real problem is that the device isn't
always available when we want to get at this value (and there
may not even be one, like for the shared table, or for IO-APIC
and HPET entries). Having checked again, what might work out is
to add a field to struct ivrs_mappings.
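
Roughly (the field name is invented here, not part of any posted
patch):

    /* In struct ivrs_mappings: */
    unsigned int intremap_nr; /* number of IRTEs the table was sized for */

    /* ... which would allow recovering the order without virt_to_page(): */
    order = get_order_from_bytes(ivrs_mappings[bdf].intremap_nr *
                                 (iommu->ctrl.ga_en ? sizeof(union irte128)
                                                    : sizeof(union irte32)));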

But if I went there, then I could just as well do away with
full-page allocations; I didn't want to go quite that far for now,
though.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

