
Re: [Xen-devel] [PATCH 0/3] AMD/IOMMU: re-work mode updating



On 07.11.2019 13:49, Andrew Cooper wrote:
> On 07/11/2019 07:36, Jan Beulich wrote:
>> On 06.11.2019 18:31, Andrew Cooper wrote:
>>> On 06/11/2019 15:16, Jan Beulich wrote:
>>>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>>>> the PCI devices lock held. Because the check occurs only when the mode
>>>> actually needs updating, the violation of this rule by the majority
>>>> of callers went unnoticed until per-domain IOMMU setup was changed
>>>> to do away with on-demand creation of IOMMU page tables.
>>>>
>>>> Unfortunately the only halfway reasonable fix to this that I could
>>>> come up with requires more re-work than would seem desirable at this
>>>> time of the release process, but addressing the issue seems
>>>> unavoidable to me as its manifestation is a regression from the
>>>> IOMMU page table setup re-work. The change also isn't without risk
>>>> of further regressions - if in patch 2 I've missed a code path that
>>>> would also need to invoke the new hook, then this might mean non-
>>>> working guests (with passed-through devices on AMD hardware).
>>>>
>>>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
>>>> 2: introduce GFN notification for translated domains
>>>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
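
A minimal, self-contained sketch (not code from the series) of the hazard
described in the cover letter above: because the lock check sits behind
the "does the mode need to grow" condition, callers that map pages without
holding the PCI devices lock are only caught on the rare growth path. The
constants (9 translation bits per level, 12-bit page offset) match the AMD
IOMMU's 512-entry tables; everything else is purely illustrative.

    /* Illustrative stand-in, not Xen code: the lock check is only reached
     * when the table height actually has to grow, so most callers that
     * violate the locking rule sail through unnoticed. */
    #include <assert.h>
    #include <stdbool.h>

    #define LEVEL_SHIFT 9            /* 512 entries per table level */

    static bool pcidevs_lock_held;   /* stand-in for Xen's pcidevs_locked() */
    static unsigned int paging_mode = 2;   /* table height currently in use */

    static void map_page(unsigned long dfn)
    {
        /* Mode N covers DFNs below 1 << (9 * N); grow only beyond that. */
        while ( dfn >> (LEVEL_SHIFT * paging_mode) )
        {
            assert(pcidevs_lock_held);   /* only checked on this rare path */
            ++paging_mode;               /* grow the tables by one level */
        }
        /* ... install the PTE at the (now sufficient) height ... */
    }

    int main(void)
    {
        map_page(0x12345);          /* low DFN: the check never runs */
        pcidevs_lock_held = true;
        map_page(0xFFF00ul);        /* DFN just below 4 GiB: grows 2 -> 3 */
        return 0;
    }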
>>> Having now looked at all three, why don't we just delete the dynamic
>>> height of AMD IOMMU pagetables?
>>>
>>> This series looks suspiciously like it is adding new common
>>> infrastructure to work around the fact that we're doing something fairly dumb
>>> to begin with.
>>>
>>> Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain,
>>> and a substantial reduction in the complexity of the IOMMU code.
>> Yet that is an additional level of page walks the hardware has to
>> perform. Also 4 levels won't cover all possible 52 address bits. And
>> finally, the more applicable your "substantial reduction" is, the less
>> suitable such a change may be at this point of the release (but I
>> didn't look at this side of things in any detail, so it may well not
>> be an issue).
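
For reference, the address-width arithmetic behind both of these points,
as a small self-contained illustration (9 translation bits per level on
top of the 12-bit page offset; back-of-the-envelope only, not code from
the tree):

    /* Bits of guest/DMA address space covered by N levels of page tables. */
    #include <stdio.h>

    int main(void)
    {
        for ( unsigned int levels = 1; levels <= 5; ++levels )
            printf("%u level(s): %2u-bit addresses\n", levels, 9 * levels + 12);
        /* 3 levels -> 39 bits (512 GiB), 4 levels -> 48 bits; covering the
         * full 52-bit physical address space would take 5 levels. */
        return 0;
    }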
> 
> There is, in practice, no such thing as an HVM guest using 2 levels. 
> The VRAM just below the 4G boundary will force a resize to 3 levels
> during domain construction, and as a 1-line fix for 4.13, this probably
> isn't the worst idea going.

So here (with the 1-line fix remark) you talk about 3 levels. Yet
switching from the 2 levels we start with to 3 won't fix anything, as
we may still need to go to 4 for huge guests. Such a change would
merely eliminate the admittedly pretty pointless move from 2 to 3 which
now happens for all domains as their memory gets populated.
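
To make the 2 vs. 3 vs. 4 point concrete, here is a hypothetical helper
(not from the tree) computing the table height needed to reach a given
address: the VRAM hole just below 4 GiB already requires 3 levels, while
any guest populated above 512 GiB still forces a fourth.

    /* Sketch: minimal number of table levels needed to reach an address. */
    #include <stdio.h>

    static unsigned int levels_needed(unsigned long long maxaddr)
    {
        unsigned int levels = 1;

        /* Each level translates 9 address bits on top of the 12-bit offset. */
        while ( maxaddr >> (9 * levels + 12) )
            ++levels;
        return levels;
    }

    int main(void)
    {
        printf("just below 4 GiB: %u levels\n", levels_needed(0xFFFFFFFFull));
        printf("at 512 GiB      : %u levels\n", levels_needed(1ull << 39));
        return 0;
    }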

> There are no AMD systems which support >48 bit PA space, so 4 levels is
> sufficient for now, but fundamentally details such as the size of GPA
> space should be specified in domain_create() and remain static for the
> lifetime of the domain.

I agree GPA dimensions ought to be static. But the number-of-levels
adjustment the code does has nothing to do with variable GPA
boundaries. Even for a domain with, say, a 36-bit GFN space it
may be beneficial to run with only 3 levels, as long as it has
"little" enough memory assigned. In fact, the number of levels the
hardware has to walk is the one aspect you don't even touch in your
reply.

Jan
