
Re: [Xen-devel] Shattering superpages impact on IOMMU in Xen



On Mon, Apr 3, 2017 at 9:06 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi Andrew,
>
>
> On 03/04/17 18:16, Andrew Cooper wrote:
>>
>> On 03/04/17 18:02, Julien Grall wrote:
>>>
>>> Hi Andrew,
>>>
>>> On 03/04/17 17:42, Andrew Cooper wrote:
>>>>
>>>> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>>>>>
>>>>> Hi, all.
>>>>>
>>>>> Playing with non-shared IOMMU in Xen on ARM I faced one interesting
>>>>> thing. I found out that the superpages were shattered during domain
>>>>> life cycle.
>>>>> This is the result of mapping of foreign pages, ballooning memory,
>>>>> even if domain maps Xen shared pages, etc.
>>>>> I don't bother with the memory fragmentation at the moment. But,
>>>>> shattering bothers me from the IOMMU point of view.
>>>>> As Xen owns the IOMMU, it might manipulate the IOMMU page tables while
>>>>> a passed-through/protected device is doing DMA in Linux. It is hard to
>>>>> detect when no DMA transaction is in progress in order to prevent this
>>>>> race. So, if we have an in-flight transaction from a device when
>>>>> changing the IOMMU mapping, we might get into trouble. Unfortunately,
>>>>> the faulting transaction cannot be restarted in all cases. The chance
>>>>> of hitting the problem increases during shattering.
>>>>>
>>>>> I ran the following test:
>>>>> The dom0 on my setup contains an Ethernet IP that is protected by the IOMMU.
>>>>> What is more, as the IOMMU I am playing with supports superpages (2M,
>>>>> 1G) the IOMMU driver
>>>>> takes into account these capabilities when building page tables. As I
>>>>> gave 256 MB for dom0, the IOMMU mapping was built by 2M memory blocks
>>>>> only. As I am using NFS for both dom0 and domU the ethernet IP
>>>>> performs DMA transactions almost all the time.
>>>>> Sometimes, I see IOMMU page faults while creating a guest domain. I
>>>>> think it happens while Xen is shattering 2M mappings into 4K mappings
>>>>> (it unmaps dom0 pages one 4K page at a time, then maps domU pages
>>>>> there for copying domU images).
>>>>> But, I don't see any page faults when the IOMMU page table was built
>>>>> by 4K pages only.
>>>>>
>>>>> I had a talk with Julien on IRC and we came to the conclusion that the
>>>>> safest way would be to use 4K pages to prevent shattering, so the
>>>>> IOMMU shouldn't report superpage capability.
>>>>> On the other hand, if we build IOMMU from 4K pages we will have
>>>>> performance drop (during building, walking page tables), TLB pressure,
>>>>> etc.
>>>>> Another possible solution Julien suggested is to always balloon in
>>>>> 2M or 1G units, and never in 4K units. That would help us to
>>>>> prevent the shattering effect.
>>>>> The discussion was moved to the ML since it seems to be a generic
>>>>> issue and the right solution needs to be thought through.
>>>>>
>>>>> What do you think is the right way to follow? Use 4K pages and don't
>>>>> bother with shattering or try to optimize? And if the idea to make
>>>>> balloon mechanism smarter makes sense how to teach balloon to do so?
>>>>> Thank you.
>>>>
>>>>
>>>> Ballooning and foreign mappings are terrible for trying to retain
>>>> superpage mappings.  No OS, not even Linux, can sensibly provide victim
>>>> pages in a useful way to avoid shattering.
>>>>
>>>> If you care about performance, don't ever balloon.  Foreign mappings in
>>>> translated guests should start from the top of RAM, and work upwards.
>>>
>>>
>>> I am not sure I understand this. Can you elaborate?
>>
>>
>> I am not sure what is unclear.  Handing random frames of RAM back to the
>> hypervisor is what exacerbates host superpage fragmentation, and all
>> balloon drivers currently do it.
>>
>> If you want to avoid host superpage fragmentation, don't use a
>> scattergun approach of handing frames back to Xen.  However, because
>> even Linux doesn't provide enough hooks into the physical memory
>> management logic, the only solution is to not balloon at all, and to use
>> already-unoccupied frames for foreign mappings.
>
>
> Do you have any pointer in the Linux code?
>
>
>>
>>>
>>>>
>>>>
>>>> As for the IOMMU specifically, things are rather easier.  It is the
>>>> guest's responsibility to ensure that frames offered up for ballooning or
>>>> foreign mappings are unused.  Therefore, if anything cares about the
>>>> specific 4K region becoming non-present in the IOMMU mappings, it is the
>>>> guest kernel's fault for offering up a frame already in use.
>>>>
>>>> For the shattering however, it is Xen's responsibility to ensure that
>>>> all other mappings stay valid at all points.  The correct way to do this
>>>> is to construct a new L1 table, mirroring the L2 superpage but lacking
>>>> the specific 4K mapping in question, then atomically replace the L2
>>>> superpage entry with the new L1 table, then issue an IOMMU TLB
>>>> invalidation to remove any cached mappings.
>>>>
>>>> By following that procedure, all DMA within the 2M region, but not
>>>> hitting the 4K frame, won't observe any interim lack of mappings.  It
>>>> appears from your description that Xen isn't following the procedure.
>>>
>>>
>>> Xen is following what the ARM ARM mandates. For shattering a page
>>> table, we have to follow the break-before-make sequence, i.e.:
>>>     - Invalidate the L2 entry
>>>     - Flush the TLBs
>>>     - Add the new L1 table
>>> See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up with a
>>> small window where there is no valid mapping. It is easy to trap a data
>>> abort from the processor and restart the access, but not for device
>>> memory transactions.
>>>
>>> Xen by default shares stage-2 page tables between the IOMMU
>>> and the MMU. However, from the discussion I had with Oleksandr, they
>>> are not sharing page tables and still see the problem. I am not sure
>>> how they are updating the page table here. Oleksandr, can you provide
>>> more details?
>>
>>
>> Are you saying that ARM has no way of making atomic updates to the IOMMU
>> mappings?  (How do I get access to that document?  Google gets me to
>>
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html,
>> but
>> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k/index.html
>> which looks like the document you specified results in 404.)
>
>
> Below is a link; I am not sure why Google does not find it:
>
> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k_10775/index.html
>
>>
>> If so, that is an architecture bug IMO.  By design, the IOMMU is out of
>> control of guest software, and the hypervisor should be able to make
>> atomic modifications without guest cooperation.
>
>
> I think you misread what I meant: the IOMMU supports atomic operations.
> However, if you share the page table, we have to apply Break-Before-Make
> (BBM) when shattering a superpage. This is mandatory if you want to get
> Xen running on all the micro-architectures.
>
> Some IOMMUs may cope with BBM, some may not. I haven't seen any issue so
> far (which does not mean there are none).
>
> The IOMMU used by Oleksandr (VMSA-IPMMU) is an IP from Renesas which I
> have never used myself. In his case he needs different page tables because
> the layouts are not the same.
>
> Oleksandr, looking at the code you provided, superpages are split the
> way Andrew said, i.e.:
>         1) allocating level 3 table minus the 4K mapping
>         2) replace level 2 entry with the new table
>
> Am I right?

It seems so, yes. When walking the page table down to unmap a 4K page,
we hit a leaf entry (a 2M mapping), so the remaining 4K mappings (2M
minus the unmapped 4K page) are inserted at the next level, and after
that the level 2 page table entry is replaced.
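
To make the sequence concrete, here is a rough sketch in C of the
"build the new table first, then replace the level 2 entry" approach
discussed above. It is only an illustration under stated assumptions, not
the actual IPMMU driver code: the entry layout (PTE_VALID, PTE_TABLE), the
helper names (shatter_and_unmap, translate, iommu_tlb_flush_stub) and the
use of host pointers as table addresses are all made up for the example.
Note also that, as Julien pointed out, this direct replacement is not
allowed when the tables are shared with the CPU MMU on ARM, where
break-before-make is required.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry layout, for illustration only. */
#define PTE_VALID      (1ull << 0)
#define PTE_TABLE      (1ull << 1)          /* entry points to a next-level table */
#define PTE_ADDR_MASK  (~(uint64_t)0xFFF)
#define PAGE_SHIFT_4K  12
#define PAGE_SIZE_4K   (1ull << PAGE_SHIFT_4K)
#define PTES_PER_TABLE 512u
#define BLOCK_SIZE_2M  (PTES_PER_TABLE * PAGE_SIZE_4K)
#define NO_MAPPING     UINT64_MAX

typedef _Atomic uint64_t pte_t;

static unsigned int tlb_flushes;

/* Stand-in for the real IOMMU TLB invalidation (hypothetical name). */
static void iommu_tlb_flush_stub(void) { tlb_flushes++; }

/*
 * Replace a 2M block entry with a table of 4K entries that mirrors the
 * block, except that 4K page number 'idx' is left unmapped.
 * Returns 0 on success, -1 on bad input or allocation failure.
 */
static int shatter_and_unmap(pte_t *l2e, unsigned int idx)
{
    uint64_t old = atomic_load(l2e);

    if (!(old & PTE_VALID) || (old & PTE_TABLE) || idx >= PTES_PER_TABLE)
        return -1;

    pte_t *l3 = aligned_alloc(PAGE_SIZE_4K, PTES_PER_TABLE * sizeof(pte_t));
    if (!l3)
        return -1;
    assert(((uintptr_t)l3 & 0xFFF) == 0);

    /* 1) Fully populate the new table before anything can walk it. */
    uint64_t base = old & PTE_ADDR_MASK;
    for (unsigned int i = 0; i < PTES_PER_TABLE; i++) {
        uint64_t e = (i == idx)
            ? 0
            : ((base + ((uint64_t)i << PAGE_SHIFT_4K)) | PTE_VALID);
        atomic_init(&l3[i], e);
    }

    /*
     * 2) Swap the level 2 entry with a single store, so a walker sees
     * either the old block or the complete new table. Real hardware
     * would also need the appropriate barriers / cache maintenance.
     */
    atomic_store(l2e, (uint64_t)(uintptr_t)l3 | PTE_VALID | PTE_TABLE);

    /* 3) Drop any cached copy of the old 2M translation. */
    iommu_tlb_flush_stub();
    return 0;
}

/* Toy table walk: translate an offset within the 2M region. */
static uint64_t translate(pte_t *l2e, uint64_t offset)
{
    uint64_t e = atomic_load(l2e);

    offset &= BLOCK_SIZE_2M - 1;
    if (!(e & PTE_VALID))
        return NO_MAPPING;
    if (!(e & PTE_TABLE))
        return (e & PTE_ADDR_MASK) + offset;

    pte_t *l3 = (pte_t *)(uintptr_t)(e & PTE_ADDR_MASK);
    uint64_t l3e = atomic_load(&l3[offset >> PAGE_SHIFT_4K]);
    if (!(l3e & PTE_VALID))
        return NO_MAPPING;
    return (l3e & PTE_ADDR_MASK) + (offset & (PAGE_SIZE_4K - 1));
}
```

With this ordering, a lookup for any 4K page other than the unmapped one
resolves both before and after the swap; only the page being removed
becomes non-present.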

>
> Cheers,
>
> --
> Julien Grall



-- 
Regards,

Oleksandr Tyshchenko

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

