Re: [Xen-devel] Shattering superpages impact on IOMMU in Xen



Hi, Stefano.

On Mon, Apr 3, 2017 at 11:33 PM, Stefano Stabellini
<sstabellini@xxxxxxxxxx> wrote:
> On Mon, 3 Apr 2017, Oleksandr Tyshchenko wrote:
>> On Mon, Apr 3, 2017 at 9:06 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
>> > Hi Andrew,
>> >
>> >
>> > On 03/04/17 18:16, Andrew Cooper wrote:
>> >>
>> >> On 03/04/17 18:02, Julien Grall wrote:
>> >>>
>> >>> Hi Andrew,
>> >>>
>> >>> On 03/04/17 17:42, Andrew Cooper wrote:
>> >>>>
>> >>>> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>> >>>>>
>> >>>>> Hi, all.
>> >>>>>
>> >>>>> While playing with a non-shared IOMMU in Xen on ARM I ran into one
>> >>>>> interesting thing. I found out that superpages get shattered during
>> >>>>> the domain life cycle.
>> >>>>> This is the result of mapping foreign pages, ballooning memory,
>> >>>>> a domain mapping Xen shared pages, etc.
>> >>>>> I am not concerned about memory fragmentation at the moment. But
>> >>>>> shattering bothers me from the IOMMU point of view.
>> >>>>> As Xen owns the IOMMU, it might manipulate the IOMMU page tables
>> >>>>> while a passed-through/protected device is doing DMA in Linux. It is
>> >>>>> hard to detect when no DMA transaction is in progress
>> >>>>> in order to prevent this race. So, if a device has an in-flight
>> >>>>> transaction when we change the IOMMU mapping, we might get into
>> >>>>> trouble. Unfortunately, the faulting transaction cannot be restarted
>> >>>>> in all cases. The chance of hitting the problem
>> >>>>> increases during shattering.
>> >>>>>
>> >>>>> I ran the following test:
>> >>>>> Dom0 on my setup contains an Ethernet IP that is protected by the
>> >>>>> IOMMU. What is more, as the IOMMU I am playing with supports
>> >>>>> superpages (2M, 1G), the IOMMU driver
>> >>>>> takes these capabilities into account when building page tables. As I
>> >>>>> gave 256 MB to dom0, the IOMMU mapping was built from 2M memory blocks
>> >>>>> only. As I am using NFS for both dom0 and domU, the Ethernet IP
>> >>>>> performs DMA transactions almost all the time.
>> >>>>> Sometimes I see IOMMU page faults while creating a guest domain. I
>> >>>>> think this happens while Xen is shattering 2M mappings into 4K
>> >>>>> mappings (it unmaps dom0 pages one 4K page at a time, then maps domU
>> >>>>> pages there for copying domU images).
>> >>>>> But I don't see any page faults when the IOMMU page table is built
>> >>>>> from 4K pages only.
>> >>>>>
>> >>>>> I had a talk with Julien on IRC and we came to the conclusion that the
>> >>>>> safest way would be to use 4K pages to prevent shattering, so the
>> >>>>> IOMMU shouldn't report superpage capability.
>> >>>>> On the other hand, if we build the IOMMU page tables from 4K pages we
>> >>>>> will have a performance drop (while building and walking page
>> >>>>> tables), TLB pressure, etc.
>> >>>>> Another possible solution Julien suggested is to always
>> >>>>> balloon with 2M or 1G, and never with 4K. That would help us
>> >>>>> prevent the shattering effect.
>> >>>>> The discussion was moved to the ML since it seems to be a generic
>> >>>>> issue and the right solution should be thought through.
>> >>>>>
>> >>>>> What do you think is the right way forward? Use 4K pages and not
>> >>>>> bother with shattering, or try to optimize? And if the idea of making
>> >>>>> the balloon mechanism smarter makes sense, how do we teach it to do so?
>> >>>>> Thank you.
>> >>>>
>> >>>>
>> >>>> Ballooning and foreign mappings are terrible for trying to retain
>> >>>> superpage mappings.  No OS, not even Linux, can sensibly provide victim
>> >>>> pages in a useful way to avoid shattering.
>> >>>>
>> >>>> If you care about performance, don't ever balloon.  Foreign mappings in
>> >>>> translated guests should start from the top of RAM, and work upwards.
>> >>>
>> >>>
>> >>> I am not sure I understand this. Can you expand?
>> >>
>> >>
>> >> I am not sure what is unclear.  Handing random frames of RAM back to the
>> >> hypervisor is what exacerbates host superpage fragmentation, and all
>> >> balloon drivers currently do it.
>> >>
>> >> If you want to avoid host superpage fragmentation, don't use a
>> >> scattergun approach of handing frames back to Xen.  However, because
>> >> even Linux doesn't provide enough hooks into the physical memory
>> >> management logic, the only solution is to not balloon at all, and to use
>> >> already-unoccupied frames for foreign mappings.
>> >
>> >
>> > Do you have any pointer in the Linux code?
>> >
>> >
>> >>
>> >>>
>> >>>>
>> >>>>
>> >>>> As for the IOMMU specifically, things are rather easier.  It is the
>> >>>> guest's responsibility to ensure that frames offered up for ballooning or
>> >>>> foreign mappings are unused.  Therefore, if anything cares about the
>> >>>> specific 4K region becoming non-present in the IOMMU mappings, it is the
>> >>>> guest kernel's fault for offering up a frame already in use.
>> >>>>
>> >>>> For the shattering however, it is Xen's responsibility to ensure that
>> >>>> all other mappings stay valid at all points.  The correct way to do this
>> >>>> is to construct a new L1 table, mirroring the L2 superpage but lacking
>> >>>> the specific 4K mapping in question, then atomically replace the L2
>> >>>> superpage entry with the new L1 table, then issue an IOMMU TLB
>> >>>> invalidation to remove any cached mappings.
>> >>>>
>> >>>> By following that procedure, all DMA within the 2M region, but not
>> >>>> hitting the 4K frame, won't observe any interim lack of mappings.  It
>> >>>> appears from your description that Xen isn't following the procedure.
>> >>>
>> >>>
>> >>> Xen is following what the ARM ARM mandates. For shattering a page
>> >>> table mapping, we have to follow the break-before-make sequence, i.e.:
>> >>>     - Invalidate the L2 entry
>> >>>     - Flush the TLBs
>> >>>     - Add the new L1 table
>> >>> See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up with a
>> >>> small window where there is no valid mapping. It is easy to trap a data
>> >>> abort from the processor and restart the access, but not for device
>> >>> memory transactions.
>> >>>
>> >>> Xen by default is sharing stage-2 page tables between the IOMMU
>> >>> and the MMU. However, from the discussion I had with Oleksandr, they
>> >>> are not sharing page tables and still see the problem. I am not sure
>> >>> how they are updating the page table here. Oleksandr, can you provide
>> >>> more details?
>> >>
>> >>
>> >> Are you saying that ARM has no way of making atomic updates to the IOMMU
>> >> mappings?  (How do I get access to that document?  Google gets me to
>> >>
>> >> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html,
>> >> but
>> >> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k/index.html
>> >> which looks like the document you specified results in 404.)
>> >
>> >
>> > Below is a link; I am not sure why Google does not index it:
>> >
>> > http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k_10775/index.html
>> >
>> >>
>> >> If so, that is an architecture bug IMO.  By design, the IOMMU is out of
>> >> control of guest software, and the hypervisor should be able to make
>> >> atomic modifications without guest cooperation.
>> >
>> >
>> > I think you misread what I meant: the IOMMU supports atomic operations.
>> > However, if you share the page table we have to apply Break-Before-Make
>> > when shattering a superpage. This is mandatory if you want to get Xen
>> > running on all the micro-architectures.
>> >
>> > Some IOMMUs may cope with BBM, some may not. I haven't seen any issue so far
>> > (it does not mean there are none).
>> >
>> > The IOMMU used by Oleksandr (i.e. VMSA-IPMMU) is an IP from Renesas which I
>> > never used myself. In his case he needs different page tables because the
>> > layouts are not the same.
>> >
>> > Oleksandr, looking at the code you provided, the superpages are split the
>> > way Andrew said, i.e.:
>> >         1) allocate a level 3 table minus the 4K mapping
>> >         2) replace the level 2 entry with the new table
>> >
>> > Am I right?
>>
>> It seems so, yes. Walking down the page table when trying to unmap, we
>> bump into a leaf entry (2M mapping),
>> so the 2M-4K mappings are inserted at the next level and after that the
>> page table entry is replaced.
>
> Let me start by saying that Andrew rightly pointed out what the right
> approach to dealing with this issue should be. However, if we have to use
> break-before-make for IOMMU pagetables, then it means we cannot do
> atomic updates to IOMMU mappings, like Andrew wrote. Therefore, we
> have to make a choice: we either disable superpage IOMMU mappings or
> ballooning. I would disable IOMMU superpage mappings, on the grounds that
> supporting superpage mappings without supporting atomic shattering or
> restartable transactions is not really supporting superpage mappings.

Sounds reasonable. As Julien also mentioned, "using 4K pages only" is
the safest way.
At least until I find the reason why DMA faults take place despite
the fact that the shattering is done atomically.
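
To make it clear which sequence I mean, here is a rough sketch of the
split as Andrew and Julien describe it above: build the next-level table
first, then swap the 2M entry with a single store, then flush. It is only
a toy model under my own assumptions. The entry layout and all names
(shatter, translate, iommu_tlb_flush, l3_table) are made up for
illustration and are not the actual IPMMU driver code or page table
format.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model only: the entry layout and every name here are hypothetical
 * and do not match the IPMMU or any real page table format. */
#define L3_ENTRIES 512u
#define PTE_VALID  1ull                     /* bit 0: entry is present        */
#define PTE_TABLE  2ull                     /* bit 1: entry points to a table */
#define BLOCK_MASK (~((1ull << 21) - 1))    /* 2M-aligned output address      */

typedef struct { uint64_t e[L3_ENTRIES]; } l3_table;

static unsigned tlb_flushes;                /* stands in for a real IOMMU flush */
static void iommu_tlb_flush(void) { tlb_flushes++; }

/* Replace the 2M block in *l2e by a 4K table that lacks slot `hole`. */
static int shatter(_Atomic uint64_t *l2e, unsigned hole)
{
    uint64_t old = atomic_load(l2e);
    if (!(old & PTE_VALID) || (old & PTE_TABLE) || hole >= L3_ENTRIES)
        return -1;                          /* not a 2M block, nothing to split */

    l3_table *t = calloc(1, sizeof(*t));    /* zeroed: all slots start invalid */
    if (!t)
        return -1;
    for (unsigned i = 0; i < L3_ENTRIES; i++)
        if (i != hole)
            t->e[i] = ((old & BLOCK_MASK) + ((uint64_t)i << 12)) | PTE_VALID;

    /* One store switches from the block to the fully built table, so a
     * walker sees either the old 2M mapping or the new 4K ones. */
    atomic_store(l2e, (uint64_t)(uintptr_t)t | PTE_TABLE | PTE_VALID);
    iommu_tlb_flush();                      /* drop any cached 2M translation */
    return 0;
}

/* Walk the toy table for an offset inside the 2M region. */
static bool translate(_Atomic uint64_t *l2e, uint64_t off, uint64_t *pa)
{
    uint64_t e = atomic_load(l2e);
    if (!(e & PTE_VALID) || off >= (1ull << 21))
        return false;
    if (!(e & PTE_TABLE)) {                 /* still a 2M block */
        *pa = (e & BLOCK_MASK) + off;
        return true;
    }
    const l3_table *t = (const l3_table *)(uintptr_t)(e & ~3ull);
    uint64_t l3e = t->e[off >> 12];
    if (!(l3e & PTE_VALID))
        return false;                       /* the unmapped 4K frame */
    *pa = (l3e & ~0xfffull) + (off & 0xfff);
    return true;
}
```

In this model, after shatter(&l2e, 3) only offsets inside the fourth 4K
frame stop translating. Every other offset in the 2M region resolves to
the same address as before, with no interim invalid state. A
break-before-make variant would instead store an invalid entry and flush
before installing the table, which is the window Julien mentions.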

>
> However, you are not doing break-before-make here. I would investigate
> if break-before-make is required by VMSA-IPMMU. If it is not required,
> why are you seeing DMA faults?

Unfortunately, I can't say anything about the break-before-make sequence
for the IPMMU at the moment.
The TRM says nothing about it.

-- 
Regards,

Oleksandr Tyshchenko
