[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 00/16] dma-mapping: migrate to physical address-based API


  • To: Jason Gunthorpe <jgg@xxxxxxxxxx>, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
  • From: Demi Marie Obenour <demiobenour@xxxxxxxxx>
  • Date: Sat, 9 Aug 2025 12:53:09 -0400
  • Autocrypt: addr=demiobenour@xxxxxxxxx; keydata= xsFNBFp+A0oBEADffj6anl9/BHhUSxGTICeVl2tob7hPDdhHNgPR4C8xlYt5q49yB+l2nipd aq+4Gk6FZfqC825TKl7eRpUjMriwle4r3R0ydSIGcy4M6eb0IcxmuPYfbWpr/si88QKgyGSV Z7GeNW1UnzTdhYHuFlk8dBSmB1fzhEYEk0RcJqg4AKoq6/3/UorR+FaSuVwT7rqzGrTlscnT DlPWgRzrQ3jssesI7sZLm82E3pJSgaUoCdCOlL7MMPCJwI8JpPlBedRpe9tfVyfu3euTPLPx wcV3L/cfWPGSL4PofBtB8NUU6QwYiQ9Hzx4xOyn67zW73/G0Q2vPPRst8LBDqlxLjbtx/WLR 6h3nBc3eyuZ+q62HS1pJ5EvUT1vjyJ1ySrqtUXWQ4XlZyoEFUfpJxJoN0A9HCxmHGVckzTRl 5FMWo8TCniHynNXsBtDQbabt7aNEOaAJdE7to0AH3T/Bvwzcp0ZJtBk0EM6YeMLtotUut7h2 Bkg1b//r6bTBswMBXVJ5H44Qf0+eKeUg7whSC9qpYOzzrm7+0r9F5u3qF8ZTx55TJc2g656C 9a1P1MYVysLvkLvS4H+crmxA/i08Tc1h+x9RRvqba4lSzZ6/Tmt60DPM5Sc4R0nSm9BBff0N m0bSNRS8InXdO1Aq3362QKX2NOwcL5YaStwODNyZUqF7izjK4QARAQABzTxEZW1pIE1hcmll IE9iZW5vdXIgKGxvdmVyIG9mIGNvZGluZykgPGRlbWlvYmVub3VyQGdtYWlsLmNvbT7CwXgE EwECACIFAlp+A0oCGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJELKItV//nCLBhr8Q AK/xrb4wyi71xII2hkFBpT59ObLN+32FQT7R3lbZRjVFjc6yMUjOb1H/hJVxx+yo5gsSj5LS 9AwggioUSrcUKldfA/PKKai2mzTlUDxTcF3vKx6iMXKA6AqwAw4B57ZEJoMM6egm57TV19kz PMc879NV2nc6+elaKl+/kbVeD3qvBuEwsTe2Do3HAAdrfUG/j9erwIk6gha/Hp9yZlCnPTX+ VK+xifQqt8RtMqS5R/S8z0msJMI/ajNU03kFjOpqrYziv6OZLJ5cuKb3bZU5aoaRQRDzkFIR 6aqtFLTohTo20QywXwRa39uFaOT/0YMpNyel0kdOszFOykTEGI2u+kja35g9TkH90kkBTG+a EWttIht0Hy6YFmwjcAxisSakBuHnHuMSOiyRQLu43ej2+mDWgItLZ48Mu0C3IG1seeQDjEYP tqvyZ6bGkf2Vj+L6wLoLLIhRZxQOedqArIk/Sb2SzQYuxN44IDRt+3ZcDqsPppoKcxSyd1Ny 2tpvjYJXlfKmOYLhTWs8nwlAlSHX/c/jz/ywwf7eSvGknToo1Y0VpRtoxMaKW1nvH0OeCSVJ itfRP7YbiRVc2aNqWPCSgtqHAuVraBRbAFLKh9d2rKFB3BmynTUpc1BQLJP8+D5oNyb8Ts4x Xd3iV/uD8JLGJfYZIR7oGWFLP4uZ3tkneDfYzsFNBFp+A0oBEAC9ynZI9LU+uJkMeEJeJyQ/ 8VFkCJQPQZEsIGzOTlPnwvVna0AS86n2Z+rK7R/usYs5iJCZ55/JISWd8xD57ue0eB47bcJv VqGlObI2DEG8TwaW0O0duRhDgzMEL4t1KdRAepIESBEA/iPpI4gfUbVEIEQuqdqQyO4GAe+M kD0Hy5JH/0qgFmbaSegNTdQg5iqYjRZ3ttiswalql1/iSyv1WYeC1OAs+2BLOAT2NEggSiVO txEfgewsQtCWi8H1SoirakIfo45Hz0tk/Ad9ZWh2PvOGt97Ka85o4TLJxgJJqGEnqcFUZnJJ riwoaRIS8N2C8/nEM53jb1sH0gYddMU3QxY7dYNLIUrRKQeNkF30dK7V6JRH7pleRlf+wQcN fRAIUrNlatj9TxwivQrKnC9aIFFHEy/0mAgtrQShcMRmMgVlRoOA5B8RTulRLCmkafvwuhs6 dCxN0GNAORIVVFxjx9Vn7OqYPgwiofZ6SbEl0hgPyWBQvE85klFLZLoj7p+joDY1XNQztmfA rnJ9x+YV4igjWImINAZSlmEcYtd+xy3Li/8oeYDAqrsnrOjb+WvGhCykJk4urBog2LNtcyCj kTs7F+WeXGUo0NDhbd3Z6AyFfqeF7uJ3D5hlpX2nI9no/ugPrrTVoVZAgrrnNz0iZG2DVx46 x913pVKHl5mlYQARAQABwsFfBBgBAgAJBQJafgNKAhsMAAoJELKItV//nCLBwNIP/AiIHE8b oIqReFQyaMzxq6lE4YZCZNj65B/nkDOvodSiwfwjjVVE2V3iEzxMHbgyTCGA67+Bo/d5aQGj gn0TPtsGzelyQHipaUzEyrsceUGWYoKXYyVWKEfyh0cDfnd9diAm3VeNqchtcMpoehETH8fr RHnJdBcjf112PzQSdKC6kqU0Q196c4Vp5HDOQfNiDnTf7gZSj0BraHOByy9LEDCLhQiCmr+2 E0rW4tBtDAn2HkT9uf32ZGqJCn1O+2uVfFhGu6vPE5qkqrbSE8TG+03H8ecU2q50zgHWPdHM OBvy3EhzfAh2VmOSTcRK+tSUe/u3wdLRDPwv/DTzGI36Kgky9MsDC5gpIwNbOJP2G/q1wT1o Gkw4IXfWv2ufWiXqJ+k7HEi2N1sree7Dy9KBCqb+ca1vFhYPDJfhP75I/VnzHVssZ/rYZ9+5 1yDoUABoNdJNSGUYl+Yh9Pw9pE3Kt4EFzUlFZWbE4xKL/NPno+z4J9aWemLLszcYz/u3XnbO vUSQHSrmfOzX3cV4yfmjM5lewgSstoxGyTx2M8enslgdXhPthZlDnTnOT+C+OTsh8+m5tos8 HQjaPM01MKBiAqdPgksm1wu2DrrwUi6ChRVTUBcj6+/9IJ81H2P2gJk3Ls3AVIxIffLoY34E +MYSfkEjBz0E8CLOcAw7JIwAaeBT
  • Cc: Leon Romanovsky <leon@xxxxxxxxxx>, Abdiel Janulgue <abdiel.janulgue@xxxxxxxxx>, Alexander Potapenko <glider@xxxxxxxxxx>, Alex Gaynor <alex.gaynor@xxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Christoph Hellwig <hch@xxxxxx>, Danilo Krummrich <dakr@xxxxxxxxxx>, iommu@xxxxxxxxxxxxxxx, Jason Wang <jasowang@xxxxxxxxxx>, Jens Axboe <axboe@xxxxxxxxx>, Joerg Roedel <joro@xxxxxxxxxx>, Jonathan Corbet <corbet@xxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, kasan-dev@xxxxxxxxxxxxxxxx, Keith Busch <kbusch@xxxxxxxxxx>, linux-block@xxxxxxxxxxxxxxx, linux-doc@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx, linux-nvme@xxxxxxxxxxxxxxxxxxx, linuxppc-dev@xxxxxxxxxxxxxxxx, linux-trace-kernel@xxxxxxxxxxxxxxx, Madhavan Srinivasan <maddy@xxxxxxxxxxxxx>, Masami Hiramatsu <mhiramat@xxxxxxxxxx>, Michael Ellerman <mpe@xxxxxxxxxxxxxx>, "Michael S. Tsirkin" <mst@xxxxxxxxxx>, Miguel Ojeda <ojeda@xxxxxxxxxx>, Robin Murphy <robin.murphy@xxxxxxx>, rust-for-linux@xxxxxxxxxxxxxxx, Sagi Grimberg <sagi@xxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Steven Rostedt <rostedt@xxxxxxxxxxx>, virtualization@xxxxxxxxxxxxxxx, Will Deacon <will@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Sat, 09 Aug 2025 16:53:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 8/9/25 09:34, Jason Gunthorpe wrote:
> On Fri, Aug 08, 2025 at 08:51:08PM +0200, Marek Szyprowski wrote:
>> First - basing the API on the phys_addr_t.
>>
>> Page based API had the advantage that it was really hard to abuse it and 
>> call for something that is not 'a normal RAM'. 
> 
> This is not true anymore. Today we have ZONE_DEVICE as a struct page
> type with a whole bunch of non-dram sub-types:
> 
> enum memory_type {
>       /* 0 is reserved to catch uninitialized type fields */
>       MEMORY_DEVICE_PRIVATE = 1,
>       MEMORY_DEVICE_COHERENT,
>       MEMORY_DEVICE_FS_DAX,
>       MEMORY_DEVICE_GENERIC,
>       MEMORY_DEVICE_PCI_P2PDMA,
> };
> 
> Few of which are kmappable/page_to_virtable() in a way that is useful
> for the DMA API.
> 
> DMA API sort of ignores all of this and relies on the caller to not
> pass in an incorrect struct page. eg we rely on things like the block
> stack to do the right stuff when a MEMORY_DEVICE_PCI_P2PDMA is present
> in a bio_vec.
> 
> Which is not really fundamentally different from just using
> phys_addr_t in the first place.
> 
> Sure, this was a stronger argument when this stuff was originally
> written, before ZONE_DEVICE was invented.
> 
>> I initially though that phys_addr_t based API will somehow simplify
>> arch specific implementation, as some of them indeed rely on
>> phys_addr_t internally, but I missed other things pointed by
>> Robin. Do we have here any alternative?
> 
> I think it is less of a code simplification, more as a reduction in
> conceptual load. When we can say directly there is no struct page type
> anyhwere in the DMA API layers then we only have to reason about
> kmap/phys_to_virt compatibly.
> 
> This is also a weaker overall requirement than needing an actual
> struct page which allows optimizing other parts of the kernel. Like we
> aren't forced to create MEMORY_DEVICE_PCI_P2PDMA stuct pages just to
> use the dma api.
> 
> Again, any place in the kernel we can get rid of struct page the
> smoother the road will be for the MM side struct page restructuring.
> 
> For example one of the bigger eventual goes here is to make a bio_vec
> store phys_addr_t, not struct page pointers.
> 
> DMA API is not alone here, we have been de-struct-paging the kernel
> for a long time now:
> 
> netdev: 
> https://lore.kernel.org/linux-mm/20250609043225.77229-1-byungchul@xxxxxx/
> slab: https://lore.kernel.org/linux-mm/20211201181510.18784-1-vbabka@xxxxxxx/
> iommmu: 
> https://lore.kernel.org/all/0-v4-c8663abbb606+3f7-iommu_pages_jgg@xxxxxxxxxx/
> page tables: 
> https://lore.kernel.org/linux-mm/20230731170332.69404-1-vishal.moola@xxxxxxxxx/
> zswap: 
> https://lore.kernel.org/all/20241216150450.1228021-1-42.hyeyoo@xxxxxxxxx/
> 
> With a long term goal that struct page only exists for legacy code,
> and is maybe entirely compiled out of modern server kernels.

Why just server kernels?  I suspect client systems actually run
newer kernels than servers do.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

Attachment: OpenPGP_0xB288B55FFF9C22C1.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.