[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Mapping memory into a domain


  • To: Alejandro Vallejo <agarciav@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xenia Ragiadakou <Xenia.Ragiadakou@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Demi Marie Obenour <demiobenour@xxxxxxxxx>
  • Date: Fri, 9 May 2025 14:14:27 -0400
  • Autocrypt: addr=demiobenour@xxxxxxxxx; keydata= xsFNBFp+A0oBEADffj6anl9/BHhUSxGTICeVl2tob7hPDdhHNgPR4C8xlYt5q49yB+l2nipd aq+4Gk6FZfqC825TKl7eRpUjMriwle4r3R0ydSIGcy4M6eb0IcxmuPYfbWpr/si88QKgyGSV Z7GeNW1UnzTdhYHuFlk8dBSmB1fzhEYEk0RcJqg4AKoq6/3/UorR+FaSuVwT7rqzGrTlscnT DlPWgRzrQ3jssesI7sZLm82E3pJSgaUoCdCOlL7MMPCJwI8JpPlBedRpe9tfVyfu3euTPLPx wcV3L/cfWPGSL4PofBtB8NUU6QwYiQ9Hzx4xOyn67zW73/G0Q2vPPRst8LBDqlxLjbtx/WLR 6h3nBc3eyuZ+q62HS1pJ5EvUT1vjyJ1ySrqtUXWQ4XlZyoEFUfpJxJoN0A9HCxmHGVckzTRl 5FMWo8TCniHynNXsBtDQbabt7aNEOaAJdE7to0AH3T/Bvwzcp0ZJtBk0EM6YeMLtotUut7h2 Bkg1b//r6bTBswMBXVJ5H44Qf0+eKeUg7whSC9qpYOzzrm7+0r9F5u3qF8ZTx55TJc2g656C 9a1P1MYVysLvkLvS4H+crmxA/i08Tc1h+x9RRvqba4lSzZ6/Tmt60DPM5Sc4R0nSm9BBff0N m0bSNRS8InXdO1Aq3362QKX2NOwcL5YaStwODNyZUqF7izjK4QARAQABzTxEZW1pIE1hcmll IE9iZW5vdXIgKGxvdmVyIG9mIGNvZGluZykgPGRlbWlvYmVub3VyQGdtYWlsLmNvbT7CwXgE EwECACIFAlp+A0oCGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJELKItV//nCLBhr8Q AK/xrb4wyi71xII2hkFBpT59ObLN+32FQT7R3lbZRjVFjc6yMUjOb1H/hJVxx+yo5gsSj5LS 9AwggioUSrcUKldfA/PKKai2mzTlUDxTcF3vKx6iMXKA6AqwAw4B57ZEJoMM6egm57TV19kz PMc879NV2nc6+elaKl+/kbVeD3qvBuEwsTe2Do3HAAdrfUG/j9erwIk6gha/Hp9yZlCnPTX+ VK+xifQqt8RtMqS5R/S8z0msJMI/ajNU03kFjOpqrYziv6OZLJ5cuKb3bZU5aoaRQRDzkFIR 6aqtFLTohTo20QywXwRa39uFaOT/0YMpNyel0kdOszFOykTEGI2u+kja35g9TkH90kkBTG+a EWttIht0Hy6YFmwjcAxisSakBuHnHuMSOiyRQLu43ej2+mDWgItLZ48Mu0C3IG1seeQDjEYP tqvyZ6bGkf2Vj+L6wLoLLIhRZxQOedqArIk/Sb2SzQYuxN44IDRt+3ZcDqsPppoKcxSyd1Ny 2tpvjYJXlfKmOYLhTWs8nwlAlSHX/c/jz/ywwf7eSvGknToo1Y0VpRtoxMaKW1nvH0OeCSVJ itfRP7YbiRVc2aNqWPCSgtqHAuVraBRbAFLKh9d2rKFB3BmynTUpc1BQLJP8+D5oNyb8Ts4x Xd3iV/uD8JLGJfYZIR7oGWFLP4uZ3tkneDfYzsFNBFp+A0oBEAC9ynZI9LU+uJkMeEJeJyQ/ 8VFkCJQPQZEsIGzOTlPnwvVna0AS86n2Z+rK7R/usYs5iJCZ55/JISWd8xD57ue0eB47bcJv VqGlObI2DEG8TwaW0O0duRhDgzMEL4t1KdRAepIESBEA/iPpI4gfUbVEIEQuqdqQyO4GAe+M kD0Hy5JH/0qgFmbaSegNTdQg5iqYjRZ3ttiswalql1/iSyv1WYeC1OAs+2BLOAT2NEggSiVO txEfgewsQtCWi8H1SoirakIfo45Hz0tk/Ad9ZWh2PvOGt97Ka85o4TLJxgJJqGEnqcFUZnJJ riwoaRIS8N2C8/nEM53jb1sH0gYddMU3QxY7dYNLIUrRKQeNkF30dK7V6JRH7pleRlf+wQcN fRAIUrNlatj9TxwivQrKnC9aIFFHEy/0mAgtrQShcMRmMgVlRoOA5B8RTulRLCmkafvwuhs6 dCxN0GNAORIVVFxjx9Vn7OqYPgwiofZ6SbEl0hgPyWBQvE85klFLZLoj7p+joDY1XNQztmfA rnJ9x+YV4igjWImINAZSlmEcYtd+xy3Li/8oeYDAqrsnrOjb+WvGhCykJk4urBog2LNtcyCj kTs7F+WeXGUo0NDhbd3Z6AyFfqeF7uJ3D5hlpX2nI9no/ugPrrTVoVZAgrrnNz0iZG2DVx46 x913pVKHl5mlYQARAQABwsFfBBgBAgAJBQJafgNKAhsMAAoJELKItV//nCLBwNIP/AiIHE8b oIqReFQyaMzxq6lE4YZCZNj65B/nkDOvodSiwfwjjVVE2V3iEzxMHbgyTCGA67+Bo/d5aQGj gn0TPtsGzelyQHipaUzEyrsceUGWYoKXYyVWKEfyh0cDfnd9diAm3VeNqchtcMpoehETH8fr RHnJdBcjf112PzQSdKC6kqU0Q196c4Vp5HDOQfNiDnTf7gZSj0BraHOByy9LEDCLhQiCmr+2 E0rW4tBtDAn2HkT9uf32ZGqJCn1O+2uVfFhGu6vPE5qkqrbSE8TG+03H8ecU2q50zgHWPdHM OBvy3EhzfAh2VmOSTcRK+tSUe/u3wdLRDPwv/DTzGI36Kgky9MsDC5gpIwNbOJP2G/q1wT1o Gkw4IXfWv2ufWiXqJ+k7HEi2N1sree7Dy9KBCqb+ca1vFhYPDJfhP75I/VnzHVssZ/rYZ9+5 1yDoUABoNdJNSGUYl+Yh9Pw9pE3Kt4EFzUlFZWbE4xKL/NPno+z4J9aWemLLszcYz/u3XnbO vUSQHSrmfOzX3cV4yfmjM5lewgSstoxGyTx2M8enslgdXhPthZlDnTnOT+C+OTsh8+m5tos8 HQjaPM01MKBiAqdPgksm1wu2DrrwUi6ChRVTUBcj6+/9IJ81H2P2gJk3Ls3AVIxIffLoY34E +MYSfkEjBz0E8CLOcAw7JIwAaeBT
  • Cc: Xen developer discussion <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 09 May 2025 18:13:57 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 5/9/25 5:47 AM, Alejandro Vallejo wrote:
>>>>>> 2. It can grab the *current* location of the pages and register an
>>>>>>    MMU notifier.  This works for GPU memory and file-backed memory.
>>>>>>    However, when the invalidate_range function of this callback, the
>>>>>>    driver *must* stop all further accesses to the pages.
>>>>>>
>>>>>>    The invalidate_range callback is not allowed to block for a long
>>>>>>    period of time.  My understanding is that things like dirty page
>>>>>>    writeback are blocked while the callback is in progress.  My
>>>>>>    understanding is also that the callback is not allowed to fail.
>>>>>>    I believe it can return a retryable error but I don’t think that
>>>>>>    it is allowed to keep failing forever.
>>>>>>
>>>>>>    Linux’s grant table driver actually had a bug in this area, which
>>>>>>    led to deadlocks.  I fixed that a while back.
>>>>>>
>>>>>> KVM implements the second option: it maps pages into the stage-2
>>>>>> page tables (or shadow page tables, if that is chosen) and unmaps
>>>>>> them when the invalidate_range callback is called.
> 
> I'm still lost as to what is where, who initiates what and what the end
> goal is. Is this about using userspace memory in dom0, and THEN sharing
> that with guests for as long as its live? And make enough magic so the
> guests don't notice the transitionary period in which there may not be
> any memory?
> 
> Or is this about using domU memory for the driver living in dom0?
> 
> Or is this about something else entirely?
> 
> For my own education. Is the following sequence diagram remotely accurate?
> 
> dom0                              domU
>  |                                  |
>  |---+                              |
>  |   | use gfn3 in the driver       |
>  |   | (mapped on user thread)      |
>  |<--+                              |
>  |                                  |
>  |  map mfn(gfn3) in domU BAR       |
>  |--------------------------------->|
>  |                              +---|
>  |              happily use BAR |   |
>  |                              +-->|
>  |---+                              |
>  |   | mmu notifier for gfn3        |
>  |   | (invalidate_range)           |
>  |<--+                              |
>  |                                  |
>  |  unmap mfn(gfn3)                 |
>  |--------------------------------->| <--- Plus some means to making guest 
>  |---+                          +---|      vCPUs pause on access.
>  |   | reclaim gfn3    block on |   |
>  |<--+                 access   |   |
>  |                              |   |
>  |---+                          |   |
>  |   | use gfn7 in the driver   |   |
>  |   | (mapped on user thread)  |   |
>  |<--+                          |   |
>  |                              |   |
>  |  map mfn(gfn7) in domU BAR   |   |
>  |------------------------------+-->| <--- Unpause blocked domU vCPUs
>  |                                  |

I believe this is accurate, yes.

>>>> - The switch from “emulated MMIO” to “MMIO or real RAM” needs to
>>>>   be atomic from the guest’s perspective.
>>>
>>> Updates of p2m PTEs are always atomic.
>> That’s good.
> 
> Updates to a single PTE are atomic, sure. But mapping/unmapping sizes
> not congruent with a whole superpage size (i.e: 256 KiB, more than a
> page, less than a superpage) wouldn't be, as far as the guest is
> concerned.
> 
> But if my understanding above is correct maybe it doesn't matter? It
> only needs to be atomic wrt the hypercall that requests it, so that the
> gfn is never reused while the guest p2m still holds that mfn.

I believe you are correct.  The only requirement is that the guest behaves
correctly if its page faults race against what is happening in the backend
domain.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

Attachment: OpenPGP_0xB288B55FFF9C22C1.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.