
RE: [RFC] design: design doc for 1:1 direct-map


  • To: Julien Grall <julien@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>
  • From: Penny Zheng <Penny.Zheng@xxxxxxx>
  • Date: Thu, 10 Dec 2020 07:02:04 +0000
  • Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Kaly Xin <Kaly.Xin@xxxxxxx>, Wei Chen <Wei.Chen@xxxxxxx>, nd <nd@xxxxxxx>, Paul Durrant <paul@xxxxxxx>, "famzheng@xxxxxxxxxx" <famzheng@xxxxxxxxxx>
  • Delivery-date: Thu, 10 Dec 2020 07:02:36 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [RFC] design: design doc for 1:1 direct-map

Hi Julien

Thanks for the nice and detailed comments. (*^▽^*)
Here are the replies:

> -----Original Message-----
> From: Julien Grall <julien@xxxxxxx>
> Sent: Tuesday, December 8, 2020 5:07 PM
> To: Penny Zheng <Penny.Zheng@xxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx;
> sstabellini@xxxxxxxxxx
> Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>; Kaly Xin
> <Kaly.Xin@xxxxxxx>; Wei Chen <Wei.Chen@xxxxxxx>; nd <nd@xxxxxxx>;
> Paul Durrant <paul@xxxxxxx>; famzheng@xxxxxxxxxx
> Subject: Re: [RFC] design: design doc for 1:1 direct-map
> 
> Hi Penny,
> 
> I am adding Paul and Zheng in the thread as there are similar interest for the
> x86 side.
> 
> On 08/12/2020 05:21, Penny Zheng wrote:
> > This is one draft design about the infrastructure for now, not ready
> > for upstream yet (hence the RFC tag), thought it'd be useful to
> > firstly start a discussion with the community.
> >
> > Create one design doc for 1:1 direct-map.
> > It aims to describe why and how we allocate 1:1 direct-map(guest
> > physical == physical) domains.
> >
> > This document is partly based on Stefano Stabellini's patch serie v1:
> > [direct-map DomUs](
> > https://lists.xenproject.org/archives/html/xen-devel/2020-
> 04/msg00707.html).
> 
> May I ask why a different approach?

In Stefano's original design, he wanted to allocate 1:1 direct-map domains with
user-defined memory regions, and he preferred allocating them from the
sub-domain allocator.

That brought quite a discussion, and in the end everyone more or less agreed
that it is not workable: once the requested memory goes into any allocator,
whether the boot or the sub-domain allocator, we cannot guarantee that it has
not already been put to some other use by the time we actually allocate it for
a 1:1 direct-map domain.

So I'd prefer to split the original design into two parts. One part is here:
the user only wants to allocate a 1:1 direct-map domain and does not care where
the RAM ends up, think of dom0. In that case we can stick to allocating memory
from the sub-domain allocator.

The other part is what I mentioned below the commit message: "For the part
regarding allocating 1:1 direct-map domains with user-defined memory regions,
it will be included in the next design of static memory allocation".

But of course, if a combined document helps the community better understand our
ideas, we're willing to merge them in the next version. 😉

Briefly speaking, if we allocate 1:1 direct-map domains with user-defined
memory regions, we need to reserve those memory regions from the very
beginning.

> >
> > Signed-off-by: Penny Zheng <penny.zheng@xxxxxxx>
> > ---
> > For the part regarding allocating 1:1 direct-map domains with
> > user-defined memory regions, it will be included in next design of
> > static memory allocation.
> 
> I don't think you can do without user-defined memory regions (see more
> below).
> 
> > ---
> >   docs/designs/1_1_direct-map.md | 87
> ++++++++++++++++++++++++++++++++++
> >   1 file changed, 87 insertions(+)
> >   create mode 100644 docs/designs/1_1_direct-map.md
> >
> > diff --git a/docs/designs/1_1_direct-map.md
> > b/docs/designs/1_1_direct-map.md new file mode 100644 index
> > 0000000000..ce3e2c77fd
> > --- /dev/null
> > +++ b/docs/designs/1_1_direct-map.md
> > @@ -0,0 +1,87 @@
> > +# Preface
> > +
> > +The document is an early draft for direct-map memory map (`guest
> > +physical == physical`) of domUs. And right now, it constrains to ARM
> 
> s/constrains/limited/
> 
> Aside the interface to the user, you should be able to re-use the same code
> on x86. Note that because the memory layout on x86 is fixed (always starting
> at 0), you would only be able to have only one direct-mapped domain.
> 

Sorry, I have little knowledge of x86, so this may need more investigation.

> > +architecture.
> > +
> > +It aims to describe why and how the guest would be created as direct-map
> domain.
> > +
> > +This document is partly based on Stefano Stabellini's patch serie v1:
> > +[direct-map DomUs](
> > +https://lists.xenproject.org/archives/html/xen-devel/2020-
> 04/msg00707.html).
> > +
> > +This is a first draft and some questions are still unanswered. When
> > +this is the case, the text shall contain XXX.
> > +
> > +# Introduction
> > +
> > +## Background
> > +
> > +Cases where domU needs direct-map memory map:
> > +
> > +  * IOMMU not present in the system.
> > +  * IOMMU disabled, since it doesn't cover a specific device.
> 
> If the device is not covered by the IOMMU, then why would you want to
> disable the IOMMUs for the rest of the system?
> 

This is a mixed scenario: we pass some devices to a VM through the SMMU and
other devices to a VM without SMMU coverage, so we cannot guarantee guest DMA
security anyway.

In that case users may want to disable the SMMU; at least they can then gain
some performance improvement from having the SMMU disabled.

> > +  * IOMMU disabled, since it doesn't have enough bandwidth.
> 
> I am not sure to understand this one.
> 

On some SoCs, multiple devices are connected to a single SMMU.

In extreme situations, when those devices perform DMA concurrently, the
translation requests can exceed the SMMU's translation capacity, which causes
DMA latency.

> > +  * IOMMU disabled, since it adds too much latency.
> 
> The list above sounds like direct-map memory would be necessary even
> without device-passthrough. Can you clarify it?
> 

Okay.

The SMMU can be implemented differently on different SoCs. For example, some
SoC vendors may remove the TLB inside the SMMU.

In that case the SMMU adds latency to every DMA transfer, so users may want to
disable it for real-time scenarios.

> > +
> > +*WARNING:
> > +Users should be careful that it is not always secure to assign a
> > +device without
> 
> s/careful/aware/ I think. Also, it is never secure to assign a device without
> IOMMU/SMMU unless you have a replacement.
> 
> I would suggest to reword it something like:
> 
> "When the device is not protected by the IOMMU, the administrator should
> make sure that:
>     - The device is assigned to a trusted guest
>     - You have an additional security mechanism on the platform (e.g
> MPU) to protect the memory."
> 

Thanks for the rephrase. (*^▽^*)

> > +IOMMU/SMMU protection.
> > +Users must be aware of this risk, that guests having access to
> > +hardware with DMA capacity must be trusted, or it could use the DMA
> > +engine to access any other memory area.
> > +Guests could use additional security hardware component like NOC,
> > +System MPU to protect the memory.
> 
> What's the NOC?
> 

Network on Chip.

It is a kind of SoC-level firewall that limits a device's DMA access range or a
CPU's memory access range.

> > +
> > +## Design
> > +
> > +The implementation may cover following aspects:
> > +
> > +### Native Address and IRQ numbers for GIC and UART(vPL011)
> > +
> > +Today, fixed addresses and IRQ numbers are used to map GIC and
> > +UART(vPL011) in DomUs. And it may cause potential clash on direct-map
> domains.
> > +So, Using native addresses and irq numbers for GIC, UART(vPL011), in
> > +direct-map domains is necessary.
> > +e.g.
> 
> To me e.g. means example. But below this is not an example, this is a
> requirement in order to use the vpl011 on system without pl011 UART.
>

Yes, right. I'll drop the "e.g." here.
 
> > +For the virtual interrupt of vPL011: instead of always using
> > +`GUEST_VPL011_SPI`, try to reuse the physical SPI number if possible.
> 
> How would you find the following region for guest using PV drivers;
>     - Event channel interrupt
>     - Grant table area
>
Good catch, a thousand thanks. 😉

We've done some investigation on this part; correct me if I am wrong.

Pages shared between guests and Xen, such as shared_info and the grant table,
are mapped by ARM guests using the hypercall HYPERVISOR_memory_op and are never
direct-mapped, not even in dom0.

So here we suggest that we could modify the hypercall so that, in the
direct-map case, it not only passes a gfn to Xen but also receives the mfns Xen
has already allocated (e.g. for the grant tables). But if so, it involves
modifications in Linux, o(╥﹏╥)o.
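
To make this concrete, here is a rough sketch of what such an extension could
look like. It is purely hypothetical, not an existing Xen interface: the struct
name, the mfn output field and the simplified typedefs are made up for
illustration only.

    /*
     * Hypothetical sketch of the suggested tweak -- NOT an existing Xen
     * interface.  The idea: for a direct-map domain, let the guest learn
     * which mfn Xen already allocated for a shared frame (e.g. a grant
     * table frame), so the guest can then map it with gfn == mfn.
     */
    #include <stdint.h>

    typedef uint64_t xen_pfn_t;   /* simplified stand-in for Xen's typedef */
    typedef uint16_t domid_t;     /* simplified stand-in for Xen's typedef */

    struct xen_add_to_physmap_directmap {
        /* IN: parameters in the style of XENMEM_add_to_physmap */
        domid_t      domid;
        unsigned int space;       /* e.g. XENMAPSPACE_grant_table */
        xen_pfn_t    idx;         /* which shared frame */
        xen_pfn_t    gpfn;        /* gfn requested by the guest */

        /*
         * OUT (hypothetical): the mfn Xen already allocated for this
         * frame, so a direct-map guest can redo the mapping at gfn == mfn.
         */
        xen_pfn_t    mfn;
    };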

We are also inclined to keep all guest-related pages (RAM, grant tables, etc.)
in one contiguous piece.

Right now, pages like the grant tables are allocated separately from the Xen
heap, so they have little chance of ending up contiguous with the guest RAM.

So what if we allocate more RAM up front: say the guest needs 256MB, give it
257MB and use the extra 1MB for those pages. That way we could keep everything
in one piece.

This is quite a rough brainstorm, please bear with me and give me more thoughts
on it.
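
As a rough illustration of the layout I have in mind (a sketch only, not
existing Xen code; the structure and helper below are made up, and the 1MB tail
size is just an assumption):

    #include <stdint.h>

    typedef uint64_t paddr_t;                /* mirrors Xen's paddr_t */

    #define MB(x)             ((paddr_t)(x) << 20)
    #define EXTRA_SHARED_MEM  MB(1)          /* assumption: 1MB tail is enough */

    /* One contiguous direct-mapped bank: gfn == mfn for the whole range. */
    struct direct_map_bank {
        paddr_t start;         /* base of the bank                           */
        paddr_t ram_size;      /* RAM actually usable by the guest           */
        paddr_t shared_start;  /* tail kept for grant table/shared_info etc. */
        paddr_t shared_size;
    };

    /* Guest asks for req_size bytes; carve req_size + 1MB as one piece. */
    void layout_direct_map_bank(struct direct_map_bank *b,
                                paddr_t bank_start, paddr_t req_size)
    {
        b->start        = bank_start;
        b->ram_size     = req_size;
        b->shared_start = bank_start + req_size;
        b->shared_size  = EXTRA_SHARED_MEM;
    }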
 
> > +
> > +### Device tree option: `direct_map`
> > +
> > +Introduce a new device tree option `direct_map` for direct-map domains.
> > +Then, when users try to allocate one direct-map domain(except DOM0),
> > +`direct-map` property needs to be added under the appropriate
> `/chosen/domUx`.
> > +
> > +
> > +            chosen {
> > +                ...
> > +                domU1 {
> > +                    compatible = "xen, domain";
> > +                    #address-cells = <0x2>;
> > +                    #size-cells = <0x1>;
> > +                    direct-map;
> > +                    ...
> > +                };
> > +                ...
> > +            };
> > +
> > +If users are using imagebuilder, they can add to boot.source
> > +something like the
> 
> This documentations ounds like more something for imagebuilder rather
> than Xen itself.
> 

Yes, right. I'll delete this part.

> > +following:
> > +
> > +    fdt set /chosen/domU1 direct-map
> > +
> > +Users could also use `xl` to create direct-map domains, just use the
> > +following config option: `direct-map=true`
> > +
> > +### direct-map guest memory allocation
> > +
> > +Func `allocate_memory_direct_map` is based on `allocate_memory_11`,
> > +and shall be refined to allocate memory for all direct-map domains,
> including DOM0.
> > +Roughly speaking, firstly, it tries to allocate arbitrary memory
> > +chunk of requested size from domain
> > +sub-allocator(`alloc_domheap_pages`). If fail, split the chunk into
> > +halves, and re-try, until it succeed or bail out with the smallest chunk 
> > size.
> 
> If you have a mix of domain with direct-mapped and normal domain, you
> may end up to have the memory so small that your direct-mapped domain
> will have many small banks. This is going to be a major problem if you are
> creating the domain at runtime (you suggest xl can be used).
> 
> In addition, some users may want to be able to control the location of the
> memory as this reduced the amount of work in the guest (e.g you don't have
> to dynamically discover the memory).
> 
> I think it would be best to always require the admin to select the RAM bank
> used by a direct mapped domain. Alternatively, we could have a pool of
> memory that can only be used for direct mapped domain. This should limit
> the fragmentation of the memory.
>

Yes, in some cases, if we start out with a mix of direct-mapped domains with
user-defined memory regions (scattered loosely) and normal domains, creating
further domains at runtime (with xl) may fail later on, whether they are
direct-map domains or not.

But users should be free to allocate wherever they want; we may not want to
limit them to a pool of memory.

Of course, we could add a warning to make them aware of the risk.

Still, I agree with you that it would be best to always require the admin to
select the RAM bank used by a direct-mapped domain.

I will add that part of the design in my next series, and keep 1:1 direct-map
without user-defined regions as an extra option here.
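
For reference, here is a toy, self-contained sketch of the chunk-splitting
strategy the design above describes (try the full request, split into halves on
failure, bail out at the minimum chunk size). try_alloc() merely stands in for
the domain sub-allocator (alloc_domheap_pages()); none of this is the real
implementation.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for the domain sub-allocator: pretend only chunks of up to
     * 64MB of contiguous memory are available. */
    static bool try_alloc(uint64_t size)
    {
        return size <= (64ULL << 20);
    }

    /* Satisfy a request of `remaining` bytes with progressively smaller
     * chunks, as allocate_memory_direct_map is described to do. */
    static int allocate_direct_map(uint64_t remaining, uint64_t min_chunk)
    {
        uint64_t chunk = remaining;          /* start with the whole request */

        while ( remaining > 0 )
        {
            if ( chunk == 0 || chunk < min_chunk )
                return -1;                   /* even the smallest chunk failed */

            if ( try_alloc(chunk) )
            {
                printf("allocated chunk of %llu MB\n",
                       (unsigned long long)(chunk >> 20));
                remaining -= chunk;
                if ( chunk > remaining )
                    chunk = remaining;       /* shrink to fit the tail */
            }
            else
                chunk /= 2;                  /* split into halves and retry */
        }

        return 0;
    }

    int main(void)
    {
        /* e.g. ask for 256MB with a 2MB minimum chunk size */
        return allocate_direct_map(256ULL << 20, 2ULL << 20);
    }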
 
> > +Then, `insert_11_bank` shall insert above allocated pages into a
> > +memory bank, which are ordered by address, and also set up guest P2M
> > +mapping(
> > +`guest_physmap_add_page`) to ensure `gfn == mfn`.
> 
> Cheers,
> 
> --
> Julien Grall

Cheers,

--
Penny

 

