
Re: [Xen-devel] [PATCH V1] iommu/arm: Add Renesas IPMMU-VMSA support



Hi Oleksandr,

On 22/07/2019 17:27, Oleksandr wrote:
On 20.07.19 21:25, Julien Grall wrote:
Apologies for the late answer. I have been traveling for the past few weeks.

Absolutely no problem. Thank you for your review.



On 6/26/19 11:30 AM, Oleksandr Tyshchenko wrote:
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>

The IPMMU-VMSA is a VMSA-compatible I/O Memory Management Unit (IOMMU)
which provides address translation and access protection functionalities
to processing units and interconnect networks.

Do you have a link to the specification?

All I have is a TRM. Unfortunately, I can't share it.

Does anyone in the community have access to the spec?


+      hardware supports stage 2 translation table format and is able to use
+      CPU's P2M table as is.
+
  endif
diff --git a/xen/drivers/passthrough/arm/Makefile b/xen/drivers/passthrough/arm/Makefile
index b3efcfd..40ac7a9 100644
--- a/xen/drivers/passthrough/arm/Makefile
+++ b/xen/drivers/passthrough/arm/Makefile
@@ -1,2 +1,3 @@
  obj-y += iommu.o
  obj-$(CONFIG_ARM_SMMU) += smmu.o
+obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
new file mode 100644
index 0000000..5091c61
--- /dev/null
+++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
@@ -0,0 +1,1487 @@
+/*
+ * xen/drivers/passthrough/arm/ipmmu-vmsa.c
+ *
+ * Driver for the Renesas IPMMU-VMSA found in R-Car Gen3 SoCs.
+ *
+ * The IPMMU-VMSA is a VMSA-compatible I/O Memory Management Unit (IOMMU)
+ * which provides address translation and access protection functionalities
+ * to processing units and interconnect networks.
+ *
+ * Please note, the current driver is supposed to work only with the newest
+ * Gen3 SoC revisions whose IPMMU hardware supports the stage 2 translation
+ * table format and is able to use the CPU's P2M table as is.
+ *
+ * Based on Linux's IPMMU-VMSA driver from Renesas BSP:
+ *    drivers/iommu/ipmmu-vmsa.c

What are the major differences compared to the Linux driver?

Well, the major differences are:

1. Stage 1/Stage 2 translation. The Linux driver supports Stage 1 translation only (with the Stage 1 translation table format) and manages the page table by itself. The Xen driver supports Stage 2 translation (with the Stage 2 translation table format) so that it can share the page table with the CPU; Stage 1 translation is always bypassed in the Xen driver (a sketch of the sharing follows after this list).

So, the Xen driver is supposed to be used only with the newest Gen3 SoC revisions (H3 ES3.0, M3 ES3.0, etc.) whose IPMMU hardware supports the stage 2 translation table format.

2. AArch64 support. The Linux driver uses VMSAv8-32 mode, while the Xen driver enables the Armv8 VMSAv8-64 mode to cover up to a 40-bit input address.

3. Context bank (set of page tables) usage. In Xen, each context bank is mapped to one Xen domain, so all devices passed through to the same Xen domain share the same context bank.
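
To illustrate point 1: since Stage 2 is shared with the CPU, context setup boils down to pointing the IPMMU at the P2M root rather than building a page table. A minimal sketch, assuming the IMTTLBR0/IMTTUBR0 context register names and hand-written masks (both assumptions, not the exact patch code):

static void ipmmu_set_p2m_root(struct ipmmu_vmsa_domain *domain)
{
    /* Physical address of the P2M root page already built by Xen. */
    uint64_t ttbr = page_to_maddr(domain->d->arch.p2m.root);

    /* The 64-bit TTBR is split across two 32-bit context registers. */
    ipmmu_ctx_write_root(domain, IMTTLBR0, ttbr & 0xfffff000U);
    ipmmu_ctx_write_root(domain, IMTTUBR0, (ttbr >> 32) & 0xffU);
}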

Can this be written in the commit message? This is helpful for anyone reviewing the driver today and for future developers.





+ * you can find at:
+ *    url: git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git
+ *    branch: v4.14.75-ltsi/rcar-3.9.2
+ *    commit: a5266d298124874c2c06b8b13d073f6ecc2ee355

Is there any reason to use the BSP driver and not the one provided by Linux directly?

I was thinking the BSP driver is a *little bit* more up to date than the Linux one. At some point there was a big difference between the mainline and BSP drivers, but now the difference is not big and is mostly in DDR_BACKUP and WHITELIST support. I looked at the mainline driver as well when implementing the Xen driver.

What is the review process for patches to be merged in the BSP? Is it the same as for Linux upstream?




[...]

+#define dev_archdata(dev) ((struct ipmmu_vmsa_xen_device *)dev->archdata.iommu)
+
+/* Root/Cache IPMMU device's information */
+struct ipmmu_vmsa_device
+{

AFAICT, this file was converted to Xen coding style. If so, the style for structs is:

struct ... {
...
};

Yes, will correct everywhere in this file.



+    struct device *dev;
+    void __iomem *base;
+    struct ipmmu_vmsa_device *root;
+    struct list_head list;
+    unsigned int num_utlbs;
+    unsigned int num_ctx;
+    spinlock_t lock;    /* Protects ctx and domains[] */
+    DECLARE_BITMAP(ctx, IPMMU_CTX_MAX);
+    struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
+};
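
As a side note for reviewers: the ctx bitmap above is what backs context bank allocation. A sketch of how a free bank could be claimed under mmu->lock (the helper name is hypothetical, not from the patch):

static int ipmmu_alloc_ctx(struct ipmmu_vmsa_device *mmu,
                           struct ipmmu_vmsa_domain *domain)
{
    unsigned long flags;
    unsigned int ctx;

    spin_lock_irqsave(&mmu->lock, flags);

    /* Find the first unused context bank. */
    ctx = find_first_zero_bit(mmu->ctx, mmu->num_ctx);
    if ( ctx >= mmu->num_ctx )
    {
        spin_unlock_irqrestore(&mmu->lock, flags);
        return -EBUSY;
    }

    set_bit(ctx, mmu->ctx);
    mmu->domains[ctx] = domain;
    domain->context_id = ctx;

    spin_unlock_irqrestore(&mmu->lock, flags);

    return 0;
}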
+
+/*
+ * Root/Cache IPMMU domain's information
+ *
+ * Root IPMMU device is assigned to Root IPMMU domain while Cache IPMMU device
+ * is assigned to Cache IPMMU domain. Master devices are connected to Cache
+ * IPMMU devices through specific ports called micro-TLBs.
+ * All Cache IPMMU devices, in turn, are connected to Root IPMMU device
+ * which manages IPMMU context.
+ */
+struct ipmmu_vmsa_domain
+{

Ditto.

ok



+    /*
+     * IPMMU device assigned to this IPMMU domain.
+     * Either Root device which is located at the main memory bus domain or
+     * Cache device which is located at each hierarchy bus domain.
+     */
+    struct ipmmu_vmsa_device *mmu;
+
+    /* Context used for this IPMMU domain */
+    unsigned int context_id;
+
+    /* Xen domain associated with this IPMMU domain */
+    struct domain *d;
+
+    /* The fields below are used for Cache IPMMU domain only */
+
+    /*
+     * Used to keep track of the master devices which are attached to this
+     * IPMMU domain (domain users). Master devices behind the same IPMMU device
+     * are grouped together by putting into the same IPMMU domain.
+     * Only when the refcount reaches 0 this IPMMU domain can be destroyed.
+     */
+    int refcount;

If the refcount cannot be 0, then I would prefer an unsigned int here.

Hrm, I meant cannot be negative.



It can be >= 0.

OK, so please switch to unsigned int here.

+{
+    return readl(mmu->base + offset);
+}
+
+static void ipmmu_write(struct ipmmu_vmsa_device *mmu, unsigned int offset,
+                        u32 data)
+{
+    writel(data, mmu->base + offset);
+}
+
+static u32 ipmmu_ctx_read_root(struct ipmmu_vmsa_domain *domain,
+                               unsigned int reg)
+{
+    return ipmmu_read(domain->mmu->root,
+                      domain->context_id * IM_CTX_SIZE + reg);
+}
+
+static void ipmmu_ctx_write_root(struct ipmmu_vmsa_domain *domain,
+                                 unsigned int reg, u32 data)
+{
+    ipmmu_write(domain->mmu->root,
+                domain->context_id * IM_CTX_SIZE + reg, data);
+}
+
+static void ipmmu_ctx_write_cache(struct ipmmu_vmsa_domain *domain,
+                                  unsigned int reg, u32 data)
+{
+    ASSERT(reg == IMCTR);

What's the rationale for passing reg as a parameter if it can only be equal to IMCTR?

Good question. I tried to retain the same interface as for ipmmu_ctx_write_root(_all) for consistency.

The Cache IPMMU device has context registers other than IMCTR, but they are not used by this driver.

Could the function deal with those other registers without any change?


Shall I drop the reg parameter?

Let's discuss it. See my question above.




+
+    /* Mask fields which are implemented in IPMMU-MM only. */
+    if ( !ipmmu_is_root(domain->mmu) )
+        ipmmu_write(domain->mmu, domain->context_id * IM_CTX_SIZE + reg,
+                    data & IMCTR_COMMON_MASK);
+}
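
Regarding the question above: as written, the helper is IMCTR-specific only because of the mask. If the reg parameter is kept, it could deal with the other Cache context registers with a one-line change, e.g. (purely illustrative):

static void ipmmu_ctx_write_cache(struct ipmmu_vmsa_domain *domain,
                                  unsigned int reg, u32 data)
{
    /* Fields implemented in IPMMU-MM only exist in IMCTR. */
    if ( reg == IMCTR )
        data &= IMCTR_COMMON_MASK;

    if ( !ipmmu_is_root(domain->mmu) )
        ipmmu_write(domain->mmu, domain->context_id * IM_CTX_SIZE + reg, data);
}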

[...]

+/* Enable MMU translation for the micro-TLB. */
+static void ipmmu_utlb_enable(struct ipmmu_vmsa_domain *domain,
+                              unsigned int utlb)
+{
+    struct ipmmu_vmsa_device *mmu = domain->mmu;
+
+    /*
+     * TODO: Reference-count the micro-TLB as several bus masters can be
+     * connected to the same micro-TLB.
+     */

What would be the exact problem if this is not handled? Could a utlb be shared between multiple domains?

Actually, I tried to retain this TODO, as well as the whole "TLB and micro-TLB Management" code, as much as possible.

Most devices have a unique micro-TLB connection, but there are cases where they do not. At the moment, I don't see any problem if such devices (which share the same utlb) are assigned to the same Xen domain.

In this case we would just enable the same utlb more than once at domain creation time (device assignment) and disable it more than once at domain destruction time (device deassignment).

That's correct as long as we don't need to support device hotplug. Platform device hotplug is not going to happen, but this may happen for PCI devices.


The worst case scenario would be when these devices are assigned to different Xen domains. So, I think, the same utlb *can't* be shared between multiple Xen domains, since it points to the context bank to use for the page walk.

Thank you for the explanation. What can actually happen? Could it lead to a security issue (e.g. the IPMMU is bypassed)?

Also, the question is whether it is worth trying to implement this. Do we have cases where devices use the same micro-TLB but are assigned to different domains?

If not, then maybe you could just add a check in the driver to prevent that use case; a rough sketch follows below. The work around iommu_group done by Paul [1] might be useful.

Anyway, from an upstream perspective this is not a massive concern for now, as platform device passthrough is not security supported. So I would be happy if the TODO is addressed in a follow-up series.
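
For the record, a rough sketch of such a check; the utlb_owner[] array is hypothetical (nothing like it exists in the posted driver):

static int ipmmu_utlb_claim(struct ipmmu_vmsa_device *mmu,
                            struct ipmmu_vmsa_domain *domain,
                            unsigned int utlb)
{
    /* Refuse to route one micro-TLB to two different Xen domains. */
    if ( mmu->utlb_owner[utlb] && mmu->utlb_owner[utlb] != domain->d )
    {
        dev_err(mmu->dev, "micro-TLB %u already used by d%d\n",
                utlb, mmu->utlb_owner[utlb]->domain_id);
        return -EBUSY;
    }

    mmu->utlb_owner[utlb] = domain->d;

    return 0;
}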

[...]

+/* Master devices management */
+static int ipmmu_attach_device(struct ipmmu_vmsa_domain *domain,
+                               struct device *dev)
+{
+    struct ipmmu_vmsa_master_cfg *cfg = dev_archdata(dev)->cfg;
+    struct ipmmu_vmsa_device *mmu = cfg->mmu;
+    unsigned int i;
+
+    if ( !mmu )
+    {
+        dev_err(dev, "Cannot attach to IPMMU\n");
+        return -ENXIO;
+    }
+
+    if ( !domain->mmu )

So you read domain->mmu here and ...

+    {
+        /* The domain hasn't been used yet, initialize it. */
+        domain->mmu = mmu;
+
+        /*
+         * We have already enabled context for Root IPMMU assigned to this
+         * Xen domain in ipmmu_domain_init_context().
+         * Enable the context for Cache IPMMU only. Flush the TLB as required
+         * when modifying the context registers.
+         */
+        ipmmu_ctx_write_cache(domain, IMCTR,
+                              ipmmu_ctx_read_root(domain, IMCTR) | IMCTR_FLUSH);
+
+        dev_info(dev, "Using IPMMU context %u\n", domain->context_id);
+    }
+    else if ( domain->mmu != mmu )

... here. What actually promises that domain->mmu can't change in parallel?

ipmmu_attach_device is protected by xen_domain->lock

I find it confusing to rely on xen_domain->lock to serialize access to a field from a different structure. It would be good if this were documented in the structure.

Also, is this always read behind the same lock?

[...]


The IOMMU interface in Xen has not been designed with the new IOMMU bindings in mind. I would prefer if we look into extending the add_device callback to support platform devices.

This would allow us to probe the device later on and therefore avoid going through the device tree multiple times.


I completely agree with you that the current implementation is not optimal and should be reworked in order not to scan the whole DT many times, but I do not completely understand what we should do and how exactly.

Could you, please, add more details?

It would be good to have an abstract way to add a new device to the IOMMU based on the generic IOMMU DT binding. I am quite keen to see something similar to iommu_fwspec in Xen so this can be used for both DT and ACPI.

From a high level perspective, we would have some code to add a new device to the IOMMU. The generic code would:
   1) Parse the binding and prepare iommu_fwspec with the correct information
   2) Call the IOMMU driver to register the new device

The new function would either be called from handle_device or from a new loop over the DT nodes.
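
To make that concrete, a sketch of what an iommu_fwspec-like structure could look like in Xen (hypothetical, loosely modelled on Linux's struct iommu_fwspec):

/* Per-device firmware-provided IOMMU description (names hypothetical). */
struct iommu_fwspec {
    const struct iommu_ops *ops;  /* IOMMU driver handling this device */
    struct device *iommu_dev;     /* The IOMMU instance itself */
    unsigned int num_ids;         /* Number of master IDs below */
    uint32_t ids[];               /* e.g. micro-TLB numbers from "iommus" */
};

/* Step 1: generic code parses the "iommus" DT property into a fwspec. */
int iommu_fwspec_init(struct device *dev, struct device *iommu_dev);
int iommu_fwspec_add_ids(struct device *dev, const uint32_t *ids,
                         unsigned int num_ids);

/* Step 2: the driver's add_device callback consumes the fwspec. */
int ipmmu_add_device(struct device *dev);

This way the DT (or, later, ACPI) parsing happens once in generic code, and the driver only sees the resulting IDs.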





+
+static void ipmmu_device_reset(struct ipmmu_vmsa_device *mmu)
+{
+    unsigned int i;
+
+    /* Disable all contexts. */
+    for ( i = 0; i < mmu->num_ctx; ++i )
+        ipmmu_write(mmu, i * IM_CTX_SIZE + IMCTR, 0);
+}
+
+/*
+ * This function relies on the fact that the Root IPMMU device is probed
+ * first. If that is not the case, it denies further Cache IPMMU device
+ * probes (returns -ENODEV) until the Root IPMMU device has been registered
+ * for sure.

Can we look at handling -EDEFER in Xen instead?

I am not sure this is something we should implement at this stage (while only the IPMMU driver would be a user). I have already resolved that possible issue by trying to locate the Root IPMMU device and probe it first, to avoid the case described above. So now we don't depend on how the IPMMU devices are located in the DT. Please see ipmmu_init(). So, I tend to live with it for some time.

The reason I asked the question is that the current solution feels like papering over an API that does not fit the new driver. So it would be worth investigating whether something -EDEFER-like could easily be used in Xen.
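
To sketch what that could look like (everything here, including the -EDEFER value itself, is hypothetical): collect the nodes whose probe deferred and retry them until a pass makes no progress:

static void __init probe_deferred(struct dt_device_node **deferred,
                                  unsigned int count)
{
    bool progress = true;

    /* Keep retrying while at least one deferred probe makes progress. */
    while ( progress )
    {
        unsigned int i;

        progress = false;

        for ( i = 0; i < count; i++ )
        {
            if ( !deferred[i] )
                continue;

            /* device_init() is Xen's existing probe entry point. */
            if ( device_init(deferred[i], DEVICE_IOMMU, NULL) == -EDEFER )
                continue; /* Dependency not probed yet, try again later. */

            deferred[i] = NULL; /* Probed, or failed for good. */
            progress = true;
        }
    }
    /* Anything still left in deferred[] has an unresolvable dependency. */
}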

[...]



----------

Julien, what should we do about the fact that the IPMMU supports only a 3-level page table?

I left a TODO regarding that, but we need to work out some usable solution if possible.

         /*
          * As 4-level translation table is not supported in IPMMU, we need
          * to check IPA size used for P2M table beforehand to be sure it is
          * 3-level and the IPMMU will be able to use it.
          *
          * In case of using 4KB page granule we should use two concatenated
          * translation tables at level 1 in order to support 40 bit IPA
          * with 3-level translation table.
          *
         * TODO: Probably, when determining the "pa_range" in setup_virt_paging()
         * we should take the IPMMU's abilities into account as well.
          */
         if ( IPMMU_MAX_P2M_IPA_BITS < p2m_ipa_bits )
         {
             dev_err(&node->dev, "P2M IPA size is not supported (P2M=%u IPMMU=%u)!\n",
                     p2m_ipa_bits, IPMMU_MAX_P2M_IPA_BITS);
             return -EOPNOTSUPP;
         }

We have a similar problem with the SMMU. The only strict requirement for the IOMMU is to have a valid p2m_ipa_bits at the time a domain is built. Note that the SMMU driver stores the value at probe time, but that could be reworked.

So rather than initializing the P2M first and then the IOMMU, I would first initialize the IOMMU so we can gather the requirements and then initialize the P2M.

In the P2M code, you can take the IOMMU requirements into account and further restrict the IPA size if necessary. What do you think?
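
That direction could look like the following on the P2M side (the function name is a sketch, not existing code):

/* Hypothetical hook: IOMMU drivers cap the IPA width at probe time. */
static unsigned int __read_mostly max_ipa_bits = 64;

void p2m_restrict_ipa_bits(unsigned int ipa_bits)
{
    /* Only ever narrow the limit, never widen it. */
    if ( ipa_bits < max_ipa_bits )
        max_ipa_bits = ipa_bits;
}

setup_virt_paging() would then skip any pa_range entry above max_ipa_bits, and the IPMMU driver would call p2m_restrict_ipa_bits(IPMMU_MAX_P2M_IPA_BITS) from its probe instead of failing once the P2M is already configured.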

Cheers,

[1] <20190716101657.23327-1-paul.durrant@xxxxxxxxxx>

--
Julien Grall
