[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] xen/arm: fix gnttab_need_iommu_mapping

Hi Rahul,

On 11/02/2021 16:05, Rahul Singh wrote:
On 11 Feb 2021, at 1:52 pm, Julien Grall <julien@xxxxxxx> wrote:

On 11/02/2021 13:20, Rahul Singh wrote:
Hello Julien,

Hi Rahul,

On 10 Feb 2021, at 7:52 pm, Julien Grall <julien@xxxxxxx> wrote:

On 10/02/2021 18:08, Rahul Singh wrote:
Hello Julien,
On 10 Feb 2021, at 5:34 pm, Julien Grall <julien@xxxxxxx> wrote:


On 10/02/2021 15:06, Rahul Singh wrote:
On 9 Feb 2021, at 8:36 pm, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:

On Tue, 9 Feb 2021, Rahul Singh wrote:
On 8 Feb 2021, at 6:49 pm, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:

Commit 91d4eca7add broke gnttab_need_iommu_mapping on ARM.
The offending chunk is:

#define gnttab_need_iommu_mapping(d)                    \
-    (is_domain_direct_mapped(d) && need_iommu(d))
+    (is_domain_direct_mapped(d) && need_iommu_pt_sync(d))

On ARM we need gnttab_need_iommu_mapping to be true for dom0 when it is
directly mapped and IOMMU is enabled for the domain, like the old check
did, but the new check is always false.

In fact, need_iommu_pt_sync is defined as dom_iommu(d)->need_sync and
need_sync is set as:

   if ( !is_hardware_domain(d) || iommu_hwdom_strict )
       hd->need_sync = !iommu_use_hap_pt(d);

iommu_use_hap_pt(d) means that the page-table used by the IOMMU is the
P2M. It is true on ARM. need_sync means that you have a separate IOMMU
page-table and it needs to be updated for every change. need_sync is set
to false on ARM. Hence, gnttab_need_iommu_mapping(d) is false too,
which is wrong.

As a consequence, when using PV network from a domU on a system where
IOMMU is on from Dom0, I get:

(XEN) smmu: /smmu@fd800000: Unhandled context fault: fsr=0x402, 
iova=0x8424cb148, fsynr=0xb0001, cb=0
[   68.290307] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK

The fix is to go back to something along the lines of the old
implementation of gnttab_need_iommu_mapping.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>
Fixes: 91d4eca7add
Backport: 4.12+


Given the severity of the bug, I would like to request this patch to be
backported to 4.12 too, even if 4.12 is security-fixes only since Oct

For the 4.12 backport, we can use iommu_enabled() instead of
is_iommu_enabled() in the implementation of gnttab_need_iommu_mapping.

Changes in v2:
- improve commit message
- add is_iommu_enabled(d) to the check
xen/include/asm-arm/grant_table.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/asm-arm/grant_table.h 
index 6f585b1538..0ce77f9a1c 100644
--- a/xen/include/asm-arm/grant_table.h
+++ b/xen/include/asm-arm/grant_table.h
@@ -89,7 +89,7 @@ int replace_grant_host_mapping(unsigned long gpaddr, mfn_t 
    (((i) >= nr_status_frames(t)) ? INVALID_GFN : (t)->arch.status_gfn[i])

#define gnttab_need_iommu_mapping(d)                    \
-    (is_domain_direct_mapped(d) && need_iommu_pt_sync(d))
+    (is_domain_direct_mapped(d) && is_iommu_enabled(d))

#endif /* __ASM_GRANT_TABLE_H__ */

I tested the patch and while creating the guest I observed the below warning 
from Linux for block device.

So you are creating a guest with "xl create" in dom0 and you see the
warnings below printed by the Dom0 kernel? I imagine the domU has a
virtual "disk" of some sort.
Yes you are right I am trying to create the guest with "xl create” and before 
that, I created the logical volume and trying to attach the logical volume
block to the domain with “xl block-attach”. I observed this error with the "xl 
block-attach” command.
This issue occurs after applying this patch as what I observed this patch 
introduce the calls to iommu_legacy_{, un}map() to map the grant pages for
IOMMU that touches the page-tables. I am not sure but what I observed is that 
something is written wrong when iomm_unmap calls unmap the pages
because of that issue is observed.

Can you clarify what you mean by "written wrong"? What sort of error do you see 
in the iommu_unmap()?
I might be wrong as per my understanding for ARM we are sharing the P2M between 
CPU and IOMMU always and the map_grant_ref() function is written in such a way 
that we have to call iommu_legacy_{, un}map() only if P2M is not shared.

map_grant_ref() will call the IOMMU if gnttab_need_iommu_mapping() returns 
true. I don't really see where this is assuming the P2M is not shared.

In fact, on x86, this will always be false for HVM domain (they support both 
shared and separate page-tables).

As we are sharing the P2M when we call the iommu_map() function it will overwrite the 
existing GFN -> MFN ( For DOM0 GFN is same as MFN) entry and when we call 
iommu_unmap() it will unmap the  (GFN -> MFN ) entry from the page-table.
AFAIK, there should be nothing mapped at that GFN because the page belongs to 
the guest. At worse, we would overwrite a mapping that is the same.
Sorry I should have mention before backend/frontend is dom0 in this
case and GFN is mapped. I am trying to attach the block device to DOM0

Ah, your log makes a lot more sense now. Thank you for the clarification!

So yes, I agree that iommu_{,un}map() will do the wrong thing if the frontend 
and backend in the same domain.

I don't know what the state in Linux, but from Xen PoV it should be possible to 
have the backend/frontend in the same domain.

I think we want to ignore the IOMMU mapping request when the domain is the 
same. Can you try this small untested patch:

I tested the patch and it is working fine for both dom0/domU. I am able to 
attach the block device to dom0/domu.
Also I didn’t observe the IOMMU fault also for block device that we have behind 
IOMMU on our system and attached to domU.

Thanks for the testing. I noticed that my patch doesn't build because arm_iommu_unmap_page() doesn't have a parameter mfn. Can you confirm whether you had to replace mfn with _mfn(dfn_x(dfn))?


Julien Grall



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.