Xen project Mailing List

Re: [Xen-devel] [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror

To: "Yang, Philip" <Philip.Yang@xxxxxxx>

From: Jason Gunthorpe <jgg@xxxxxxxxxxxx>

Date: Fri, 1 Nov 2019 17:42:32 +0000

Delivery-date: Fri, 01 Nov 2019 17:42:48 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Nov 01, 2019 at 03:59:26PM +0000, Yang, Philip wrote:

> > This test for range_blockable should be before mutex_lock, I can move
> > it up
> >
> yes, thanks.

Okay, I wrote it like this:

        if (mmu_notifier_range_blockable(range))
                mutex_lock(&adev->notifier_lock);
        else if (!mutex_trylock(&adev->notifier_lock))
                return false;

> > Also, do you know if notifier_lock is held while calling
> > amdgpu_ttm_tt_get_user_pages_done()? Can we add a 'lock assert held'
> > to amdgpu_ttm_tt_get_user_pages_done()?
>
> gpu side hold notifier_lock but kfd side doesn't. kfd side doesn't check
> amdgpu_ttm_tt_get_user_pages_done/mmu_range_read_retry return value but
> check mem->invalid flag which is updated from invalidate callback. It
> takes more time to change, I will come to another patch to fix it later.

Ah.. confusing, OK, I'll let you sort that

> > However, this is all pre-existing bugs, so I'm OK go ahead with this
> > patch as modified. I advise AMD to make a followup patch ..
> >
> yes, I will.

While you are here, this is also wrong:

int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
{
        down_read(&mm->mmap_sem);
        r = hmm_range_fault(range, 0);
        up_read(&mm->mmap_sem);
        if (unlikely(r <= 0)) {
                if ((r == 0 || r == -EBUSY) && !time_after(jiffies, timeout))
                        goto retry;
                goto out_free_pfns;
        }

        for (i = 0; i < ttm->num_pages; i++) {
                pages[i] = hmm_device_entry_to_page(range, range->pfns[i]);

It is not allowed to read the results of hmm_range_fault() outside
locking, and in particular, we can't convert to a struct page.

This must be done inside the notifier_lock, after checking
mmu_range_read_retry(), all handling of the struct page must be
structured like that.

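
The ordering asked for above can be modelled in userspace. What follows is
only a rough sketch, not kernel or amdgpu code: a pthread mutex stands in
for notifier_lock, a plain counter stands in for the notifier sequence, and
every model_* name is hypothetical. It shows the two points from this
mail: the invalidate side only blocks when it is allowed to, and the
reader consumes the fault results only under the lock, after the retry
check has passed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical stand-in for driver state; not amdgpu's real layout. */
struct model_range {
        pthread_mutex_t notifier_lock;  /* plays the role of notifier_lock */
        unsigned long invalidate_seq;   /* bumped by every invalidation */
        unsigned long pfns[NPAGES];     /* output of the modelled fault */
        unsigned long pages[NPAGES];    /* consumer copy, filled under lock */
};

/* Models the invalidate callback: block only if the caller permits it. */
static bool model_invalidate(struct model_range *r, bool blockable)
{
        if (blockable)
                pthread_mutex_lock(&r->notifier_lock);
        else if (pthread_mutex_trylock(&r->notifier_lock) != 0)
                return false;           /* could not take the lock now */

        r->invalidate_seq++;            /* older snapshots are now stale */
        pthread_mutex_unlock(&r->notifier_lock);
        return true;
}

/* Snapshot the sequence before faulting (simplified: takes the lock). */
static unsigned long model_read_begin(struct model_range *r)
{
        unsigned long seq;

        pthread_mutex_lock(&r->notifier_lock);
        seq = r->invalidate_seq;
        pthread_mutex_unlock(&r->notifier_lock);
        return seq;
}

/* Must be called with notifier_lock held. */
static bool model_read_retry(const struct model_range *r, unsigned long seq)
{
        return r->invalidate_seq != seq;
}

/* Models the fault step: fills pfns without holding notifier_lock. */
static void model_fault(struct model_range *r, unsigned long base)
{
        for (size_t i = 0; i < NPAGES; i++)
                r->pfns[i] = base + i;
}

/*
 * Fault, then take the lock, check for a collision, and only then read
 * the results. race_on_first injects one invalidation between the fault
 * and the lock on the first attempt. Returns the number of attempts.
 */
static int model_get_user_pages(struct model_range *r, unsigned long base,
                                bool race_on_first)
{
        int attempts = 0;

        assert(r != NULL);
        for (;;) {
                unsigned long seq = model_read_begin(r);

                attempts++;
                model_fault(r, base);
                if (race_on_first && attempts == 1)
                        model_invalidate(r, true);

                pthread_mutex_lock(&r->notifier_lock);
                if (model_read_retry(r, seq)) {
                        pthread_mutex_unlock(&r->notifier_lock);
                        continue;       /* results are stale, fault again */
                }
                /* Safe point: convert/consume results under the lock. */
                for (size_t i = 0; i < NPAGES; i++)
                        r->pages[i] = r->pfns[i];
                pthread_mutex_unlock(&r->notifier_lock);
                return attempts;
        }
}
```

In this model an invalidation that lands between the fault and the lock
forces exactly one extra pass, and nothing is copied out of pfns[] on the
stale pass. The real mmu_range_read_begin() does not take the driver lock;
the model does so only to keep the sketch short.
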
> >> @@ -997,10 +1004,18 @@ static void amdgpu_ttm_tt_unpin_userptr(struct
> >> ttm_tt *ttm)
> >>          sg_free_table(ttm->sg);
> >>
> >>   #if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
> >> -        if (gtt->range &&
> >> -            ttm->pages[0] == hmm_device_entry_to_page(gtt->range,
> >> -                                                      gtt->range->pfns[0]))
> >> -                WARN_ONCE(1, "Missing get_user_page_done\n");
> >> +        if (gtt->range) {
> >> +                unsigned long i;
> >> +
> >> +                for (i = 0; i < ttm->num_pages; i++) {
> >> +                        if (ttm->pages[i] !=
> >> +                                hmm_device_entry_to_page(gtt->range,
> >> +                                              gtt->range->pfns[i]))
> >> +                                break;
> >> +                }
> >> +
> >> +                WARN((i == ttm->num_pages), "Missing get_user_page_done\n");
> >> +        }
> >
> > Is this related/necessary? I can put it in another patch if it is just
> > debugging improvement? Please advise
> >
> I see this WARN backtrace now, but I didn't see it before. This is
> somehow related.

Hm, might be instructive to learn what is going on..

Thanks,
Jason

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
