[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers



On Mon 25-06-18 10:01:03, Michal Hocko wrote:
> On Fri 22-06-18 16:09:06, Felix Kuehling wrote:
> > On 2018-06-22 11:24 AM, Michal Hocko wrote:
> > > On Fri 22-06-18 17:13:02, Christian König wrote:
> > >> Hi Michal,
> > >>
> > >> [Adding Felix as well]
> > >>
> > >> Well first of all you have a misconception why at least the AMD graphics
> > >> driver need to be able to sleep in an MMU notifier: We need to sleep 
> > >> because
> > >> we need to wait for hardware operations to finish and *NOT* because we 
> > >> need
> > >> to wait for locks.
> > >>
> > >> I'm not sure if your flag now means that you generally can't sleep in MMU
> > >> notifiers any more, but if that's the case at least AMD hardware will 
> > >> break
> > >> badly. In our case the approach of waiting for a short time for the 
> > >> process
> > >> to be reaped and then select another victim actually sounds like the 
> > >> right
> > >> thing to do.
> > > Well, I do not need to make the notifier code non blocking all the time.
> > > All I need is to ensure that it won't sleep if the flag says so and
> > > return -EAGAIN instead.
> > >
> > > So here is what I do for amdgpu:
> > 
> > In the case of KFD we also need to take the DQM lock:
> > 
> > amdgpu_mn_invalidate_range_start_hsa -> amdgpu_amdkfd_evict_userptr ->
> > kgd2kfd_quiesce_mm -> kfd_process_evict_queues -> evict_process_queues_cpsch
> > 
> > So we'd need to pass the blockable parameter all the way through that
> > call chain.
> 
> Thanks, I have missed that part. So I guess I will start with something
> similar to intel-gfx and back off when the current range needs some
> treatment. So this on top. Does it look correct?
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index d138a526feff..e2d422b3eb0b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -266,6 +266,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct 
> mmu_notifier *mn,
>               struct amdgpu_mn_node *node;
>               struct amdgpu_bo *bo;
>  
> +             if (!blockable) {
> +                     amdgpu_mn_read_unlock();
> +                     return -EAGAIN;
> +             }
> +
>               node = container_of(it, struct amdgpu_mn_node, it);
>               it = interval_tree_iter_next(it, start, end);

Ble, just noticed that half of the change didn't get to git index...
This is what I have
commit c4701b36ac2802b903db3d05cf77c030fccce3a8
Author: Michal Hocko <mhocko@xxxxxxxx>
Date:   Mon Jun 25 15:24:03 2018 +0200

    fold me
    
    - amd gpu notifiers can sleep deeper in the callchain 
(evict_process_queues_cpsch
      on a lock and amdgpu_mn_invalidate_node on unbound timeout) make sure
      we bail out when we have an intersecting range for starter

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index d138a526feff..3399a4a927fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -225,6 +225,11 @@ static int amdgpu_mn_invalidate_range_start_gfx(struct 
mmu_notifier *mn,
        while (it) {
                struct amdgpu_mn_node *node;
 
+               if (!blockable) {
+                       amdgpu_mn_read_unlock(rmn);
+                       return -EAGAIN;
+               }
+
                node = container_of(it, struct amdgpu_mn_node, it);
                it = interval_tree_iter_next(it, start, end);
 
@@ -266,6 +271,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct 
mmu_notifier *mn,
                struct amdgpu_mn_node *node;
                struct amdgpu_bo *bo;
 
+               if (!blockable) {
+                       amdgpu_mn_read_unlock(rmn);
+                       return -EAGAIN;
+               }
+
                node = container_of(it, struct amdgpu_mn_node, it);
                it = interval_tree_iter_next(it, start, end);
 
-- 
Michal Hocko
SUSE Labs

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.