
Re: [patch-4.16] arm/smmuv1,v2: Protect smmu master list with a lock



On Thu, 28 Oct 2021, Julien Grall wrote:
> On Thu, 28 Oct 2021, 00:14 Stefano Stabellini, <sstabellini@xxxxxxxxxx> wrote:
>       On Wed, 27 Oct 2021, Julien Grall wrote:
>       > > > > > +    return ret;
>       > > > > >    }
>       > > > > >    static int register_smmu_master(struct arm_smmu_device *smmu,
>       > > > > > @@ -2056,7 +2066,10 @@ static int arm_smmu_add_device(struct device *dev)
>       > > > > >        } else {
>       > > > > >            struct arm_smmu_master *master;
>       > > > > > +        spin_lock(&arm_smmu_devices_lock);
>       > > > > >            master = find_smmu_master(smmu, dev->of_node);
>       > > > > > +        spin_unlock(&arm_smmu_devices_lock);
>       > > > >
>       > > > > At the moment, unlocking here is fine because we don't remove
>       > > > > the device. However, there is already a series to support
>       > > > > removing a device (see [1]). So I think it would be preferable
>       > > > > to unlock after the last use of 'cfg'.
>       > > > >
>       > > Ok. I will move the unlocking to the end of this else {} block. I
>       > > was not aware of the patch you are referring to.
>       >
>       > I think the end of the else is still too early. This needs to at
>       > least be past iommu_group_set_iommudata() because we store cfg.
>       >
>       > Potentially, the lock wants to also cover arm_smmu_master_alloc_smes().
>       > So I am wondering whether it would be simpler to hold the lock for
>       > the whole duration of arm_smmu_add_device() (I can see value when we
>       > will want to interlock with the remove code).
> 
>       The patch was to protect the smmu master list. From that point of view
>       the unlock right after find_smmu_master would be sufficient, right?
> 
> 
> Yes. However this is not fixing all the problems (see below).
> 
> 
>       We only need to protect cfg if we are worried that the same device is
>       added in two different ways at the same time. Did I get that right? If
>       so, I would say that case should not be possible. Am I missing another
>       potential conflict?
> 
> 
> It should not be possible to add the same device twice. The problem is
> rather what happens when we remove the device: in that case, "master"
> may disappear at any point.
> 
> The support for removing a device is not yet implemented in the tree,
> but there is already a patch on the ML. So I think it would be
> shortsighted to move the lock just to solve concurrent access to the
> list.
 
That makes sense now: the other source of conflict is concurrent add and
remove of the same device. Sorry it wasn't clear to me before.
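
To convince myself, I wrote down the interleaving. This is only a
sketch: device removal is not in the tree yet, so the remove side below
is invented based on your description of the pending series.

    /*
     * Hypothetical interleaving once device removal lands (the remove
     * side is made up for illustration):
     *
     *  CPU A: arm_smmu_add_device()      CPU B: removal path
     *  --------------------------------  --------------------------------
     *  spin_lock(&arm_smmu_devices_lock);
     *  master = find_smmu_master(smmu,
     *                            dev->of_node);
     *  spin_unlock(&arm_smmu_devices_lock);
     *                                    spin_lock(&arm_smmu_devices_lock);
     *                                    list_del(&master->list);
     *                                    xfree(master);
     *                                    spin_unlock(&arm_smmu_devices_lock);
     *  cfg = &master->cfg;               <- use-after-free
     */

So with the early unlock, nothing pins "master" between the lookup and
the last use of cfg.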
 
 
>       I am pointing this out for two reasons:
> 
>       Protecting the list is different from protecting each element from
>       concurrent modification of the element itself. If the latter is a
>       potential problem, I wonder if arm_smmu_add_device is the only function
>       affected?
> 
> 
> I had a brief look at the code and couldn't find any other place where
> this may be an issue.
> 
> 
>       The second reason is that extending the lock past
>       arm_smmu_master_alloc_smes is a bit worrying because it causes
>       &arm_smmu_devices_lock and smmu->stream_map_lock to nest, which wasn't
>       the case before.
> 
> 
> Nested locks are common. I don't believe there would be a problem here with 
> this one.
> 
> 
>       I am not saying that it is a bad idea to extend the lock past
>       arm_smmu_master_alloc_smes -- it might be the right thing to do.
> 
> 
> I don't usually suggest locking changes blindly ;).
> 
>       But I am merely saying that it might be best to think twice about it,
>       and/or do that after 4.16.
> 
> 
> To be honest, the add callback to register a device in the IOMMU
> subsystem doesn't support PCI devices yet. So the locking is a latent
> issue at the moment.
>
> So if you are concerned with my suggested locking then, I am afraid
> the current patch is a no-go on my side for 4.16.

I was totally fine with it aside from the last suggestion of moving the
spin_unlock to the end of the function, because until then the changes
were obviously correct to me.

I didn't understand why we needed to extend the locked region; now I
do. Also, lock nesting is one of those things that is relatively common
but, I think, should always get a second check to make sure it is
correct. Specifically, we need to check that no function that holds
smmu->stream_map_lock calls a function that takes
&arm_smmu_devices_lock. It is not very difficult, but I haven't done
this check myself.
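
For the record, the property to check is that every path agrees on one
ordering. A sketch of what I mean, assuming we settle on
&arm_smmu_devices_lock as the outer lock:

    /*
     * Proposed ordering (outer lock first):
     *   1. arm_smmu_devices_lock  - protects the smmu master list
     *   2. smmu->stream_map_lock  - protects stream-map/SME allocation
     *
     * The add path then takes them strictly in that order:
     */
    spin_lock(&arm_smmu_devices_lock);
    master = find_smmu_master(smmu, dev->of_node);
    ret = arm_smmu_master_alloc_smes(dev); /* takes stream_map_lock inside */
    spin_unlock(&arm_smmu_devices_lock);

A deadlock would require the reverse order somewhere, i.e. a function
already holding smmu->stream_map_lock calling into something that takes
&arm_smmu_devices_lock. Checking that no such path exists is the grep I
haven't done.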

The other thing that is not clear to me is whether we would also need
to protect the places where we use (not allocate) masters and/or cfg,
e.g. arm_smmu_attach_dev and arm_smmu_domain_add_master.
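
If they do, I would imagine something along these lines (only a sketch:
I haven't checked how arm_smmu_attach_dev actually retrieves the
master/cfg in our copy of the driver, so the lookup and the smmu_domain
fields below are assumptions on my part):

    static int arm_smmu_attach_dev(struct iommu_domain *domain,
                                   struct device *dev)
    {
        struct arm_smmu_domain *smmu_domain = domain->priv;
        struct arm_smmu_master *master;
        int ret = -ENODEV;

        spin_lock(&arm_smmu_devices_lock);
        master = find_smmu_master(smmu_domain->smmu, dev->of_node);
        if (master)
            /*
             * master, and therefore master->cfg, cannot disappear
             * under our feet while we hold the lock.
             */
            ret = arm_smmu_domain_add_master(smmu_domain, &master->cfg);
        spin_unlock(&arm_smmu_devices_lock);

        return ret;
    }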


> That said we can work towards a new locking approach for 4.17.
> However, I would want to have a proposal from your side or at least
> some details on why the suggested locking is not suitable.
 
The suggested locking approach up until the last suggestion looks
totally fine to me. The last suggestion is harder to judge because the
PCI removal hook is not there yet, so I am having trouble seeing
exactly what needs to be protected.
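
That said, my reading of your suggestion for 4.17 is roughly the shape
below (only a sketch, error handling elided, and the exact arguments to
iommu_group_set_iommudata are from memory):

    static int arm_smmu_add_device(struct device *dev)
    {
        /* ... */
        spin_lock(&arm_smmu_devices_lock);

        master = find_smmu_master(smmu, dev->of_node);
        /* ... */
        iommu_group_set_iommudata(group, &master->cfg, NULL);

        /* stream_map_lock nests inside arm_smmu_devices_lock here */
        ret = arm_smmu_master_alloc_smes(dev);

        spin_unlock(&arm_smmu_devices_lock);
        return ret;
    }

Holding the lock across the whole function would also give us the
interlock with the future remove path that you mention.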

 

