
Re: [patch-4.16] arm/smmuv1,v2: Protect smmu master list with a lock



On Thu, 28 Oct 2021, Julien Grall wrote:
> Hi Stefano,
> 
> First apologies for sending the previous e-mails in HTML (thanks for pointing
> that out!).
> 
> On 28/10/2021 01:20, Stefano Stabellini wrote:
> > On Thu, 28 Oct 2021, Julien Grall wrote:
> > > On Thu, 28 Oct 2021, 00:14 Stefano Stabellini, <sstabellini@xxxxxxxxxx>
> > > wrote:
> > >        On Wed, 27 Oct 2021, Julien Grall wrote:
> > >        > > > > > +    return ret;
> > >        > > > > >    }
> > >        > > > > >    static int register_smmu_master(struct arm_smmu_device *smmu,
> > >        > > > > > @@ -2056,7 +2066,10 @@ static int arm_smmu_add_device(struct device *dev)
> > >        > > > > >        } else {
> > >        > > > > >            struct arm_smmu_master *master;
> > >        > > > > > +        spin_lock(&arm_smmu_devices_lock);
> > >        > > > > >            master = find_smmu_master(smmu, dev->of_node);
> > >        > > > > > +        spin_unlock(&arm_smmu_devices_lock);
> > >        > > > >
> > >        > > > > At the moment, unlocking here is fine because we don't
> > >        > > > > remove the device. However, there is a series to support
> > >        > > > > removing a device (see [1]). So I think it would be
> > >        > > > > preferable to unlock after the last use of 'cfg'.
> > >        > > > >
> > >        > > Ok. I will move unlocking to the end of this else {} block.
> > >        > > I was not aware of the patch you are referring to.
> > >        >
> > >        > I think the end of the else is still too early. This needs to
> > >        > at least be past iommu_group_set_iommudata() because we store
> > >        > cfg.
> > >        >
> > >        > Potentially, the lock also wants to cover
> > >        > arm_smmu_master_alloc_smes(). So I am wondering whether it
> > >        > would be simpler to hold the lock for the whole duration of
> > >        > arm_smmu_add_device() (I can see value when we will want to
> > >        > interlock with the remove code).
> > > 
> > >        The patch was to protect the smmu master list. From that point
> > >        of view, the unlock right after find_smmu_master would be
> > >        sufficient, right?
> > > 
> > > 
> > > Yes. However, this does not fix all the problems (see below).
> > > 
> > > 
> > >        We only need to protect cfg if we are worried that the same
> > >        device is added in two different ways at the same time. Did I
> > >        get it right? If so, I would say that case should not be
> > >        possible. Am I missing another potential conflict?
> > > 
> > > 
> > > It should not be possible to add the same device twice. The problem is
> > > more when we are going to remove the device. In this case, "master"
> > > may disappear at any point.
> > > 
> > > Support for removing devices is not yet implemented in the tree, but
> > > there is already a patch on the ML. So I think it would be shortsighted
> > > to move the lock only to solve concurrent access to the list.
> >   That makes sense now: the other source of conflict is concurrent add and
> > remove of the same device. Sorry it wasn't clear to me before.
> At the moment, we are relying on the upper layer (e.g. PCI or DT subsystem) to
> prevent concurrent add/remove/assignment. The trouble is we don't have a
> common lock between PCI and DT.
> 
> One possibility would be to add a common lock in the upper layer, but it
> feels to me this is a bit fragile and may also require a longer locking
> section than necessary.
> 
> That said, add/remove/assignment operations are meant to be rare, so this
> could be an option. This would also have the advantage of covering all the
> IOMMUs.
> 
> >     
> > >        I am pointing this out for two reasons:
> > > 
> > >        Protecting the list is different from protecting each element from
> > >        concurrent modification of the element itself. If the latter is a
> > >        potential problem, I wonder if arm_smmu_add_device is the only
> > >        function affected?
> > > 
> > > 
> > > I had a brief look at the code and couldn't find any other places where
> > > this may be an issue.
> > > 
> > > 
> > >        The second reason is that extending the lock past
> > >        arm_smmu_master_alloc_smes is a bit worrying because it causes
> > >        &arm_smmu_devices_lock and smmu->stream_map_lock to nest, which
> > >        wasn't the case before.
> > > 
> > > 
> > > Nested locks are common. I don't believe there would be a problem here
> > > with this one.
> > > 
> > > 
> > >        I am not saying that it is a bad idea to extend the lock past
> > >        arm_smmu_master_alloc_smes -- it might be the right thing to do.
> > > 
> > > 
> > > I don't usually suggest locking changes blindly ;).
> > > 
> > >        But I am merely saying that it might be best to think twice
> > >        about it, and/or do that after 4.16.
> > > 
> > > 
> 
> [...]
> 
> > The other thing that is not clear to me is whether we would also need to
> > protect places where we use (not allocate) masters and/or cfg, e.g.
> > arm_smmu_attach_dev, arm_smmu_domain_add_master.
> 
> I think both should be covered by the lock. Now the question is: will other
> IOMMU drivers require the same locking?
> 
> If yes, then maybe that locking should be in the common code.
> 
> > > That said, we can work towards a new locking approach for 4.17.
> > > However, I would want to have a proposal from your side or at least
> > > some details on why the suggested locking is not suitable.
> >   The suggested locking approach up until the last suggestion looks
> > totally fine to me. The last suggestion is harder to judge because
> > the PCI removal hook is not there yet, so I am having trouble seeing
> > exactly what needs to be protected.
> 
> The PCI removal hook is the same as the platform device one. There is
> already a patch on the ML (see [1]) for that.
> 
> We have two interlocking problems to resolve:
>   1) Concurrent requests from the PCI and platform/DT subsystems
>   2) Removal vs add vs (re)assign
> 
> The two approaches I can think of are:
> 
> Approach A:
>   - The driver is responsible for protecting against 1)
>   - Each subsystem (DT and PCI) is responsible for 2)
> 
> Approach B:
>   The driver is responsible for protecting against both 1) and 2).
> 
> From my understanding, the patch proposed by Michal is following approach A
> whilst my proposal is going towards approach B.
> 
> I am open to using approach A; however, I think this needs to be documented as
> the lock to use will depend on whether the device is a PCI device or not.

Thanks for the explanation, now everything is a lot clearer. I don't
have feedback on approach A vs. B -- it looks like both could work well.
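
Just to double-check my understanding of approach B: the driver would
serialize add/remove/assign itself with its own lock, so it would not matter
whether the caller is the PCI or the DT subsystem. A minimal sketch of what I
have in mind (arm_smmu_remove_device is the hook from the pending removal
series, and the do_* helpers are just placeholders for the existing function
bodies):

    /* Sketch of approach B: a single driver-level lock serializes
     * add/remove/assign, so the caller can be either the PCI or the DT
     * subsystem without any common lock in the upper layers. */
    static DEFINE_SPINLOCK(arm_smmu_devices_lock);

    static int arm_smmu_add_device(struct device *dev)
    {
        int ret;

        spin_lock(&arm_smmu_devices_lock);
        ret = do_add_device(dev);      /* placeholder for the current body */
        spin_unlock(&arm_smmu_devices_lock);

        return ret;
    }

    static int arm_smmu_remove_device(struct device *dev)
    {
        int ret;

        spin_lock(&arm_smmu_devices_lock);
        ret = do_remove_device(dev);   /* "master" cannot disappear while
                                          add/assign hold the same lock */
        spin_unlock(&arm_smmu_devices_lock);

        return ret;
    }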

Regarding this specific patch and the conversation about 4.16 vs. 4.17:
I think it would be fine to take this patch in 4.16 in its current form.
Although it is not strictly required, because PCI passthrough is not
going to be complete in 4.16 anyway, I like that it makes the code
consistent in how rbtree accesses are protected. With this patch the
arm_smmu_master rbtree is consistently protected from concurrent
accesses; without it, the rbtree is sometimes protected and sometimes
not, which is not great.
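
To spell out what I mean by "consistent": with the patch applied, both the
insertion and the lookup side of the rbtree take the same lock, along the
lines of the following (simplified, argument lists abridged, not the exact
hunks):

    /* Writer side: inserting a master into the per-SMMU rbtree. */
    spin_lock(&arm_smmu_devices_lock);
    ret = register_smmu_master(smmu, dev, masterspec);
    spin_unlock(&arm_smmu_devices_lock);

    /* Reader side (the hunk quoted above): looking a master up. */
    spin_lock(&arm_smmu_devices_lock);
    master = find_smmu_master(smmu, dev->of_node);
    spin_unlock(&arm_smmu_devices_lock);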

So I think it would be good to have in 4.16. But, like you said, the
patch is not strictly required, so it is fine either way.

For other changes on top of it, e.g. a complete implementation of approach
A or approach B, I think it would be best to target 4.17 so that we can
evaluate them together with the other outstanding PCI patches. That would
make the review a lot easier (at least for me).
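
One thing worth writing down as part of that evaluation is the lock ordering
that extending the critical section past arm_smmu_master_alloc_smes would
introduce, e.g. (sketch only, call details elided):

    /* arm_smmu_devices_lock becomes the outer lock (master list / cfg);
     * smmu->stream_map_lock, taken inside arm_smmu_master_alloc_smes(),
     * becomes the inner one. Fine as long as no path ever takes them in
     * the opposite order. */
    spin_lock(&arm_smmu_devices_lock);
    ret = arm_smmu_master_alloc_smes(dev);    /* takes smmu->stream_map_lock */
    spin_unlock(&arm_smmu_devices_lock);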

 

