Re: [Xen-devel] xen kernel crash at boot since 23598:b24018319772

On 30/06/2011 16:49, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:

>>>> On 30.06.11 at 16:21, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
>> On 30/06/2011 13:43, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>>> That's likely not the one, but rather 23573:584c2e5e03d9. And
>>> indeed it seems like the assertion is a stale leftover from the
>>> original non-RCU version of the patch. There are a few more
>>> similar ones which may similarly be candidates for removal.
>>> Keir, what's your take on this?
>> Not sure, pirq_spin_lock_irq_desc() has a comment about the event_lock
>> preventing the PIRQ-IRQ mapping from changing under its feet. Why would the
>> radix-tree patch change what code is protected by event_lock, anyway?
> The whole function (including the comment) got added by that patch.
> Hence either comment and assertion need fixing, or both need to
> stay and calling code needs adjustment.

Well, I guess at least it's good that you wrote it, and recently. We're not
dealing with anyone else's hidden assumptions in that case.

> The RCU-ness, as I understand it, allows read accesses to the
> PIRQ -> IRQ mapping to be done lockless, hence d->event_lock
> needs to be held only if the intention is to alter the mapping
> (which in particular isn't the case when unmasking an IRQ). Or
> did I still not get my RCU thinking right?

We're nearly there. *Yes*, it is now safe to look up pirq structs in the
pirq_tree with no lock held. And the resulting pirq struct can safely be
accessed basically until you might yield (i.e., do softirq work). This is
safe because pirq structs are freed after an RCU safety period.

*However*, you still need to worry about concurrency aspects of access to
the contents of the pirq structure. *In particular*, pirq->arch.irq could
apparently be modified concurrently with the execution of
pirq_spin_lock_irq_desc() -- the modifying CPU holds both d->event_lock and
pirq->arch.irq's desc_lock, but pirq_spin_lock_irq_desc() may hold neither.

Note that domain_spin_lock_irq_desc() has a retry loop for a reason! It
knows that pirq-irq mapping may change under its feet, so it needs to
re-check the mapping with the desc_lock held, at which point the mapping
cannot change *if* it obtained the correct desc_lock in time!
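
To illustrate the lock-then-recheck pattern described above, here is a rough user-space sketch. It is not the actual Xen code: it uses pthread mutexes and C11 atomics in place of Xen's spinlocks, and every name in it (struct pirq, irq_descs, pirq_lock_desc_retry, NR_IRQS) is a hypothetical stand-in chosen for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_IRQS 16              /* hypothetical table size for this sketch */

/* Hypothetical stand-in for an IRQ descriptor with its own lock. */
struct irq_desc {
    pthread_mutex_t lock;
};

/* Hypothetical stand-in for a pirq struct; irq <= 0 means "not bound". */
struct pirq {
    atomic_int irq;
};

struct irq_desc irq_descs[NR_IRQS];

void init_irq_descs(void)
{
    for (int i = 0; i < NR_IRQS; i++)
        pthread_mutex_init(&irq_descs[i].lock, NULL);
}

/*
 * Lock the descriptor that the pirq currently maps to.
 * Returns the locked descriptor, or NULL if the pirq is unbound.
 * A writer changing pirq->irq is assumed to hold the old descriptor's
 * lock, so once the re-check passes the mapping is stable.
 */
struct irq_desc *pirq_lock_desc_retry(struct pirq *pirq)
{
    for (;;) {
        /* Unlocked snapshot: may be stale by the time we hold the lock. */
        int irq = atomic_load(&pirq->irq);

        if (irq <= 0 || irq >= NR_IRQS)
            return NULL;

        struct irq_desc *desc = &irq_descs[irq];
        pthread_mutex_lock(&desc->lock);

        /* Re-check with the lock held; the mapping may have moved. */
        if (atomic_load(&pirq->irq) == irq)
            return desc;

        /* We locked the wrong descriptor: drop it and try again. */
        pthread_mutex_unlock(&desc->lock);
    }
}
```

The caller unlocks the returned descriptor when done. The point of the loop is that the first read of the mapping is only a hint; it becomes trustworthy only once it is confirmed under the lock that a modifier would also have to hold.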

Perhaps pirq_spin_lock_irq_desc() needs a similar retry loop? Perhaps
pirq_spin_lock_irq_desc() should never have been forked from
domain_spin_lock_irq_desc(), and all callers should simply use the latter?

 -- Keir

> Jan
