[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs



> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: 09 March 2020 16:29
> To: paul@xxxxxxx
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Varad Gautam <vrd@xxxxxxxxx>; Julien 
> Grall <julien@xxxxxxx>; Roger
> Pau Monné <roger.pau@xxxxxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for 
> shared pirqs
> 
> On 06.03.2020 17:02, paul@xxxxxxx wrote:
> > From: Varad Gautam <vrd@xxxxxxxxx>
> >
> > XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> > In that scenario, it is possible to receive multiple __pirq_guest_unbind
> > calls for the same pirq from domain_kill, if the pirq has not yet been
> > removed from the domain's pirq_tree, as:
> >   domain_kill()
> >     -> domain_relinquish_resources()
> >       -> pci_release_devices()
> >         -> pci_clean_dpci_irq()
> >           -> pirq_guest_unbind()
> >             -> __pirq_guest_unbind()
> >
> > For a shared pirq (nr_guests > 1), the first call would zap the current
> > domain from the pirq's guests[] list, but the action handler is never freed
> > as there are other guests using this pirq. As a result, on the second call,
> > __pirq_guest_unbind searches for the current domain which has been removed
> > from the guests[] list, and hits a BUG_ON.
> >
> > Make __pirq_guest_unbind safe to be called multiple times by letting xen
> > continue if a shared pirq has already been unbound from this guest. The
> > PIRQ will be cleaned up from the domain's pirq_tree during the destruction
> > in complete_domain_destroy anyway.
> >
> > Signed-off-by: Varad Gautam <vrd@xxxxxxxxx>
> > [taking over from Varad at v4]
> > Signed-off-by: Paul Durrant <paul@xxxxxxx>
> > ---
> > Cc: Jan Beulich <jbeulich@xxxxxxxx>
> > Cc: Julien Grall <julien@xxxxxxx>
> > Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >
> > Roger suggested cleaning the entry from the domain pirq_tree so that
> > we need not make it safe to re-call __pirq_guest_unbind(). This seems like
> > a reasonable suggestion but the semantics of the code are almost
> > impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also
> > the name of struct so you generally have little idea what it actally means)
> > so I prefer to stick with a small fix that I can actually reason about.
> >
> > v4:
> >  - Re-work the guest array search to make it clearer
> 
> I.e. there are cosmetic differences to v3 (see below), but
> technically it's still the same. I can't believe the re-use
> of "pirq" for different entities is this big of a problem.

Please suggest code if you think it ought to be done differentely. I tried.

> But anyway:
> 
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind(
> >
> >      BUG_ON(!(desc->status & IRQ_GUEST));
> >
> > -    for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
> > -        continue;
> > -    BUG_ON(i == action->nr_guests);
> > +    for ( i = 0; i < action->nr_guests; i++ )
> > +        if ( action->guest[i] == d )
> > +            break;
> > +
> > +    if ( i == action->nr_guests ) /* No matching entry */
> > +    {
> > +        /*
> > +         * In case the pirq was shared, unbound for this domain in an 
> > earlier
> > +         * call, but still existed on the domain's pirq_tree, we still 
> > reach
> > +         * here if there are any later unbind calls on the same pirq. 
> > Return
> > +         * if such an unbind happens.
> > +         */
> > +        ASSERT(action->shareable);
> > +        return NULL;
> > +    }
> > +
> > +    ASSERT(action->nr_guests > 0);
> 
> This seems pointless to have here - v3 had it inside the if(),
> where it would actually guard against coming here with nr_guests
> equal to zero.

Why. The code just after this decrements nr_guests so it seems like entirely 
the right point to have the ASSERT. I can make it ASSERT >= 0, if that makes 
more sense.

> v3 also used if() and BUG() instead of ASSERT()
> inside this if(), which to me would seem more in line with our
> current ./CODING_STYLE guidelines of handling unexpected
> conditions. Could you clarify why you switched things?
> 

Because I don't think there is need to kill the host in a non-debug context if 
we hit this condition; it is entirely survivable as far as I can tell so a 
BUG_ON() did not seem appropriate.

  Paul



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.