[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] ITS emulation race conditions



Hi,

On 10/04/17 00:12, André Przywara wrote:
> Hi,
> 
> I wanted to run some ideas on how to prevent the race conditions we are
> facing with the ITS emulation and removing devices and/or LPIs.
> I think Stefano's idea of tagging a discarded LPI is the key, but still
> some details are left to be solved.
> I think we are dealing with two issues:
> 1) A guest's DISCARD can race with in incoming LPI.
> Ideally DISCARD would remove the host LPI -> vLPI connection (in the
> host_lpis table), so any new incoming (host) LPI would simply be
> discarded very early in gicv3_do_LPI() without ever resolving into a
> virtual LPI. Now while this removal is atomic, we could have just missed
> an incoming LPI, so the old virtual LPI would still traverse down the
> VGIC with a "doomed" virtual LPI ID.
> I wonder if that could be solved by a "crosswise" check:
> - The DISCARD handler *first* tags the pending_irq as "UNMAPPED", *then*
> removes the host_lpis connectin.
> - do_LPI() reads the host_lpis() array, *then* checks the UNMAPPED tag
> in the corresponding pending_irq (or lets vgic_vcpu_inject_irq() do that).
> With this setup the DISCARD handler can assume that no new virtual LPI
> instances enter the VGIC afterwards.
> 
> 2) An unmapped LPI might still be in use by the VGIC, foremost might
> still be in an LR.
> Tagging the pending_irq should solve this in general. Whenever a VGIC
> function finds the UNMAPPED tag, it does not process the vIRQ, but
> retires it. For simplicity we might limit this to handling when a VCPU
> exists and an LR gets cleaned up: If we hit a tagged pending_irq here,
> we clean the LR, remove the pending_irq from all lists and signal the
> ITS emulation that this pending_irq is now ready to be removed (by
> calling some kind of cleanup_lpi() function, for instance).
> The ITS code can then remove the struct pending_irq from the radix tree.
> MAPD(V=0) is now using this to tag all still mapped events as
> "UNMAPPED", the counting down the "still alive" pending_irqs in
> cleanup_lpi() until they reach zero. At this point it should be safe to
> free the pend_irq array.
> 
> Does that sound like a plan?
> Do I miss something here? Probably yes, hence I am asking ;-)

So there are two issues with this:
- For doing the LPI cleanup, we would need to call a virtual ITS or
virtual LPI specific function directly from gic.c. This looks like bad
coding style, as it breaks the abstraction (calling a virtual LPI/ITS
specific function from within gic.c)

- If we have a "DISCARD; MAPTI" sequence, targeting the same vLPI, while
this vLPI is still in an LR (but not cleaned up until both commands have
been handled), we end up with using the wrong pending_irq struct (the
new one). (I assume the UNMAPPED tag would be cleared upon the new MAPTI).

Can this create problems?
I see two possibilities:
a) the old vLPI is still pending in the LR: as the new LR would be
pristine, the code in update_one_lr() would just clean the QUEUED bit,
but not do anything further. The LR would be kept as used by this vLPI,
with the state still set to GICH_LR_PENDING. Upon re-entering the VCPU
this would make the new vLPI pending.
b) the old vLPI has been EOIed: the new LR would be pristine, the code
in update_one_lr() would try to clean it up anyway, which should not
hurt, AFAICT.

So if there is no new LPI triggered, a) would be wrong, but b) right. Is
that correct so far?

Now if there is a new LPI, a) would be correct, though the priority
might be different (though I am not sure this is an issue). b) should
work anyway, since the cleanup code checks for a new pending condition,
so would (re-)inject the (new) vLPI.

Julien does not seem to be a fan of this tagging idea, as this might
create more subtle failures that we don't see at the moment (which I can
understand).

So I would like to know how to proceed here:
I) stick to the tag, and fix that particular case above by checking for
the "inflight && ENABLED" condition and clearing the LR if the
pending_irq does not state it's pending anymore
II) introduce a NO_LONGER_PENDING tag in addition to the UNMAPPED tag,
where a new MAPTI just clears UNMAPPED_LPI, but keeps NO_LONGER_PENDING.
NO_LONGER_PENDING would be checked and cleared by update_one_lr(),
UNMAPPED by vgic_vcpu_inject_irq().
That would leave the issue how to call the pend_irq_cleanup() function
from update_one_lr() without breaking abstraction
III) Revert to the "check for NULL" solution.

MMh, this is nasty stuff, any suggestions?

Cheers,
Andre.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.