[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids

To: Julien Grall <julien@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, linux-block@xxxxxxxxxxxxxxx, netdev@xxxxxxxxxxxxxxx, linux-scsi@xxxxxxxxxxxxxxx
From: Jürgen Groß <jgross@xxxxxxxx>
Date: Mon, 8 Feb 2021 14:58:11 +0100
Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, stable@xxxxxxxxxxxxxxx, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Jens Axboe <axboe@xxxxxxxxx>, Wei Liu <wei.liu@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, "David S. Miller" <davem@xxxxxxxxxxxxx>, Jakub Kicinski <kuba@xxxxxxxxxx>
Delivery-date: Mon, 08 Feb 2021 13:58:23 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 08.02.21 14:09, Julien Grall wrote:

Hi Juergen,

On 08/02/2021 12:31, Jürgen Groß wrote:

On 08.02.21 13:16, Julien Grall wrote:

On 08/02/2021 12:14, Jürgen Groß wrote:
On 08.02.21 11:40, Julien Grall wrote:
Hi Juergen,

On 08/02/2021 10:22, Jürgen Groß wrote:
On 08.02.21 10:54, Julien Grall wrote:
... I don't really see how the difference matter here. The ideais to re-use what's already existing rather than trying tore-invent the wheel with an extra lock (or whatever we can come up).
The difference is that the race is occurring _before_ any IRQ is
involved. So I don't see how modification of IRQ handling would help.
Roughly our current IRQ handling flow (handle_eoi_irq()) looks like:

if ( irq in progress )
{
   set IRQS_PENDING
   return;
}

do
{
   clear IRQS_PENDING
   handle_irq()
} while (IRQS_PENDING is set)

IRQ handling flow like handle_fasteoi_irq() looks like:

if ( irq in progress )
   return;

handle_irq()
The latter flow would catch "spurious" interrupt and ignore them.So it would handle nicely the race when changing the event affinity.
Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
issued, thus having the same problem again?
Sorry I can't parse this.


handle_fasteoi_irq() will do nothing "if ( irq in progress )". When is
this condition being reset again in order to be able to process another
IRQ?

It is reset after the handler has been called. See handle_irq_event().


Right. And for us this is too early, as we want the next IRQ being
handled only after we have called xen_irq_lateeoi().

I believe this will be the case before our "lateeoi" handling is
becoming active (more precise: when our IRQ handler is returning to
handle_fasteoi_irq()), resulting in the possibility of the same race we
are experiencing now.

I am a bit confused what you mean by "lateeoi" handling is becomingactive. Can you clarify?


See above: the next call of the handler should be allowed only after
xen_irq_lateeoi() for the IRQ has been called.

If the handler is being called earlier we have the race resulting
in the WARN() splats.

Note that are are other IRQ flows existing. We should have a look atthem before trying to fix thing ourself.


Fine with me, but it either needs to fit all use cases (interdomain,
IPI, real interrupts) or we need to have a per-type IRQ flow.

I think we should fix the issue locally first, then we can start to do
a thorough rework planning. Its not as if the needed changes with the
current flow would be so huge, and I'd really like to have a solution
rather sooner than later. Changing the IRQ flow might have other side
effects which need to be excluded by thorough testing.

Although, the other issue I can see so far is handle_irq_for_port() willupdate info->{eoi_cpu, irq_epoch, eoi_time} without any locking. But itis not clear this is what you mean by "becoming active".


As long as a single event can't be handled on multiple cpus at the same
time, there is no locking needed.


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Follow-Ups:
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Julien Grall

References:
- [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Juergen Gross
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Julien Grall
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Jürgen Groß
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Julien Grall
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Jürgen Groß
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Julien Grall
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Jürgen Groß
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Julien Grall
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Jürgen Groß
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Julien Grall
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Jürgen Groß
- Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
  - From: Julien Grall

Prev by Date: [ovmf test] 159136: tolerable trouble: pass/starved - PUSHED
Next by Date: Re: [PATCH v2 3/3] x86/time: don't move TSC backwards in time_calibration_tsc_rendezvous()
Previous by thread: Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
Next by thread: Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.