Xen project Mailing List

Re: xen/evtchn: Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b on 5.10

On 14.12.20 22:25, Julien Grall wrote:

Hi Juergen,

When testing Linux 5.10 dom0, I could reliably hit the following warning when using the 2L event channel ABI:

[  589.591737] Interrupt for port 34, but apparently not enabled; per-user 00000000a86a4c1b
[  589.593259] WARNING: CPU: 0 PID: 1111 at /home/ANT.AMAZON.COM/jgrall/works/oss/linux/drivers/xen/evtchn.c:170 evtchn_interrupt+0xeb/0x100
[  589.595514] Modules linked in:
[  589.596145] CPU: 0 PID: 1111 Comm: qemu-system-i38 Tainted: G        W         5.10.0+ #180
[  589.597708] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[  589.599782] RIP: e030:evtchn_interrupt+0xeb/0x100
[  589.600698] Code: 48 8d bb d8 01 00 00 ba 01 00 00 00 be 1d 00 00 00 e8 d9 10 ca ff eb b2 8b 75 20 48 89 da 48 c7 c7 a8 31 3d 82 e8 65 29 a0 ff <0f> 0b e9 42 ff ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f
[  589.604087] RSP: e02b:ffffc90040003e70 EFLAGS: 00010086
[  589.605102] RAX: 0000000000000000 RBX: ffff888102091800 RCX: 0000000000000027
[  589.606445] RDX: 0000000000000000 RSI: ffff88817fe19150 RDI: ffff88817fe19158
[  589.607790] RBP: ffff88810f5ab980 R08: 0000000000000001 R09: 0000000000328980
[  589.609134] R10: 0000000000000000 R11: ffffc90040003c70 R12: ffff888107fd3c00
[  589.610484] R13: ffffc90040003ed4 R14: 0000000000000000 R15: ffff88810f5ffd80
[  589.611828] FS:  00007f960c4b8ac0(0000) GS:ffff88817fe00000(0000) knlGS:0000000000000000
[  589.613348] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  589.614525] CR2: 00007f17ee72e000 CR3: 000000010f5b6000 CR4: 0000000000050660
[  589.615874] Call Trace:
[  589.616402]  <IRQ>
[  589.616855]  __handle_irq_event_percpu+0x4e/0x2c0
[  589.617784]  handle_irq_event_percpu+0x30/0x80
[  589.618660]  handle_irq_event+0x3a/0x60
[  589.619428]  handle_edge_irq+0x9b/0x1f0
[  589.620209]  generic_handle_irq+0x4f/0x60
[  589.621008]  evtchn_2l_handle_events+0x160/0x280
[  589.621913]  __xen_evtchn_do_upcall+0x66/0xb0
[  589.622767]  __xen_pv_evtchn_do_upcall+0x11/0x20
[  589.623665]  asm_call_irq_on_stack+0x12/0x20
[  589.624511]  </IRQ>
[  589.624978]  xen_pv_evtchn_do_upcall+0x77/0xf0
[  589.625848]  exc_xen_hypervisor_callback+0x8/0x10

This can be reproduced when creating/destroying guests in a loop, although I have struggled to reproduce it on a vanilla Xen.


After several hours of debugging, I think I have found the root cause.

While we only expect the unmask to happen when the event channel is EOIed, there is an unmask happening as part of handle_edge_irq() because the interrupt was seen as pending by another vCPU (IRQS_PENDING is set).

It turns out that the event channel is set for multiple vCPUs in cpu_evtchn_mask. This happens because the affinity is not cleared when freeing an event channel.
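To make the stale-affinity problem concrete, here is a minimal user-space model in C. It assumes a simplified per-vCPU bitmap standing in for cpu_evtchn_mask, and that binding clears the port's bit on the vCPU recorded for it before setting the bit on the new vCPU. All names (cpu_mask, port_cpu, bind_port_to_cpu(), free_port(), cpus_with_port()) are hypothetical; this is a sketch, not the kernel code. Freeing resets the recorded vCPU to 0 but, matching the behaviour described above, leaves the affinity bit alone:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of a per-vCPU event-channel affinity
 * bitmap (loosely inspired by cpu_evtchn_mask); not kernel code. */
#define MODEL_NR_CPUS  2
#define MODEL_NR_PORTS 64

static uint64_t cpu_mask[MODEL_NR_CPUS];      /* bit p set: port p routed here */
static unsigned int port_cpu[MODEL_NR_PORTS]; /* vCPU recorded for each port */

static void bind_port_to_cpu(unsigned int port, unsigned int cpu)
{
    assert(port < MODEL_NR_PORTS && cpu < MODEL_NR_CPUS);
    /* Move the port: clear it on the recorded vCPU, set it on the new one. */
    cpu_mask[port_cpu[port]] &= ~(UINT64_C(1) << port);
    cpu_mask[cpu] |= UINT64_C(1) << port;
    port_cpu[port] = cpu;
}

/* Buggy teardown: the bookkeeping is reset, but the affinity bit stays set. */
static void free_port(unsigned int port)
{
    assert(port < MODEL_NR_PORTS);
    port_cpu[port] = 0;
}

/* Count how many vCPUs currently have the port in their mask. */
static int cpus_with_port(unsigned int port)
{
    int n = 0;

    assert(port < MODEL_NR_PORTS);
    for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (cpu_mask[cpu] & (UINT64_C(1) << port))
            n++;
    return n;
}
```

For example, binding port 34 to vCPU1, freeing it, and re-binding it to vCPU0 leaves cpus_with_port(34) at 2: the re-bind only clears the bit on the recorded vCPU (now 0), so the stale bit on vCPU1 survives.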

The implementation of evtchn_2l_handle_events() will look for all the active interrupts for the current vCPU and later on clear the pending bit (via the ack() callback). IOW, I believe this is not an atomic operation.

Even if Xen notifies only a single vCPU of the event, evtchn_pending_sel may still be set on the other vCPU (due to a different event channel). Therefore, there is a chance that two vCPUs will try to handle the same interrupt.
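A minimal sketch of why the scan-then-ack split matters. It assumes a single 64-bit pending/masked word and a per-vCPU boolean standing in for evtchn_pending_sel; the names are hypothetical and the real 2L code walks a two-level bitmap. If a stale affinity bit puts a port in both vCPUs' masks and both selectors are set (the second one by an unrelated port), both scans report the port before either vCPU clears its pending bit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state, heavily simplified from the 2-level ABI. */
#define MODEL_NR_CPUS 2

static uint64_t pending;                 /* one bit per port */
static uint64_t masked;                  /* one bit per port */
static uint64_t cpu_mask[MODEL_NR_CPUS]; /* per-vCPU affinity */
static bool pending_sel[MODEL_NR_CPUS];  /* stand-in for evtchn_pending_sel */

/* Phase 1: a vCPU collects the ports it considers active for itself. */
static uint64_t scan_active(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    if (!pending_sel[cpu])
        return 0;
    return pending & ~masked & cpu_mask[cpu];
}

/* Phase 2, run later: the pending bit is only cleared when the event is acked. */
static void ack_port(unsigned int port)
{
    assert(port < 64);
    pending &= ~(UINT64_C(1) << port);
}
```

Once ack_port() runs, later scans no longer see the port, but by then both vCPUs may already have collected it for dispatch.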

The IRQ handler handle_edge_irq() is able to deal with that and will mask/unmask the interrupt. This messes up the lateeoi logic (although I managed to reproduce it once without XSA-332).
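The interleaving can be modelled single-threaded. This is a hypothetical simplification: it assumes the edge flow marks the IRQ pending and masks it when another vCPU is already handling it, and that the handler loop unmasks a masked, pending IRQ before running the handler again. The flags only approximate IRQS_PENDING and the per-port "enabled" check behind the warning; locking, ack and the lateeoi path are left out, and all names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of the interleaving; not kernel code. */
struct model_irq {
    bool in_progress; /* a vCPU is running the handler */
    bool irq_pending; /* stands in for IRQS_PENDING */
    bool masked;      /* event channel masked */
    bool enabled;     /* cleared by the handler until user space EOIs */
    int warnings;     /* times the handler ran while not enabled */
};

/* Stand-in for the driver handler: warn if not enabled, then disable. */
static void model_handler(struct model_irq *irq)
{
    if (!irq->enabled)
        irq->warnings++;
    irq->enabled = false;
}

/* First vCPU starts handling the event. */
static void model_edge_start(struct model_irq *irq)
{
    irq->in_progress = true;
    irq->irq_pending = false;
    model_handler(irq);
}

/* Second vCPU sees the same event while the first is still handling it. */
static void model_edge_busy(struct model_irq *irq)
{
    assert(irq->in_progress);
    irq->irq_pending = true;
    irq->masked = true;
}

/* First vCPU finishes: it re-runs the handler while the IRQ is pending. */
static void model_edge_finish(struct model_irq *irq)
{
    while (irq->irq_pending) {
        if (irq->masked)
            irq->masked = false; /* unmask without any EOI from user space */
        irq->irq_pending = false;
        model_handler(irq);
    }
    irq->in_progress = false;
}
```

Calling model_edge_start(), model_edge_busy() and model_edge_finish() in that order on a freshly enabled port leaves it unmasked with no EOI from user space and counts one warning, mirroring the symptom in the log above.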

My initial idea to fix the problem was to switch the affinity from CPU X to CPU0 when the event channel is freed.
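A sketch of that first idea, using the same kind of hypothetical bitmap model (cpu_mask, port_cpu and the helper functions are illustrative names, not kernel symbols). On free, the port's affinity is moved back to CPU0, so a later bind that starts from CPU0 only clears and sets bits on the vCPU that actually holds the port. As noted next, this alone may not close a race with concurrent rebinding:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a per-vCPU affinity bitmap plus the vCPU recorded
 * for each port. Illustrative only, not the kernel implementation. */
#define MODEL_NR_CPUS  2
#define MODEL_NR_PORTS 64

static uint64_t cpu_mask[MODEL_NR_CPUS];
static unsigned int port_cpu[MODEL_NR_PORTS];

static void bind_port_to_cpu(unsigned int port, unsigned int cpu)
{
    assert(port < MODEL_NR_PORTS && cpu < MODEL_NR_CPUS);
    cpu_mask[port_cpu[port]] &= ~(UINT64_C(1) << port);
    cpu_mask[cpu] |= UINT64_C(1) << port;
    port_cpu[port] = cpu;
}

/* Proposed teardown: switch the affinity back to CPU0 before the port is
 * released, so no other vCPU keeps a stale bit for it. */
static void free_port_reset_affinity(unsigned int port)
{
    bind_port_to_cpu(port, 0);
}

/* Count how many vCPUs currently have the port in their mask. */
static int cpus_with_port(unsigned int port)
{
    int n = 0;

    assert(port < MODEL_NR_PORTS);
    for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (cpu_mask[cpu] & (UINT64_C(1) << port))
            n++;
    return n;
}
```

With this teardown, the bind/free/re-bind sequence that leaves two bits set in the buggy model keeps the port in exactly one vCPU's mask throughout.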

However, I am not sure this is enough, because I haven't found anything yet preventing a race between evtchn_2l_handle_events() and evtchn_2l_bind_vcpu().

So maybe we want to introduce refcounting (if nothing is provided by the IRQ framework) and only unmask when the counter drops to 0.
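One possible shape for the refcounting idea, as a hypothetical sketch: each path that needs the port masked (for example a pending lateeoi, or the edge handler masking it) takes a reference, and only dropping the last reference clears the mask. It assumes callers serialize access, e.g. under a per-IRQ lock, so it uses plain fields rather than atomics; struct mask_ref and its helpers are made-up names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch; callers are assumed to serialize access (for
 * example under a per-IRQ lock), so plain fields are used. */
struct mask_ref {
    unsigned int count; /* outstanding reasons to keep the port masked */
    bool masked;        /* would be mirrored into the shared mask bit */
};

/* Take a reference and make sure the port is masked. */
static void mask_ref_get(struct mask_ref *m)
{
    m->count++;
    m->masked = true;
}

/* Drop one reference; only the last drop really unmasks. Returns true if
 * this call unmasked the port. */
static bool mask_ref_put(struct mask_ref *m)
{
    assert(m->count > 0);
    if (--m->count > 0)
        return false;
    m->masked = false;
    return true;
}
```

With one reference held for a pending EOI and one taken by the edge handler, the handler's unmask only drops its reference; the port stays masked until the EOI drops the last one.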


Any opinions?

With the two attached patches, testing on my side survived more than 2 hours of constant guest reboots and destroy/create loops. Without the patches, the WARN()s came up after less than one minute.

Can you please give it a try?


Juergen

