
Re: [Xen-devel] [PATCH RFC] pass-through: sync pir to irr after msix vector been updated



On 9/16/19 1:01 AM, Jan Beulich wrote:
> On 13.09.2019 18:38, Joe Jin wrote:
>> Hi Jan,
>>
>> Thanks for your reply, see my reply in line please.
>>
>> On 9/13/19 12:14 AM, Jan Beulich wrote:
>>> On 12.09.2019 20:03, Joe Jin wrote:
>>>> With below testcase, guest kernel reported "No irq handler for vector":
>>>>   1). Passthrough mlx ib VF to 2 pvhvm guests.
>>>>   2). Start rds-stress between 2 guests.
>>>>   3). Scale down 2 guests vcpu from 32 to 6 at the same time.
>>>>
>>>> Repeating the above test for several iterations, the guest kernel reported "No irq
>>>> handler for vector", and IB traffic dropped to zero, caused by a lost interrupt.
>>>>
>>>> When a vCPU goes offline, the guest kernel disables local IRQs, migrates the
>>>> IRQ to another CPU, updates the MSI-X table, and re-enables IRQs. Any interrupt
>>>> that arrives after local IRQs are disabled but before the MSI-X table has been
>>>> updated still uses the old vector and destination CPU, so when local IRQs are
>>>> enabled again the interrupt is sent to the wrong CPU and vector.
>>>>
>>>> Syncing PIR to IRR after the MSI-X table has been updated appears to help with
>>>> this issue.
>>>
>>> I'm having trouble making the connection, which quite possibly simply
>>> means the description needs to be further extended: Sync-ing PIR to
>>> IRR has nothing to do with a vector change. It would help if nothing
>>> else caused this bitmap propagation, and an interrupt was lost (rather
>>> than delivered through the wrong vector, or to the wrong CPU).
>>> Furthermore with vector and destination being coupled, after a CPU has
>>> been offlined this would generally mean
>>> - if there was just a single destination permitted, lack of delivery
>>>   altogether,
>>> - if there were multiple destinations permitted, delivery to one of
>>>   the other CPUs, at which point the vector would still be valid.
>>
>> When a CPU goes offline in the guest kernel, it only migrates IRQs whose affinity
>> is set to that CPU alone; with multiple destinations, the kernel does not migrate
>> (which would include updating the MSI-X table with a new destination and vector).
>>
>> After IRQ migration, the kernel checks each vector's IRR; if the APIC IRR bit is
>> set, it retriggers the IRQ to the new destination. This is intended to avoid
>> losing any interrupt.
>>
>> But on Xen, after the MSI-X table has been updated, nothing checks for a pending
>> IRQ or notifies the guest kernel about it.
>>
>>>
>>> An interesting aspect would be on which CPU the log message was
>>> observed, and how this correlates with the destination sets of the
>>> CPUs that have got offlined. From there it would then further be
>>> interesting to understand why the interrupt made it to that CPU,
>>> since - as said - destination and vector get changed together, and
>>> hence with things going wrong it would be of interest to know whether
>>> the CPU receiving the IRQ is within the new destination set, or some
>>> (random?) other one.
>>
>> irq_retrigger() is called after the kernel updates the vector, and it sends any
>> pending IRQ(s) to the new destination.
>>
>> Here is the kernel log from when the issue happened. The guest kernel is 4.1; on
>> a 4.14 guest it is almost the same, except there is no "(irq -1)" due to kernel
>> changes, and the IRQ migration workflow is the same (fixup_irqs()):
>>
>> Sep 12 20:26:46 localhost kernel: smpboot: CPU 17 is now offline
>> Sep 12 20:26:46 localhost kernel: smpboot: CPU 18 is now offline
>> Sep 12 20:26:46 localhost kernel: smpboot: CPU 19 is now offline
>> Sep 12 20:26:47 localhost kernel: Broke affinity for irq 251
>> Sep 12 20:26:47 localhost kernel: do_IRQ: 20.178 No irq handler for vector (irq -1)
>> Sep 12 20:26:47 localhost kernel: smpboot: CPU 20 is now offline
>> Sep 12 20:26:47 localhost kernel: smpboot: CPU 21 is now offline
>>
>> From the above you can see the IRQ was sent to CPU 20, which was going offline.
>>
>> The IRQ arrived at that CPU as soon as IRQs were re-enabled; only after the CPU
>> had gone offline did the kernel print "smpboot: CPU 20 is now offline".
>>
>> The call path in the kernel is as follows:
>> cpu_down()
>>   |-> cpu_down_maps_locked()
>>   |     _cpu_down
>>   |       |-> __stop_machine
>>   |             |-> stop_cpus()
>>   |                   |->__stop_cpus()
>>   |                        |- queue_stop_cpus_work() ---+
>>   |-> __cpu_die()                                       |
>>          |-> pr_info("CPU %u is now offline\n", cpu);   |
>>      +--------------------------------------------------+
>>      |
>>      +
>> multi_cpu_stop()
>>   |-> local_save_flags
>>   |-> take_cpu_down()
>>   |      |-> __cpu_disable
>>   |            |-> smp_ops.cpu_disable = xen_cpu_disable
>>   |                  |-> cpu_disable_common
>>   |                        |-> fixup_irqs <== IRQ migration.
>>   |-> local_irq_restore()
> 
> Ah yes, this makes sense. You want to extend the description to
> reflect some of the further explanation above.

I will add more explanation later.
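For the description, the guest-side sequence I am referring to is roughly the
following (paraphrased from the fixup_irqs() path above, not the exact kernel
code):

    /* on the vCPU going offline, with local interrupts disabled */
    fixup_irqs();
        /* 1) pick a new destination CPU and vector              */
        /* 2) write the new MSI-X table entry                    */
        /* 3) check the local APIC IRR for the old vector and,   */
        /*    if a pending bit is set, retrigger the IRQ to the  */
        /*    new destination so it is not lost                  */
    /* local interrupts enabled again */

The window is between interrupts being disabled and the MSI-X update: an
interrupt posted in that window still uses the old vector and destination, and
because Xen does not propagate PIR to IRR at that point, the IRR check in step
3 finds nothing to retrigger and the interrupt is lost.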

> 
>>>> --- a/xen/drivers/passthrough/io.c
>>>> +++ b/xen/drivers/passthrough/io.c
>>>> @@ -412,6 +412,9 @@ int pt_irq_create_bind(
>>>>                  pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec;
>>>>                  pirq_dpci->gmsi.gflags = gflags;
>>>>              }
>>>> +
>>>> +            if ( hvm_funcs.sync_pir_to_irr )
>>>> +                hvm_funcs.sync_pir_to_irr(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
>>>
>>> If the need for this change can be properly explained, then it
>>> still wants converting to alternative_vcall() - like the other
>>> caller of this hook. Or perhaps even better, move vlapic.c's
>>> wrapper (suitably renamed) into hvm.h, and use it here.
>>
>> Yes, I agree. I'm not 100% sure, which is why I marked it RFC.
> 
> And btw, please also attach a brief comment here, to clarify
> why the syncing is needed precisely at this point.
> 
>>> Additionally, the code setting pirq_dpci->gmsi.dest_vcpu_id
>>> (right after your code insertion) allows for the field to be
>>> invalid, which I think you need to guard against.
>>
>> I think you mean multiple destinations, in which case it's -1?
> 
> The reasons why it might be -1 are irrelevant here, I think.
> You need to handle the case both to avoid an out-of-bounds
> array access and to make sure an IRR bit wouldn't still get
> propagated too late in some special case.

Add the following check?
            if ( dest_vcpu_id >= 0 && dest_vcpu_id < d->max_vcpus &&
                 d->vcpu[dest_vcpu_id]->runstate.state <= RUNSTATE_blocked )
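With the vlapic.c wrapper moved into hvm.h as you suggested, the whole hunk
could then look something like the following (only a sketch, the wrapper name
is tentative):

    /* e.g. in hvm.h, lifted from vlapic.c and renamed */
    static inline void hvm_sync_pir_to_irr(struct vcpu *v)
    {
        if ( hvm_funcs.sync_pir_to_irr )
            alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
    }

    /* in pt_irq_create_bind(), after gvec/gflags have been updated */
    if ( dest_vcpu_id >= 0 && dest_vcpu_id < d->max_vcpus )
        /*
         * Propagate any interrupt posted with the old vector/destination
         * into the guest's IRR so that its retrigger logic can see it.
         */
        hvm_sync_pir_to_irr(d->vcpu[dest_vcpu_id]);

(possibly plus the runstate check above, if that turns out to be needed)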

> 
> Also - what about the respective other path in the function,
> dealing with PT_IRQ_TYPE_PCI and PT_IRQ_TYPE_MSI_TRANSLATE? It
> seems to me that there's the same chance of deferring IRR
> propagation for too long?

This is possible. Could you please advise how to find which vCPU is associated
with the IRQ in those cases? I did not find any helper for this in current Xen.
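If there is no direct mapping to a single vCPU in those cases, would syncing
all of the domain's vCPUs be acceptable as a fallback? Again only a sketch,
using the same tentative wrapper as above:

    struct vcpu *v;

    for_each_vcpu ( d, v )
        hvm_sync_pir_to_irr(v);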

Thanks,
Joe

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

