
Re: [Xen-devel] [PATCH RFC] pass-through: sync pir to irr after msix vector been updated



Hi Jan,

Thanks for your reply; please see my responses inline.

On 9/13/19 12:14 AM, Jan Beulich wrote:
> On 12.09.2019 20:03, Joe Jin wrote:
>> With the testcase below, the guest kernel reported "No irq handler for vector":
>>   1). Pass through an mlx IB VF to 2 pvhvm guests.
>>   2). Start rds-stress between the 2 guests.
>>   3). Scale down both guests' vcpus from 32 to 6 at the same time.
>>
>> After repeating the above test for several iterations, the guest kernel
>> reported "No irq handler for vector", and IB traffic dropped to zero, caused
>> by a lost interrupt.
>>
>> When a vcpu goes offline, the kernel disables local IRQs, migrates the IRQ
>> to another cpu, updates the MSI-X table, and re-enables IRQs. If a new
>> interrupt arrives after local IRQs have been disabled but before the MSI-X
>> table has been updated, the interrupt still uses the old vector and
>> destination cpu info, so when local IRQs are enabled again the interrupt is
>> sent to the wrong cpu and vector.
>>
>> Syncing PIR to IRR after the MSI-X table has been updated appears to help
>> with this issue.
> 
> I'm having trouble making the connection, which quite possibly simply
> means the description needs to be further extended: Sync-ing PIR to
> IRR has nothing to do with a vector change. It would help if nothing
> else caused this bitmap propagation, and an interrupt was lost (rather
> than delivered through the wrong vector, or to the wrong CPU).
> Furthermore with vector and destination being coupled, after a CPU has
> been offlined this would generally mean
> - if there was just a single destination permitted, lack of delivery
>   altogether,
> - if there were multiple destinations permitted, delivery to one of
>   the other CPUs, at which point the vector would still be valid.

When a cpu goes offline, the guest kernel only migrates IRQs whose affinity
is set to that cpu alone; if there are multiple destinations, the kernel does
not do the migration, which would include updating the MSI-X table with the
new destination and vector.

After IRQ migration, the kernel checks the IRR for every vector; if the APIC
IRR bit is set, it retriggers the IRQ to the new destination. This is
intended to avoid losing any interrupt.

But on Xen, after the MSI-X table has been updated, the hypervisor never
checks for a pending IRQ and never notifies the guest kernel about it.
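For reference, this is the guest-side check I mean; the tail of fixup_irqs()
does roughly the following (a simplified sketch paraphrased from the 4.14-era
arch/x86/kernel/irq.c, not verbatim kernel code):

/* Simplified sketch of the tail of fixup_irqs(); paraphrased, not verbatim. */
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
    struct irq_desc *desc;
    struct irq_data *data;
    struct irq_chip *chip;
    unsigned int irr;

    if (IS_ERR_OR_NULL(__this_cpu_read(vector_irq[vector])))
        continue;

    /* Is this vector still pending in the (virtual) local APIC's IRR? */
    irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
    if (!(irr & (1U << (vector % 32))))
        continue;

    /* Pending: resend the interrupt to its new destination/vector. */
    desc = __this_cpu_read(vector_irq[vector]);
    data = irq_desc_get_irq_data(desc);
    chip = irq_data_get_irq_chip(data);
    if (chip->irq_retrigger)
        chip->irq_retrigger(data);
}

With apicv, the apic_read(APIC_IRR) above reads the virtual APIC page; the
interrupt that arrived while local IRQs were disabled is still sitting in the
PIR at this point, so the check can see nothing pending and no retrigger
happens, hence the need for Xen to sync PIR to IRR after updating the MSI-X
table.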

> 
> An interesting aspect would be on which CPU the log message was
> observed, and how this correlates with the destination sets of the
> CPUs that have got offlined. From there it would then further be
> interesting to understand why the interrupt made it to that CPU,
> since - as said - destination and vector get changed together, and
> hence with things going wrong it would be of interest to know whether
> the CPU receiving the IRQ is within the new destination set, or some
> (random?) other one.

irq_retrigger() is called after the kernel has updated the vector;
irq_retrigger() resends any pending IRQ(s) to the new destination.
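On x86 the APIC-based irq chips implement .irq_retrigger by sending an IPI
with the new vector to the new destination cpu; roughly (paraphrased from the
4.14-era arch/x86/kernel/apic/vector.c, field names may differ per version):

/* Paraphrased sketch of apic_retrigger_irq(); not verbatim kernel code. */
static int apic_retrigger_irq(struct irq_data *irqd)
{
    struct apic_chip_data *apicd = apic_chip_data(irqd);
    unsigned long flags;

    raw_spin_lock_irqsave(&vector_lock, flags);
    apic->send_IPI(apicd->cpu, apicd->vector);
    raw_spin_unlock_irqrestore(&vector_lock, flags);

    return 1;
}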

Here is the guest kernel log from when the issue happened. The guest kernel
is 4.1; on a 4.14 guest it is almost the same, except there is no "(irq -1)"
due to kernel changes, and the IRQ migration workflow is the same
(fixup_irqs()):

Sep 12 20:26:46 localhost kernel: smpboot: CPU 17 is now offline
Sep 12 20:26:46 localhost kernel: smpboot: CPU 18 is now offline
Sep 12 20:26:46 localhost kernel: smpboot: CPU 19 is now offline
Sep 12 20:26:47 localhost kernel: Broke affinity for irq 251
Sep 12 20:26:47 localhost kernel: do_IRQ: 20.178 No irq handler for vector (irq -1)
Sep 12 20:26:47 localhost kernel: smpboot: CPU 20 is now offline
Sep 12 20:26:47 localhost kernel: smpboot: CPU 21 is now offline

From the above you can see the IRQ was sent to cpu 20, which was in the
middle of being offlined.

The IRQ arrived on that cpu as soon as local IRQs were re-enabled; only after
the cpu had been fully offlined was "smpboot: CPU 20 is now offline" printed.

The call path in the guest kernel is as below:
cpu_down()
  |-> cpu_down_maps_locked()
  |     _cpu_down
  |       |-> __stop_machine
  |             |-> stop_cpus()
  |                   |->__stop_cpus()
  |                        |- queue_stop_cpus_work() ---+
  |-> __cpu_die()                                       |
         |-> pr_info("CPU %u is now offline\n", cpu);   |
     +--------------------------------------------------+
     |
     +
multi_cpu_stop()
  |-> local_save_flags
  |-> take_cpu_down()
  |      |-> __cpu_disable
  |            |-> smp_ops.cpu_disable = xen_cpu_disable
  |                  |-> cpu_disable_common
  |                        |-> fixup_irqs <== IRQ migration.
  |-> local_irq_restore()

> 
>> BTW, I could not reproduce this issue if I disabled apicv.
> 
> Which, I agree, is a fair hint of something APIC-V-specific to be
> amiss, but which (due to the vastly different timing) isn't a
> reliable indicator.

It seems that with apicv enabled vs. disabled, interrupt delivery takes a
different code path.
> 
>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -412,6 +412,9 @@ int pt_irq_create_bind(
>>                  pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec;
>>                  pirq_dpci->gmsi.gflags = gflags;
>>              }
>> +
>> +            if ( hvm_funcs.sync_pir_to_irr )
>> +                hvm_funcs.sync_pir_to_irr(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);
> 
> If the need for this change can be properly explained, then it
> still wants converting to alternative_vcall() - like the other
> caller of this hook. Or perhaps even better move vlapic.c's
> wrapper (suitably renamed) into hvm.h, and use it here.

Yes, I agree. I'm not 100% sure about the change, which is why I marked it as
RFC.
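For example, the wrapper moved into hvm.h might look something like this
(just a sketch; the name hvm_sync_pir_to_irr is my guess at the "suitable
rename", not existing code):

/* Sketch only: vlapic.c's wrapper moved into hvm.h and renamed as suggested. */
static inline void hvm_sync_pir_to_irr(struct vcpu *v)
{
    if ( hvm_funcs.sync_pir_to_irr )
        alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
}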

> 
> Additionally, the code setting pirq_dpci->gmsi.dest_vcpu_id
> (right after your code insertion) allows for the field to be
> invalid, which I think you need to guard against.

I think you mean the multiple-destination case, where dest_vcpu_id is set to
-1?
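If so, the call would also need to go after dest_vcpu_id is assigned and be
skipped when it is invalid, roughly like this (untested sketch, using the
wrapper above):

            /* Sketch: only sync when the MSI targets a single vCPU;
             * dest_vcpu_id is -1 for a multi-destination mask. */
            if ( pirq_dpci->gmsi.dest_vcpu_id >= 0 )
                hvm_sync_pir_to_irr(d->vcpu[pirq_dpci->gmsi.dest_vcpu_id]);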

> 
> Also, just to remind you: Please follow patch submission rules:
> They get sent _to_ the list, with maintainers etc _cc_-ed.

Got it, thanks,
Joe

> 
> Jan
> 


 

