Xen project Mailing List

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Thu, 10 Sep 2015 02:27:43 -0600

Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Dario Faggioli <dario.faggioli@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Thu, 10 Sep 2015 08:27:59 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 10.09.15 at 04:07, <feng.wu@xxxxxxxxx> wrote: >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx] >> Sent: Wednesday, September 09, 2015 6:27 PM >> >>> On 09.09.15 at 10:56, <feng.wu@xxxxxxxxx> wrote: >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx] >> >> Sent: Monday, September 07, 2015 8:55 PM >> >> >>> On 25.08.15 at 03:57, <feng.wu@xxxxxxxxx> wrote: >> >> > + >> >> > + /* >> >> > + * Delete the vCPU from the related block list >> >> > + * if we are resuming from blocked state. >> >> > + */ >> >> > + if ( v->arch.hvm_vmx.pi_block_cpu != -1 ) >> >> > + { >> >> > + spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, >> >> > + v->arch.hvm_vmx.pi_block_cpu), >> >> flags); >> >> > + list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list); >> >> >> >> Shouldn't you set .pi_block_cpu back to -1 along with removing >> >> the vCPU from the list? Which then ought to allow using just >> >> list_del() here? >> > >> > Here is the story about using list_del_init() instead of list_del(), (the >> > same reason for using list_del_init() in pi_wakeup_interrupt() ). >> > If we use list_del(), that means we need to set .pi_block_cpu to >> > -1 after removing the vCPU from the block list, there are two >> > places where the vCPU is removed from the list: >> > - arch_vcpu_wake_prepare() >> > - pi_wakeup_interrupt() >> > >> > That means we also need to set .pi_block_cpu to -1 in >> > pi_wakeup_interrupt(), which seems problematic to me, since >> > .pi_block_cpu is used to get the per-CPU spin lock, it is not safe >> > to change it in pi_wakeup_interrupt(), maybe someone is using >> > it at the same time. >> > >> > And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here >> > is only used when .pi_block_cpu is set to -1 at the initial time. >> > >> > If you have any good suggestions about this, I will be all ears. >> >> Latching pi_block_cpu into a local variable would take care of that >> part of the problem. Of course you then may also need to check >> that while waiting to acquire the lock pi_block_cpu didn't change. > > Thanks for the suggestion! But I think it is not easy to "check > "that while waiting to acquire the lock pi_block_cpu didn't change". > Let's take the following pseudo code as an example: > > int local_pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu; > > ...... > > spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, > local_pi_block_cpu), flags); > list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list); > spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, > local_pi_block_cpu), flags); > > If .pi_block_cpu is changed to -1 by others during calling list_del() > above, that means the vCPU is removed by others, then calling > list_del() here again would be problematic. I think there might be > other corner cases for this. So I suggest adding some comments > for list_del_init() as you mentioned below. That's why I said "check that while waiting to acquire the lock pi_block_cpu didn't change." The code you present does not do this. >> >> > + do { >> >> > + old.control = new.control = pi_desc->control; >> >> > + >> >> > + /* Should not block the vCPU if an interrupt was posted for >> it. >> >> */ >> >> > + if ( pi_test_on(&old) ) >> >> > + { >> >> > + spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, >> >> gflags); >> >> > + vcpu_unblock(v); >> >> >> >> Calling vcpu_unblock() in the middle of context_switch()? Why? And >> >> is this safe? >> > >> > I cannot see anything unsafe so far, can some scheduler maintainer >> > help to confirm it? Dario? George? >> >> But first and foremost you ought to answer the "why". > > I think if you think this solution is unsafe, you need point out where it > is, not > just ask "is this safe ?", I don't think this is an effective comments. It is my general understanding that people wanting code to be included have to prove their code is okay, not reviewers to prove the code is broken. > Anyway, I go through the main path of the code as below, I really don't find > anything unsafe here. > > context_switch() --> > pi_ctxt_switch_from() --> > vmx_pre_ctx_switch_pi() --> > vcpu_unblock() --> > vcpu_wake() --> > SCHED_OP(VCPU2OP(v), wake, v) --> > csched_vcpu_wake() --> > __runq_insert() > __runq_tickle() > > If you continue to think it is unsafe, pointing out the place will be > welcome! Once again - I didn't say it's unsafe (and hence can't point at a particular place). What bothers me is that vcpu_unblock() affects scheduler state, and altering scheduler state from inside the context switch path looks problematic at the first glance. I'd be happy if I was told there is no such problem, but be aware that this would have to include latent ones: Even if right now things are safe, having such feedback have the potential of becoming an actual problem with a later scheduler change is already an issue, as finding the problem is then likely going to be rather hard (and I suspect it's not going to be you to debug this). >> >> > +static void vmx_post_ctx_switch_pi(struct vcpu *v) >> >> > +{ >> >> > + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc; >> >> > + >> >> > + if ( !has_arch_pdevs(v->domain) ) >> >> > + return; >> >> > + >> >> > + if ( x2apic_enabled ) >> >> > + write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor)); >> >> > + else >> >> > + write_atomic(&pi_desc->ndst, >> >> > + MASK_INSR(cpu_physical_id(v->processor), >> >> > + PI_xAPIC_NDST_MASK)); >> >> > + >> >> > + pi_clear_sn(pi_desc); >> >> > +} >> >> >> >> So you alter where notifications go, but not via which vector? How >> >> is the vCPU going to get removed from the blocked list then? >> > >> > vmx_post_ctx_switch_pi() is called when the vCPU is about to run, >> > in that case, the vector has been set properly and it has been removed >> > from the block list if it was blocked before. >> >> So why do you two updates then (first [elsewhere] to set the vector >> and then here to set the destination)? > > When the vCPU is unblocked and then removed from the blocking list, we > need to change the vector to the normal notification vector, since the > wakeup vector is only used when the vCPU is blocked and hence in the > blocked list. And here is the place we can decide which pCPU the vCPU > will be scheduled on, so we change the destination field here. Right, that's what I understood. But isn't the state things are in between these two updates inconsistent? I.e. - where does the notification go if one arrives in this window? Particularly - if it went to the right place, why would this second update be needed at all? Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.