[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list



On Tue, 2016-05-24 at 10:07 +0000, Wu, Feng wrote:
> > See, for instance, cpu_disable_scheduler() in schedule.c. What we
> > do is
> > go over all the vcpus of all domains of either the system or the
> > cpupool, and force the ones that we found with v->processor set to
> > the
> > pCPU that is going down, to perform migration (system_state will be
> > different than SYS_STATE_suspend, and we hence take the 'else'
> > branch).
> > 
> > Considering that the pCPU is no longer part of the relevant
> > bitmask-s
> > during the migration, the vCPUs will figure out that they just
> > can't
> > stay there, and move somewhere else.
>
> Thanks a lot for the elaboration, it is really helpful.
> 
NP :-)

> > Note, however, that this happens for running and runnable vCPUs. 
>
> I don't quite understand this, do you mean cpu_disable_scheduler()
> only handle running and runnable vCPUs, I tried to find some hints
> from the code, but I didn't get it. Could you please give some more
> information about this?
> 
It goes through all the vcpus of all domains, and does not check or
care whether they are running, runnable or blocked.

Let's look at this in some more details. So, let's assume that
processor 5 is going away, and that you have the following vcpus
around:

 d0v0 : v->processor = 5, running on cpu 5
 d0v1 : v->processor = 4, running on cpu 4
 d1v0 : v->processor = 5, runnable but not running
 d2v3 : v->processor = 5, blocked

for d0v0, we do:
  cpu_disable_scheduler(5)
    set_bit(_VPF_migrating, d0v0->pause_flags);
    vcpu_sleep_nosync(d0v0);
      SCHED_OP(sleep, d0v0);
        csched_vcpu_sleep(d0v0)
          cpu_raise_softirq(5, SCHEDULE_SOFTIRQ);
    vcpu_migrate(d0v0);
      if ( v->is_running || ...) // assume v->is_running is true
        return
    ...
    ... <--- scheduling occurs on processor 5
    ...
    context_saved(d0v0)
      vcpu_migrate(d0v0);
          //is_running is 0, so _VPF_migrating gets cleared
        vcpu_move_locked(d0v0, new_cpu);
        vcpu_wake(d0v0);
          SCHED_OP(wake, d0v0);
            csched_vcpu_wake(d0v0)
              __runq_insert(d0v0);
              __runq_tickle(d0v0);

for d0v1, we do:
  cpu_disable_scheduler(5)
    if ( d0v1->processor != 5 )
      continue

for d1v0, we do:
  cpu_disable_scheduler(5)
    set_bit(_VPF_migrating, d1v0->pause_flags);
    vcpu_sleep_nosync(d1v0);
      SCHED_OP(sleep, d1v0);
        csched_vcpu_sleep(d1v0)
          __runq_remove(d1v0);
    vcpu_migrate(d1v0);
      if ( d1v0->is_running ||
           !test_and_clear_bit(_VPF_migrating, d1v0->pause_flags)
          // false, but clears the _VPF_migrating flag
      vcpu_move_locked(d1v0, new_cpu);
      vcpu_wake(v);
        SCHED_OP(wake, d1v0);
          csched_vcpu_wake(d1v0)
            __runq_insert(d1v0);
            __runq_tickle(d1v0);

for d2v3, we do:
  cpu_disable_scheduler(5)
    set_bit(_VPF_migrating, d2v3-
>pause_flags);
    vcpu_sleep_nosync(d2v3);
      SCHED_OP(sleep, d2v3);
 
      csched_vcpu_sleep(d2v3)
[1]       // Nothing! 
   
vcpu_migrate(d2v3);
      if ( d2v3->is_running ||
         
 !test_and_clear_bit(_VPF_migrating, d2v3->pause_flags)
          //
false, but clears the _VPF_migrating flag
[*]   vcpu_move_locked(d2v3,
new_cpu);
      vcpu_wake(d2v3);
[2]     // Nothing!

> > If a
> > vCPU is blocker, there is nothing to do, and in fact, nothing
> > happens
> > (as vcpu_sleep_nosync() and vcpu_wake() are NOP in that case). 
>
> What do you mean by saying ' vcpu_sleep_nosync() and vcpu_wake()
> are NOP '?
> 
See [1] and [2] above.

> > For
> > those vCPUs, as soon as they wake up, they'll figure out that their
> > own
> > v->processor is not there any longer, and will move somewhere else.
>
> So basically, when vCPU is blocking, it has no impact to the blocking
> vcpu
> when 'v->processor' is removed. When the vCPU is waken up, it will
> find
> another pCPU to run, since the original 'v->processor' is down and no
> longer in the cpu bitmask, right?
> 
Yes, that was my point.

_However_, as you can see at [*] above, it must be noted that even
those vcpus that blocked while running on a certain processor (5 in the
example), indeed have a chance to have their
v->processor changed to something that is still online (something
different than 5), as a consequence of that processor going away.

Whether this is useful/enough for you, I can't tell right now, out of
the top of my head.

> > > > But this is not an issue  in non pCPU hotplug scenario.
> > > > 
> > It's probably an issue even if you remove a cpu from a cpupool (and
> > even a more "interesting" one, if you also manage to add it to
> > another
> > pool, in the meantime) isn't it?
>
> Yes, things become more complex in that case ....
> 
Well, but can you confirm that we also have an issue there, and test
and report what happens if you move a cpu from pool A to pool B, while
it still has vcpus from a domain that stays in pool A.

If there's transient suboptimal behavior, well, we probably can live
with that (depending on the specific characteristics of the transitory,
I'd say). If things crash, we certainly want a fix!

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.