
Re: Recent upgrade of 4.13 -> 4.14 issue


  • To: Juergen Gross <JGross@xxxxxxxx>, "George.Dunlap@xxxxxxxxxx" <George.Dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Dario Faggioli <dfaggioli@xxxxxxxx>
  • Date: Mon, 26 Oct 2020 16:31:01 +0000
  • Accept-language: en-US
  • Cc: "marmarek@xxxxxxxxxxxxxxxxxxxxxx" <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, "frederic.pierret@xxxxxxxxxxxx" <frederic.pierret@xxxxxxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Mon, 26 Oct 2020 16:31:19 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHWq6SD/uyafOnyXU2hV/eab0QUF6mqE3EA
  • Thread-topic: Recent upgrade of 4.13 -> 4.14 issue

On Mon, 2020-10-26 at 15:30 +0100, Jürgen Groß wrote:
> On 26.10.20 14:54, Andrew Cooper wrote:
> > On 26/10/2020 13:37, Frédéric Pierret wrote:
> > > 
> > > If anyone would have any idea of what's going on, that would be
> > > very
> > > appreciated. Thank you.
> > 
> > Does booting Xen with `sched=credit` make a difference?
> 
> Hmm, I think I have spotted a problem in credit2 which could explain
> the
> hang:
> 
> csched2_unit_wake() will NOT put the sched unit on a runqueue in case
> it
> has CSFLAG_scheduled set. This bit will be reset only in
> csched2_context_saved().
> 
Exactly, it does not put it back there. However, if it finds a vCPU
with the CSFLAG_scheduled flag set, it should set the
CSFLAG_delayed_runq_add flag.

Unless curr_on_cpu(cpu)==unit or unit_on_runq(svc)==true... which
should not be the case. Or were you saying that we actually are in one
of these situations?

In fact...

> So in case a vcpu (and its unit, of course) is blocked and there has
> been no other vcpu active on its physical cpu but the idle vcpu,
> there
> will be no call of csched2_context_saved(). This will block the vcpu
> to become active in theory for eternity, in case there is no need to
> run another vcpu on the physical cpu.
> 
...maybe I am not seeing the exact situation and sequence of events
that you have in mind. What I see is this: [*]

- vCPU V is running, i.e., CSFLAG_scheduled is set
- vCPU V blocks
- we enter schedule()
  - schedule calls do_schedule() --> csched2_schedule()
    - we pick idle, so CSFLAG_delayed_runq_add is set for V
  - schedule calls sched_context_switch()
    - sched_context_switch() calls context_switch()
      - context_switch() calls sched_context_switched()
        - sched_context_switched() calls:
          - vcpu_context_saved()
          - unit_context_saved()
            - unit_context_saved() calls sched_context_saved() --> csched2_context_saved()
              - csched2_context_saved():
                - clears CSFLAG_scheduled
                - checks (and clears) CSFLAG_delayed_runq_add

[*] this assumes granularity 1, i.e., no core-scheduling and no 
    rendezvous. Or was core-scheduling actually enabled?

And if CSFLAG_delayed_runq_add is set **and** the vCPU is runnable, the
vCPU is added back to the runqueue.
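
To make the flag handshake above concrete, here is a minimal, standalone
sketch of it. This is a toy model, not the credit2 code: every toy_* name,
the struct layout and the flag values are made up for illustration, and
locking, runqueue selection, tickling and load tracking are all left out.
It only tries to mirror the two decisions described above (defer the
runqueue insertion at wake time, then complete it once the context has
been saved).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag bits: the names echo the flags discussed above, the values are arbitrary. */
#define TOY_SCHEDULED        (1u << 0)  /* stands in for CSFLAG_scheduled */
#define TOY_DELAYED_RUNQ_ADD (1u << 1)  /* stands in for CSFLAG_delayed_runq_add */

/* Hypothetical, heavily simplified stand-in for a scheduling unit. */
struct toy_unit {
    unsigned int flags;
    bool runnable;    /* false while the vCPU is blocked */
    bool on_runq;     /* stands in for unit_on_runq(svc) */
    bool is_current;  /* stands in for curr_on_cpu(cpu) == unit */
};

/* Wake path: never queue a unit whose context has not been saved yet. */
static void toy_unit_wake(struct toy_unit *u)
{
    u->runnable = true;

    /* Already running or already queued: nothing to do. */
    if (u->is_current || u->on_runq)
        return;

    /* Context not saved yet: remember that the unit must be queued later. */
    if (u->flags & TOY_SCHEDULED) {
        u->flags |= TOY_DELAYED_RUNQ_ADD;
        return;
    }

    u->on_runq = true;
}

/* Context-saved path: clear "scheduled", then honour a deferred insertion. */
static void toy_context_saved(struct toy_unit *u)
{
    bool delayed = (u->flags & TOY_DELAYED_RUNQ_ADD) != 0;

    u->flags &= ~(TOY_SCHEDULED | TOY_DELAYED_RUNQ_ADD);

    if (delayed && u->runnable)
        u->on_runq = true;
}

/* Replays the sequence above with the wakeup racing ahead of the context save. */
static bool toy_wake_before_context_saved(void)
{
    struct toy_unit v = { .flags = TOY_SCHEDULED, .runnable = true,
                          .is_current = true };

    v.runnable = false;    /* vCPU V blocks */
    v.is_current = false;  /* the scheduler picked idle */

    toy_unit_wake(&v);     /* wakeup arrives while "scheduled" is still set */
    assert(!v.on_runq && (v.flags & TOY_DELAYED_RUNQ_ADD));

    toy_context_saved(&v); /* the deferred insertion happens here */
    return v.on_runq && !(v.flags & TOY_SCHEDULED);
}
```

In this model, toy_wake_before_context_saved() returns true, i.e., the
vCPU ends up on the runqueue as long as the context-saved step is actually
reached. Whether that step is reached on the real system is exactly the
open question here, and the sketch says nothing about it.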

So, even if we don't do the actual context switch (i.e., we don't call
__context_switch() ) because the next vCPU that we pick when vCPU V
blocks is the idle one, it looks to me that we do get to call
csched2_context_saved().

And it also looks to me that, when we get to that, if the vCPU is
runnable, even if it still has CSFLAG_scheduled set, we do put it
back on the runqueue.

And if the vCPU blocked, but csched2_unit_wake() ran while
CSFLAG_scheduled was still set, it indeed should mean that the vCPU
itself will be runnable again when we get to csched2_context_saved().

Or did you have something completely different in mind, and I'm missing
it?


Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
