
Re: Null scheduler and vwfi native problem



On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote:
> On 1/25/21 5:11 PM, Dario Faggioli wrote:
> > On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
> > > Hi Anders,
> > > 
> > > On 22/01/2021 08:06, Anders Törnqvist wrote:
> > > > On 1/22/21 12:35 AM, Dario Faggioli wrote:
> > > > > On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> > > > - booting with "sched=null vwfi=native" but not doing the IRQ
> > > > passthrough that you mentioned above
> > > > "xl destroy" gives
> > > > (XEN) End of domain_destroy function
> > > > 
> > > > Then a "xl create" says nothing but the domain has not started
> > > > correct.
> > > > "xl list" look like this for the domain:
> > > > mydomu                                   2   512     1 ------     0.0
> > > This is odd. I would have expected ``xl create`` to fail if
> > > something
> > > went wrong with the domain creation.
> > > 
> > So, Anders, would it be possible to issue a:
> > 
> > # xl debug-keys r
> > # xl dmesg
> > 
> > And send it to us ?
> > 
> > Ideally, you'd do it:
> >   - with Julien's patch (the one he sent the other day, and that
> > you
> >     have already given a try to) applied
> >   - while you are in the state above, i.e., after having tried to
> >     destroy a domain and failing
> >   - and maybe again after having tried to start a new domain
> Here are some logs.
> 
Great, thanks a lot!

> The system is booted as before with the patch and the domu config
> does 
> not have the IRQs.
> 
Ok.

> # xl list
> Name                                        ID   Mem VCPUs State   Time(s)
> Domain-0                                     0  3000     5 r-----     820.1
> mydomu                                       1   511     1 r-----     157.0
> 
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=191793008000
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN)     cpus_free =
> (XEN) Domain info:
> (XEN)     Domain: 0
> (XEN)       1: [0.0] pcpu=0
> (XEN)       2: [0.1] pcpu=1
> (XEN)       3: [0.2] pcpu=2
> (XEN)       4: [0.3] pcpu=3
> (XEN)       5: [0.4] pcpu=4
> (XEN)     Domain: 1
> (XEN)       6: [1.0] pcpu=5
> (XEN) Waitqueue:
>
So far, so good. All vCPUs are running on their assigned pCPU, and
there is no vCPU that wants to run but has no pCPU to run on.

> (XEN) Command line: console=dtuart dtuart=/serial@5a060000 
> dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin 
> sched=null vwfi=native
>
Oh, just as a side note (and most likely unrelated to the problem we're
discussing), you should be able to get rid of dom0_vcpus_pin.

The NULL scheduler will do something similar to what that option itself
does anyway. And with the benefit that, if you want, you can actually
change which pCPUs dom0's vCPUs are pinned to, whereas with
dom0_vcpus_pin you can't.

So using it has only downsides (and that's true in general, if you
ask me, but particularly so if using NULL).

> # xl destroy mydomu
> (XEN) End of domain_destroy function
> 
> # xl list
> Name                                        ID   Mem VCPUs State   Time(s)
> Domain-0                                     0  3000     5 r-----    1057.9
> 
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=223871439875
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN)     cpus_free =
> (XEN) Domain info:
> (XEN)     Domain: 0
> (XEN)       1: [0.0] pcpu=0
> (XEN)       2: [0.1] pcpu=1
> (XEN)       3: [0.2] pcpu=2
> (XEN)       4: [0.3] pcpu=3
> (XEN)       5: [0.4] pcpu=4
> (XEN)     Domain: 1
> (XEN)       6: [1.0] pcpu=5
>
Right. And from the fact that 1) we only see the "End of
domain_destroy function" line in the logs, and 2) the vCPU is still
listed here, we have our confirmation (as if there was the need for
it :-/) that domain destruction is done only partially.
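
For anyone wanting to automate this check, here is a minimal sketch that
parses the "Domain info" part of the dump and reports domains that the
scheduler still knows about but that "xl list" no longer shows. The
function names (parse_null_dump, stale_domains) are made up for this
example and are not part of Xen or xl; the parser only assumes the line
layout visible in the logs above.

```python
import re

# Dump taken after "xl destroy mydomu", as quoted in this thread.
DUMP = """\
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
"""

# Matches lines such as "6: [1.0] pcpu=5" or "7: [2.0] pcpu=-1".
VCPU_RE = re.compile(r"^\d+:\s*\[(\d+)\.(\d+)\]\s*pcpu=(-?\d+)$")
DOMAIN_RE = re.compile(r"^Domain:\s*(\d+)$")


def parse_null_dump(text):
    """Return {domid: {vcpu_id: pcpu}} from a null scheduler dump."""
    domains = {}
    for raw in text.splitlines():
        # Drop any mail quoting ("> ") and the "(XEN)" console prefix.
        line = re.sub(r"^(?:>\s*)*", "", raw).replace("(XEN)", "").strip()
        m = DOMAIN_RE.match(line)
        if m:
            domains.setdefault(int(m.group(1)), {})
            continue
        m = VCPU_RE.match(line)
        if m:
            domid, vcpu, pcpu = (int(g) for g in m.groups())
            domains.setdefault(domid, {})[vcpu] = pcpu
    return domains


def stale_domains(dump, live_domids):
    """Domain IDs present in the scheduler dump but absent from 'xl list'."""
    return sorted(set(dump) - set(live_domids))


if __name__ == "__main__":
    parsed = parse_null_dump(DUMP)
    # After the destroy, "xl list" only showed Domain-0 (ID 0).
    print(stale_domains(parsed, live_domids={0}))  # prints [1]
```

Here the live domain IDs are passed in by hand; collecting them from
"xl list" is left out on purpose.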

> # xl create mydomu.cfg
> Parsing config from mydomu.cfg
> (XEN) Power on resource 215
> 
> # xl list
> Name                                        ID   Mem VCPUs State   Time(s)
> Domain-0                                     0  3000     5 r-----    1152.1
> mydomu                                       2   512     1 ------       0.0
> 
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=241210530250
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN)     cpus_free =
> (XEN) Domain info:
> (XEN)     Domain: 0
> (XEN)       1: [0.0] pcpu=0
> (XEN)       2: [0.1] pcpu=1
> (XEN)       3: [0.2] pcpu=2
> (XEN)       4: [0.3] pcpu=3
> (XEN)       5: [0.4] pcpu=4
> (XEN)     Domain: 1
> (XEN)       6: [1.0] pcpu=5
> (XEN)     Domain: 2
> (XEN)       7: [2.0] pcpu=-1
> (XEN) Waitqueue: d2v0
>
Yep, so, as we were suspecting, domain 1 was not destroyed properly.
Specifically, we did not get to the point where the vCPU is deallocated
and the pCPU to which that vCPU had been assigned by the NULL
scheduler is released.

This means that the new vCPU (i.e., d2v0) has, from the point of view
of the NULL scheduler, no pCPU where to run. And it's therefore parked
in the waitqueue.
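
To make the mechanism concrete, here is a toy model of the behaviour
described above. It is only a sketch under simplifying assumptions, not
the actual Xen code: the class and method names are invented for
illustration, and affinity, cpupools and locking are ignored entirely.

```python
class ToyNullScheduler:
    """One vCPU per pCPU; vCPUs that find no free pCPU wait in a queue."""

    def __init__(self, pcpus):
        self.assigned = {p: None for p in pcpus}  # pcpu -> vcpu name or None
        self.waitqueue = []

    def insert_vcpu(self, vcpu):
        """Give the vCPU the first free pCPU, or park it (returns -1)."""
        for pcpu, owner in self.assigned.items():
            if owner is None:
                self.assigned[pcpu] = vcpu
                return pcpu
        self.waitqueue.append(vcpu)
        return -1

    def remove_vcpu(self, vcpu):
        """Release the vCPU's pCPU and hand it to the first waiter, if any."""
        if vcpu in self.waitqueue:
            self.waitqueue.remove(vcpu)
            return
        for pcpu, owner in self.assigned.items():
            if owner == vcpu:
                self.assigned[pcpu] = (
                    self.waitqueue.pop(0) if self.waitqueue else None
                )
                return


if __name__ == "__main__":
    sched = ToyNullScheduler(range(6))
    for v in range(5):
        sched.insert_vcpu(f"d0v{v}")       # dom0 takes pCPUs 0-4
    print(sched.insert_vcpu("d1v0"))       # 5

    # Partial destroy: remove_vcpu("d1v0") is never reached, so pCPU 5
    # stays taken and the new domain's vCPU has nowhere to run.
    print(sched.insert_vcpu("d2v0"))       # -1
    print(sched.waitqueue)                 # ['d2v0']

    # Had the destroy completed, the waiter would pick up the pCPU.
    sched.remove_vcpu("d1v0")
    print(sched.assigned[5])               # d2v0
```

The last step shows why finishing the teardown of domain 1 matters: in
this model, only the removal of d1v0 makes pCPU 5 available to d2v0.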

There should be a warning about that, which I don't see... but perhaps
I'm just misremembering.

Anyway, cool, this makes things even more clear.

Thanks again for letting us see these logs.
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
