
Re: [ANNOUNCE] Xen 4.15 - call for notification/status of significant bugs



On Thu, Feb 4, 2021 at 9:21 AM Dario Faggioli <dfaggioli@xxxxxxxx> wrote:
>
> On Thu, 2021-02-04 at 12:12 +0000, Ian Jackson wrote:
> > B. "scheduler broken" bugs.
> >
> > Information from
> >   Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >   Dario Faggioli <dfaggioli@xxxxxxxx>
> >
> > Quoting Andrew Cooper
> > > We've had 4 or 5 reports of Xen not working, and very little
> > > investigation on what's going on.  Suspicion is that there might be
> > > two bugs, one with smt=0 on recent AMD hardware, and one more
> > > general "some workloads cause negative credit" and might or might
> > > not be specific to credit2 (debugging feedback differs - also might
> > > be 3 underlying issues).
> >
> > I reviewed a thread about this and it is not clear to me where we are
> > with this.
> >
> Ok, let me try to summarize the current status.
>
> - BUG: credit=sched2 machine hang when using DRAKVUF
>
>   https://lists.xen.org/archives/html/xen-devel/2020-05/msg01985.html
>   https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01561.html
>   https://bugzilla.opensuse.org/show_bug.cgi?id=1179246
>
>   99% sure that it's a Credit2 scheduler issue.
>   I'm actively working on it.
>   "Seems a tricky one; I'm still in the analysis phase"
>
>   Manifests only with certain combinations of hardware and workload.
>   I have not been able to reproduce it, but there are multiple
>   reports of it (see above). I'm investigating and trying to come up
>   at least with debug patches that one of the reporters should be able
>   and willing to test.
>
> - Null scheduler and vwfi native problem
>
>   https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg01634.html
>
>   RCU issues, but manifests due to scheduler behavior (especially
>   NULL scheduler, especially on ARM).
>   I'm actively working on it.
>
>   Patches that should solve the issue for ARM have already been
>   posted. They will need to be slightly adjusted to cover x86 as well.
>   Waiting a couple more days for confirmation from the reporter that
>   the patches do help, at least on ARM.
>

I've also run into the null scheduler causing CPU lockups on x86,
which required a physical reboot of the machine. It seems to be
triggered by domain destruction when destroying fork VMs, and it
happens only intermittently.

Tamas



 

