[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [BUG] Linux pvh vm not getting destroyed on shutdown

On Sun, Feb 14, 2021 at 11:45:47PM +0100, Maximilian Engelhardt wrote:
> On Samstag, 13. Februar 2021 19:21:56 CET Elliott Mitchell wrote:
> > On Sat, Feb 13, 2021 at 04:36:24PM +0100, Maximilian Engelhardt wrote:
> > > * The issue started with Debian kernel 5.8.3+1~exp1 running in the vm,
> > > Debian kernel 5.7.17-1 does not show the issue.
> > 
> > I think the first kernel update during which I saw the issue was around
> > linux-image-4.19.0-12-amd64 or linux-image-4.19.0-13-amd64.  I think
> > the last security update to the Xen packages was in a similar timeframe
> > though.  Rate this portion as unreliable though.  I can definitely state
> > this occurs with Debian's linux-image-4.19.0-13-amd64 and kernels built
> > from corresponding source, this may have shown earlier.
> We don't see any issues with the current Debian buster (Debian stable) kernel 
> (4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux) and 
> also did not notice any issues with the older kernel packages in buster. Also 
> the security update of xen in buster did not cause any behavior change for 
> us. 
> In our case everything in buster is working as we expect it to work (using 
> latest updates and security updates).

I can't really say much here.  I keep up to date and I cannot point to a
key ingredient as the one which caused this breakage.

> > Fresh observation.  During a similar timeframe I started noticing VM
> > creation leaving a `xl create` process behind.  I had discovered this
> > process could be freely killed without appearing to effect the VM and had
> > thus been doing so (memory in a lean Dom0 is precious).
> > 
> > While typing this I realized there was another scenario I needed to try.
> > Turns out if I boot PV GRUB and get to its command-line (press 'c'), then
> > get away from the VM console, kill the `xl create` process, return to
> > the console and type "halt".  This results in a hung VM.
> > 
> > Are you perhaps either killing the `xl create` process for effected VMs,
> > or migrating the VM and thus splitting the `xl create` process from the
> > effected VMs?
> > 
> > This seems more a Debian issue than a Xen Project issue right now.
> We don't migrate the vms, we don't kill any processes running on the dom0 and 
> I don't see anything in our logs indicating something gets killed on the 
> dom0. 
> On our systems the running 'xl create' processes only use very little memory.
> Have you tried if you still observer your hangs if you don't kill the xl 
> processes?

That is exactly what I pointed to above.  On stable killing the
mysterious left behind `xl create` process causes the problem to
manifest, while leaving it undisturbed appears to makes the problem not

After a save/restore instead it is a `xl restore` process left behind.
I /suspect/ this plays a similar role, I'm unsure how far this goes
though.  Might you try telling a VM to reboot, then do a save followed
by a restore of it?

I'm curious whether respawning the `xl restore` could work around what is

(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.