
Re: [Xen-devel] null domains after xl destroy



On 11/04/17 17:59, Juergen Gross wrote:
On 11/04/17 07:25, Glenn Enright wrote:
Hi all

We are seeing an odd issue with domU domains after xl destroy: under
recent 4.9 kernels a (null) domain is left behind.

I guess this is the dom0 kernel version?

This has occurred on a variety of hardware, with no obvious commonality.

4.4.55 does not show this behavior.

On my test machine I have the following packages installed under
CentOS 6, from https://xen.crc.id.au/:

~]# rpm -qa | grep xen
xen47-licenses-4.7.2-4.el6.x86_64
xen47-4.7.2-4.el6.x86_64
kernel-xen-4.9.21-1.el6xen.x86_64
xen47-ocaml-4.7.2-4.el6.x86_64
xen47-libs-4.7.2-4.el6.x86_64
xen47-libcacard-4.7.2-4.el6.x86_64
xen47-hypervisor-4.7.2-4.el6.x86_64
xen47-runtime-4.7.2-4.el6.x86_64
kernel-xen-firmware-4.9.21-1.el6xen.x86_64

I've also replicated the issue with 4.9.17 and 4.9.20

To replicate, on a cleanly booted dom0 with one PV VM, I run the
following on the VM:

{
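# continuous 512M writes with fdatasync, to keep disk I/O going while the domain is destroyed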
while true; do
 dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
done
}

Then on the dom0 I do this sequence to reliably get a null domain. This
occurs with both oxenstored and xenstored.

{
xl sysrq 1 s
xl destroy 1
}

xl list then renders something like ...

(null)                                       1     4     4     --p--d       9.8     0

Something is referencing the domain, e.g. some of its memory pages are
still mapped by dom0.

From what I can see it appears to be disk related. Affected VMs all use
LVM storage for their boot disk. lvdisplay of the affected LV shows that
it is being held open by something.

How are the disks configured? The backend type in particular is important.


~]# lvdisplay test/test.img | grep open
  # open                 1

I've not been able to determine what that thing is yet. I tried lsof,
dmsetup, and various LVM tools. Waiting for the disk to be released does
not work.
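
For reference, this is roughly how I have been checking whether the kernel itself still has the device open (a sketch; the device-mapper name test-test.img is inferred from the test/test.img LV above, so adjust for your naming):

# open count as device-mapper sees it (dm name inferred from VG/LV test/test.img)
dmsetup info -c test-test.img

# resolve the LV to its dm-N node and look for anything stacked on top of it
dm=$(basename "$(readlink -f /dev/test/test.img)")
ls /sys/block/$dm/holders/

# userspace openers, if any
fuser -v /dev/test/test.img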

~]# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1512     2     r-----      29.0
(null)                                       1     4     4     --p--d       9.8

xenstore-ls reports nothing for the null domain id that I can see.

Any qemu process related to the domain still running?

Any dom0 kernel messages related to Xen?


Juergen


Yep, 4.9 dom0 kernel

Typically we see an xl process still running, but that has already gone away in this case. The domU is a PV guest using a phy disk definition; the basic startup is like this...

xl -v create -f paramfile extra="console=hvc0 elevator=noop xen-blkfront.max=64"

There are no qemu processes or threads anywhere I can see.
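
For reference, the checks amount to roughly this (nothing exotic, just the obvious process and log greps):

# any leftover qemu or xl processes for the guest?
ps -ef | grep -E '[q]emu|[x]l '

# hypervisor log
xl dmesg | tail -n 50

# dom0 kernel log, xen/blkback/vif related entries only
dmesg | grep -iE 'xen|blkback|vif' | tail -n 50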

I don't see any meaningful messages in the Linux kernel log, and nothing at all in the hypervisor log. Here is the dom0 output from starting and then stopping a domU using the above mechanism:

br0: port 2(vif3.0) entered disabled state
br0: port 2(vif4.0) entered blocking state
br0: port 2(vif4.0) entered disabled state
device vif4.0 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): vif4.0: link is not ready
xen-blkback: backend/vbd/4/51713: using 2 queues, protocol 1 (x86_64-abi) persistent grants
xen-blkback: backend/vbd/4/51721: using 2 queues, protocol 1 (x86_64-abi) persistent grants
vif vif-4-0 vif4.0: Guest Rx ready
IPv6: ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
br0: port 2(vif4.0) entered blocking state
br0: port 2(vif4.0) entered forwarding state
br0: port 2(vif4.0) entered disabled state
br0: port 2(vif4.0) entered disabled state
device vif4.0 left promiscuous mode
br0: port 2(vif4.0) entered disabled state

... here is xl info ...

host                   : xxxxxxxxxxxx
release                : 4.9.21-1.el6xen.x86_64
version                : #1 SMP Sat Apr 8 18:03:45 AEST 2017
machine                : x86_64
nr_cpus                : 4
max_cpu_id             : 3
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2394
hw_caps                : b7ebfbff:0000e3bd:20100800:00000001:00000000:00000000:00000000:00000000
virt_caps              :
total_memory           : 8190
free_memory            : 6577
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 7
xen_extra              : .2
xen_version            : 4.7.2
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : dom0_mem=1512M cpufreq=xen dom0_max_vcpus=2 dom0_vcpus_pin log_lvl=all guest_loglvl=all vcpu_migration_delay=1000
cc_compiler            : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
cc_compile_by          : mockbuild
cc_compile_domain      : (none)
cc_compile_date        : Mon Apr  3 12:17:20 AEST 2017
build_id               : 0ec32d14d7c34e5d9deaaf6e3b7ea0c8006d68fa
xend_config_format     : 4


# cat /proc/cmdline
ro root=UUID=xxxxxxxxxx rd_MD_UUID=xxxxxxxxxxxx rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_MD_UUID=xxxxxxxxxxxxx SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM rhgb quiet pcie_aspm=off panic=30 max_loop=64 dm_mod.use_blk_mq=y xen-blkfront.max=64

The domU is using an LVM volume on top of an md RAID1 array, on directly connected HDDs. Nothing special hardware-wise. The disk line for that domU looks functionally like...

disk = [ 'phy:/dev/testlv/test.img,xvda1,w' ]
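
For what it's worth, this is roughly how I check whether blkback still has backend state for the torn-down domain (the domain ID 4 below is taken from the blkback log lines above; substitute the ID of the (null) domain):

# xenstore backend nodes that should disappear once the vbd is released
xenstore-ls /local/domain/0/backend/vbd/4
xenstore-ls /local/domain/0/backend/vif/4

# corresponding xen-backend devices in sysfs
ls /sys/bus/xen-backend/devices/ | grep -- '-4-'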

I would appreciate any suggestions on how to increase the debug level in a relevant way, or where to look for more useful information on what is happening.

To clarify the actual shutdown sequence that causes problems...

# xl sysrq $id s
# xl destroy $id
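
Put together as a small script, the whole sequence looks roughly like this (the guest name 'testvm' is just a placeholder, and the dd loop from earlier is assumed to already be running inside the guest):

#!/bin/sh
# placeholder guest name; replace with the real domU name from xl list
guest=testvm

id=$(xl list | awk -v g="$guest" '$1 == g {print $2}')

xl sysrq "$id" s     # ask the guest kernel to sync its disks
sleep 5
xl destroy "$id"

xl list              # the (null) domain with state --p--d shows up here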


Regards, Glenn

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
