
Re: [Xen-devel] null domains after xl destroy



On 19/04/17 09:16, Roger Pau Monné wrote:
> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>> On 19/04/17 03:02, Glenn Enright wrote:
>>> On 18/04/17 20:36, Juergen Gross wrote:
>>>> On 12/04/17 00:45, Glenn Enright wrote:
>>>>> On 12/04/17 10:23, Andrew Cooper wrote:
>>>>>> On 11/04/2017 23:13, Glenn Enright wrote:
>>>>>>> On 11/04/17 21:49, Dietmar Hahn wrote:
>>>>>>>> Am Dienstag, 11. April 2017, 20:03:14 schrieb Glenn Enright:
>>>>>>>>> On 11/04/17 17:59, Juergen Gross wrote:
>>>>>>>>>> On 11/04/17 07:25, Glenn Enright wrote:
>>>>>>>>>>> Hi all
>>>>>>>>>>>
>>>>>>>>>>> We are seeing an odd issue with domU domains after xl destroy:
>>>>>>>>>>> under recent 4.9 kernels a (null) domain is left behind.
>>>>>>>>>>
>>>>>>>>>> I guess this is the dom0 kernel version?
>>>>>>>>>>
>>>>>>>>>>> This has occurred on a variety of hardware, with no obvious
>>>>>>>>>>> commonality.
>>>>>>>>>>>
>>>>>>>>>>> 4.4.55 does not show this behavior.
>>>>>>>>>>>
>>>>>>>>>>> On my test machine I have the following packages installed under
>>>>>>>>>>> centos6, from https://xen.crc.id.au/
>>>>>>>>>>>
>>>>>>>>>>> ~]# rpm -qa | grep xen
>>>>>>>>>>> xen47-licenses-4.7.2-4.el6.x86_64
>>>>>>>>>>> xen47-4.7.2-4.el6.x86_64
>>>>>>>>>>> kernel-xen-4.9.21-1.el6xen.x86_64
>>>>>>>>>>> xen47-ocaml-4.7.2-4.el6.x86_64
>>>>>>>>>>> xen47-libs-4.7.2-4.el6.x86_64
>>>>>>>>>>> xen47-libcacard-4.7.2-4.el6.x86_64
>>>>>>>>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
>>>>>>>>>>> xen47-runtime-4.7.2-4.el6.x86_64
>>>>>>>>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>>>>>>>>>>
>>>>>>>>>>> I've also replicated the issue with 4.9.17 and 4.9.20
>>>>>>>>>>>
>>>>>>>>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>>>>>>>>>>> following on the VM
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>> while true; do
>>>>>>>>>>>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>>>>>>>>>>> done
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Then on the dom0 I do this sequence to reliably get a null domain.
>>>>>>>>>>> This occurs with both oxenstored and xenstored.
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>> xl sync 1
>>>>>>>>>>> xl destroy 1
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> xl list then renders something like ...
>>>>>>>>>>>
>>>>>>>>>>> (null)                                       1     4     4     --p--d       9.8     0
>>>>>>>>>>
>>>>>>>>>> Something is referencing the domain, e.g. some of its memory
>>>>>>>>>> pages are still mapped by dom0.
>>>>>>>>
>>>>>>>> You can try
>>>>>>>> # xl debug-keys q
>>>>>>>> and further
>>>>>>>> # xl dmesg
>>>>>>>> to see the output of the previous command. The 'q' dumps domain
>>>>>>>> (and guest debug) info.
>>>>>>>> # xl debug-keys h
>>>>>>>> prints all possible parameters for more information.
>>>>>>>>
>>>>>>>> Dietmar.
>>>>>>>>
>>>>>>>
>>>>>>> I've done this as requested, below is the output.
>>>>>>>
>>>>>>> <snip>
>>>>>>> (XEN) Memory pages belonging to domain 1:
>>>>>>> (XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
>>>>>>
>>>>>> There are 16 pages still referenced from somewhere.
>>>>
>>>> Just a wild guess: could you please try the attached kernel patch? This
>>>> might give us some more diagnostic data...
>>>>
>>>>
>>>> Juergen
>>>>
>>>
>>> Thanks Juergen. I applied that to our 4.9.23 dom0 kernel, which still
>>> shows the issue. When replicating the leak I now see this trace (via
>>> dmesg). Hopefully that is useful.
>>>
>>> Please note, I'm going to be offline next week, but am keen to keep on
>>> with this. It may just be a while before I follow up.
>>>
>>> Regards, Glenn
>>> http://rimuhosting.com
>>>
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>> xen_blkbk_remove+0x138/0x140
>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
>>> dm_mirror dm_region_hash dm_log dm_mod
>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>> Call Trace:
>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>> ---[ end trace ee097287c9865a62 ]---
>>
>> Konrad, Roger,
>>
>> this was triggered by a debug patch in xen_blkbk_remove():
>>
>>      if (be->blkif)
>> -            xen_blkif_disconnect(be->blkif);
>> +            WARN_ON(xen_blkif_disconnect(be->blkif));
>>
>> So I guess we need something like xen_blk_drain_io() in case of calls to
>> xen_blkif_disconnect() which are not allowed to fail (either at the call
>> sites of xen_blkif_disconnect() or in this function depending on a new
>> boolean parameter indicating it should wait for outstanding I/Os).
>>
>> I can try a patch, but I'd appreciate if you could confirm this wouldn't
>> add further problems...
> 
> Hello,
> 
> Thanks for debugging this. The easiest solution seems to be to replace the
> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
> xen_blk_drain_io instead, and to make xen_blkif_disconnect return void (to
> prevent further issues like this one).

Nah, this isn't going to work. Or at least it won't work as it was
designed to. :-)

The main problem seems to be that xen_blkif_get/put() are used for
multiple purposes: they shouldn't be used by xen_blkif_alloc_rings() and
xen_blkif_disconnect(), as that prevents xen_blkif_deferred_free() from
being called when an I/O is terminated and its xen_blkif_put() is meant
to free all remaining resources.

I'll write a patch to correct this.
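
To illustrate the effect described above, here is a toy userspace model.
It is only a sketch under simplifying assumptions, not the real driver
code: all toy_* names are made up, the counters are plain ints rather
than atomics, and the "free" is just a flag. It shows how a reference
taken at ring setup and dropped only by a successful disconnect can keep
the final put of a completing I/O from ever reaching zero.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a backend instance (not struct xen_blkif). */
struct toy_blkif {
    int refcnt;    /* one counter shared by lifetime and ring setup */
    int inflight;  /* outstanding I/O requests */
    bool freed;    /* set when the last reference is dropped */
};

static void toy_get(struct toy_blkif *b) { b->refcnt++; }

static void toy_put(struct toy_blkif *b)
{
    if (--b->refcnt == 0)
        b->freed = true;  /* stands in for the deferred free */
}

/* Ring setup takes its own reference on the instance. */
static void toy_alloc_rings(struct toy_blkif *b) { toy_get(b); }

/* Disconnect gives up while I/O is in flight and keeps the ring reference. */
static int toy_disconnect(struct toy_blkif *b)
{
    if (b->inflight > 0)
        return -1;  /* busy */
    toy_put(b);     /* drop the reference taken by toy_alloc_rings() */
    return 0;
}

static void toy_io_start(struct toy_blkif *b) { toy_get(b); b->inflight++; }
static void toy_io_done(struct toy_blkif *b)  { b->inflight--; toy_put(b); }

/* Run one teardown; return whether the instance ended up freed. */
static bool toy_teardown(bool io_in_flight)
{
    struct toy_blkif b = { .refcnt = 1, .inflight = 0, .freed = false };

    toy_alloc_rings(&b);
    if (io_in_flight)
        toy_io_start(&b);

    (void)toy_disconnect(&b);  /* result ignored, as in the remove path */
    toy_put(&b);               /* remove path drops the initial reference */

    if (io_in_flight)
        toy_io_done(&b);       /* meant to be the final put, but is not */

    return b.freed;
}
```

In this model toy_teardown(false) ends with the instance freed, while
toy_teardown(true) leaves one reference behind (the ring one), so nothing
is ever freed. Separating the ring setup from the lifetime count is one
way to avoid that in the model.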


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

