
[Xen-devel] PV and HVM domains left as zombies with grants [was: Re: AW: Payed Xen Admin]



On Tue, 2016-11-29 at 13:34 +0000, IP-Projects - Support wrote:
> Hello,
> 
> we see this, I think, when the VMs are stopped or restarted by
> customers (xl destroy vm and then recreating), or I can reproduce this
> when I stop them all by
> script with a for loop with xl destroy $i .
> 
Ok, that makes sense. What is happening to you is that some of the
domains, although dead, are still around as 'zombies', because they
still have outstanding pages/references/etc.

This is clearly visible in the output of the debug keys you provided.

Something similar has been discussed, e.g., here:
https://lists.xenproject.org/archives/html/xen-devel/2013-11/msg03413.html

> It happens with HVM and PV
> 
> testcase all vms started:
> 
> root@v34:/var# xl list
> Name                                        ID   Mem VCPUs      State   Time(s)
> Domain-0                                     0  2048     2     r-----     398.0
> vmanager2593                                34   512     1     -b----       1.8
> 
> root@v34:/var# /root/scripts/vps_stop.sh
> root@v34:/var# xl list
> Name                                        ID   Mem VCPUs      State   Time(s)
> Domain-0                                     0  2048     2     r-----     420.5
> (null)                                      34     0     1     --p--d       2.3
> 
Just for the sake of completeness, can we see what's in vps_stop.sh?
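
While waiting for that, here is a minimal sketch (Python, not something
shipped with the xl tools) of how leftover domains could be picked out
of `xl list` output. It assumes the six-column layout shown above and
that a 'd' in the State column marks a dying domain; the function name
is made up for illustration.

```python
def find_zombie_domains(xl_list_output):
    """Return (domid, name, state) for every domain flagged as dying."""
    zombies = []
    for line in xl_list_output.splitlines():
        fields = line.split()
        # Counting from the right: Time(s), State, VCPUs, Mem, ID.
        # Whatever is left on the left-hand side is the domain name.
        if len(fields) < 6 or not fields[-5].isdigit():
            continue  # header row or a line we do not understand
        name = " ".join(fields[:-5])
        domid, state = int(fields[-5]), fields[-2]
        if "d" in state:  # 'd' = dying, per the xl State flags
            zombies.append((domid, name, state))
    return zombies


# Sample taken from the `xl list` output quoted in this thread.
sample = """\
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  2048     2     r-----     420.5
(null)                                      34     0     1     --p--d       2.3
"""
print(find_zombie_domains(sample))
```

On a live host the text would come from running `xl list` itself; here
it is fed from a string so the sketch can be tried anywhere.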

> root@v34:/var/log/xen# cat xl-vmanager2593.log
> Waiting for domain vmanager2593 (domid 34) to die [pid 23747]
> Domain 34 has been destroyed.
> 
Ok, thanks. Not much indeed. One way to increase the amount of
information would be to start the domains with:

xl -vvv create /etc/xen/vmanager2593.cfg

This will add logs coming from xl and libxl, which may not be where the
problem really is, but I think it's worth a try. Be aware that this
will make your terminal/console/whatever very busy, if you start a lot
of VMs at the same time.
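
To keep the terminal usable, the verbose output could be sent to a
per-domain file instead. This is only a rough sketch in Python: the log
file naming, the default directory argument and both helper names are
assumptions of mine, and only `xl -vvv create <cfg>` comes from the
suggestion above.

```python
import subprocess
from pathlib import Path


def verbose_create_command(cfg_path, log_dir):
    """Build the xl command line and a (hypothetical) log file path."""
    cfg = Path(cfg_path)
    # Assumed naming scheme: one file per config, e.g. xl-create-<name>.log
    log_path = Path(log_dir) / f"xl-create-{cfg.stem}.log"
    return ["xl", "-vvv", "create", str(cfg)], log_path


def create_verbose(cfg_path, log_dir):
    """Run `xl -vvv create`, appending stdout and stderr to the log file."""
    cmd, log_path = verbose_create_command(cfg_path, log_dir)
    with open(log_path, "ab") as log:
        # Writing to a file (not a pipe) keeps the terminal quiet.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode, log_path
```

For example, `create_verbose("/etc/xen/vmanager2593.cfg", "/tmp")` would
leave the xl and libxl messages in /tmp/xl-create-vmanager2593.log.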

From the config you posted (and that I removed) I see it's a PV guest,
so I'm not asking for any device model logs, in this case.

> /var/log/xen/xen-hotplug.log does not log anything. Any hint why?
> 
I've no idea, but I'm not even sure what kind of log that contains (I
guess stuff related to hotplug scripts).

So, here we are:

> (XEN) 'q' pressed -> dumping domain info (now=0x16B:4C7A5CC3)
> (XEN) General information for domain 34:
> (XEN)     refcnt=1 dying=2 pause_count=2
> (XEN)     nr_pages=122 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=131328
>
As you see, there are outstanding pages. That's what is keeping the
domain around.
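
If there are many domains in the dump, the same check can be done
mechanically. A minimal sketch in Python, assuming the 'q' output has
the format quoted above (a "General information for domain N:" line
followed by key=value pairs); the function name is hypothetical.

```python
import re


def dying_domains_with_pages(dump):
    """Map domid -> {'dying': n, 'nr_pages': n} for stuck domains."""
    domains = {}
    current = None
    for line in dump.splitlines():
        header = re.search(r"General information for domain (\d+):", line)
        if header:
            current = int(header.group(1))
            domains[current] = {}
            continue
        if current is None:
            continue
        # Only the two counters of interest; keep the first value seen.
        for key, value in re.findall(r"\b(dying|nr_pages)=(\d+)", line):
            domains[current].setdefault(key, int(value))
    return {
        domid: info
        for domid, info in domains.items()
        if info.get("dying", 0) != 0 and info.get("nr_pages", 0) > 0
    }


# Lines copied from the 'q' dump quoted in this thread.
sample_dump = """\
(XEN) General information for domain 34:
(XEN)     refcnt=1 dying=2 pause_count=2
(XEN)     nr_pages=122 xenheap_pages=0 shared_pages=0 paged_pages=0
"""
print(dying_domains_with_pages(sample_dump))
```

A domain that shows up here has been marked as dying but still owns
pages, which is the situation described above.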

> (XEN)     handle=2a991534-312f-465a-9dff-f9a9fb1baadd vm_assist=0000002d
> (XEN) Rangesets belonging to domain 34:
> (XEN)     I/O Ports  { }
> (XEN)     log-dirty  { }
> (XEN)     Interrupts { }
> (XEN)     I/O Memory { }
> (XEN) Memory pages belonging to domain 34:
> (XEN)     DomPage 00000000005b9041: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b9042: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b9043: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b9044: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b9045: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b9046: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b9047: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b9048: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b9049: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b904a: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b904b: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b904c: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b904d: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b904e: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b904f: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 00000000005b9050: caf=00000001, taf=7400000000000001
> (XEN) NODE affinity for domain 34: [0]
> (XEN) VCPU information and callbacks for domain 34:
> (XEN)     VCPU0: CPU4 [has=F] poll=0 upcall_pend=00 upcall_mask=01 dirty_cpus={}
> (XEN)     cpu_hard_affinity={4-7} cpu_soft_affinity={0-7}
> (XEN)     pause_count=0 pause_flags=0
> (XEN)     No periodic timer
> (XEN) Notifying guest 0:0 (virq 1, port 5)
> (XEN) Notifying guest 0:1 (virq 1, port 12)
> (XEN) Notifying guest 34:0 (virq 1, port 0)
> (XEN) Shared frames 0 -- Saved frames 0
> 
> (XEN) gnttab_usage_print_all [ key 'g' pressed
> (XEN)       -------- active --------       -------- shared --------
> (XEN) [ref] localdom mfn      pin          localdom gmfn     flags
> (XEN) grant-table for remote domain:   34 (v1)
> (XEN) [  8]        0 0x5b8f05 0x00000001          0 0x5b8f05 0x19
> (XEN) [770]        0 0x5b90ba 0x00000001          0 0x5b90ba 0x19
> (XEN) [802]        0 0x5b90b9 0x00000001          0 0x5b90b9 0x19
> (XEN) [803]        0 0x5b90b8 0x00000001          0 0x5b90b8 0x19
> [snip]
>
And here are the grants!
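
For longer dumps, the entries can be collected per granting domain. A
minimal sketch in Python, assuming the 'g' output keeps the column
layout quoted above ([ref], active localdom, mfn, pin, then the shared
columns); the function and dictionary key names are mine, not Xen's.

```python
import re
from collections import defaultdict

GRANT_HEADER = re.compile(r"grant-table for remote domain:\s*(\d+)")
GRANT_ENTRY = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+)"
    r"\s+(\d+)\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+)",
    re.IGNORECASE,
)


def outstanding_grants(dump):
    """Map granting domid -> list of entries still listed in the dump."""
    grants = defaultdict(list)
    owner = None
    for line in dump.splitlines():
        header = GRANT_HEADER.search(line)
        if header:
            owner = int(header.group(1))
            continue
        entry = GRANT_ENTRY.search(line)
        if entry and owner is not None:
            ref, active_dom, mfn, pin = entry.group(1, 2, 3, 4)
            grants[owner].append({
                "ref": int(ref),
                "active_localdom": int(active_dom),  # 0 is Dom0 here
                "mfn": int(mfn, 16),
                "pin": int(pin, 16),
            })
    return dict(grants)


# Lines copied from the 'g' dump quoted in this thread.
sample_grants = """\
(XEN) grant-table for remote domain:   34 (v1)
(XEN) [  8]        0 0x5b8f05 0x00000001          0 0x5b8f05 0x19
(XEN) [770]        0 0x5b90ba 0x00000001          0 0x5b90ba 0x19
(XEN) [802]        0 0x5b90b9 0x00000001          0 0x5b90b9 0x19
(XEN) [803]        0 0x5b90b8 0x00000001          0 0x5b90b8 0x19
"""
for domid, entries in outstanding_grants(sample_grants).items():
    print(domid, [e["ref"] for e in entries])
```

Every entry in the quoted sample has localdom 0 and a non-zero pin
count, so it is Dom0 that appears on the other side of these grants.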

I'm Cc-ing someone who knows more than me about grants... In the
meantime, can you state again exactly what you are using,
such as:
 - what Xen version?
 - what Dom0 kernel version?
 - about DomU kernel version, I see this in the config file:
   vmlinuz-4.8.10-xen, so it's Linux 4.8.10, is that right?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
