
[Xen-devel] oxenstored memory leak? seems related with XSA-38



Hi all,

I was testing VM start on the xen-4.2.2 release with oxenstored, and ran into a 
problem that may be related to XSA-38 
(http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).

When the VM started, oxenstored's memory usage kept increasing, eventually 
reaching 1.5G. The VM hung at the OS loading screen.

Here is the output of top:

top - 20:18:32 up 1 day,  3:09,  5 users,  load average: 0.99, 0.63, 0.32
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  46919428 total, 46699012 used,   220416 free,    36916 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free, 44260932 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
  806 root      20   0  955m 926m 1068 R  99.9  2.0   4:54.14 oxenstored

  
top - 20:19:05 up 1 day,  3:09,  5 users,  load average: 0.99, 0.67, 0.34
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  46919428 total, 46708564 used,   210864 free,    36964 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free, 44168380 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
  806 root      20   0 1048m 1.0g 1068 R 100.2  2.2   5:27.03 oxenstored

  

top - 20:21:35 up 1 day,  3:12,  5 users,  load average: 1.00, 0.80, 0.44
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  46919428 total, 46703052 used,   216376 free,    37208 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free, 43682968 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
  806 root      20   0 1551m 1.5g 1068 R 100.2  3.3   7:56.10 oxenstored

And the oxenstored log showed these entries over and over again:

[20130701T12:27:14.290Z] D8           invalid     device/suspend/event-channel ..
[20130701T12:27:14.290Z] D8.1937077039 invalid   /event-channel ..
[20130701T12:27:14.290Z] D8.1852727656 invalid ..
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug

My VM is a Windows guest with the GPL PV driver installed. This problem is hard 
to reproduce, and after a hard reboot everything looks normal again.

I suspect something is wrong with the xenbus I/O ring, so I investigated the 
code:

1) oxenstored and the xenbus driver in the VM use a shared page to communicate 
with each other:
  struct xenstore_domain_interface {
    char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
    char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
    XENSTORE_RING_IDX req_cons, req_prod;
    XENSTORE_RING_IDX rsp_cons, rsp_prod;
  };

2) xenbus in the VM puts a request in req and increases req_prod, then sends an 
event to oxenstored.
3) oxenstored calculates how much there is to read from req_cons and req_prod, 
and after reading it advances req_cons until it equals req_prod, which means no 
request is pending.
4) oxenstored puts responses in rsp and increases rsp_prod, then sends an event 
to the VM; xenbus in the VM uses similar logic to handle the rsp ring.

 Am I correct?
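
If that reading is correct, the consumer side of the request ring would look 
roughly like the sketch below. This is only my reading of the protocol, not the 
actual oxenstored code (which is OCaml with C stubs); the constant value and 
masking macro mirror what the public headers define, and memory barriers are 
omitted for brevity:

  #include <stdint.h>
  #include <stddef.h>

  #define XENSTORE_RING_SIZE 1024
  #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE - 1))
  typedef uint32_t XENSTORE_RING_IDX;

  struct xenstore_domain_interface {
      char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
      char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
      XENSTORE_RING_IDX req_cons, req_prod;
      XENSTORE_RING_IDX rsp_cons, rsp_prod;
  };

  /* Copy up to len bytes of pending requests out of the ring and publish
   * the new consumer index.  Returns the number of bytes read. */
  static size_t ring_read_requests(struct xenstore_domain_interface *intf,
                                   char *buf, size_t len)
  {
      XENSTORE_RING_IDX cons = intf->req_cons;
      XENSTORE_RING_IDX prod = intf->req_prod;
      size_t copied = 0;

      /* Indices are free-running; they are only masked when the buffer
       * itself is accessed. */
      while (copied < len && cons != prod) {
          buf[copied++] = intf->req[MASK_XENSTORE_IDX(cons)];
          cons++;
      }

      /* Advancing req_cons up to req_prod tells the guest the slots are
       * free and that no request is pending. */
      intf->req_cons = cons;
      return copied;
  }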

So I'm curious about what happens when req_cons is larger than req_prod (which 
can be caused by a buggy PV driver or a malicious guest); it seems oxenstored 
will fall into an endless loop.
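
The hazard, as I understand it, is the unsigned arithmetic on the indices. 
Below is a sketch of the kind of index validation I believe the XSA-38 patches 
are about; it is not the literal patch text, just the idea expressed in C:

  /* With free-running 32-bit indices, the only legal ring states satisfy
   * prod - cons <= XENSTORE_RING_SIZE.  If the guest rewinds req_prod so
   * that cons > prod, the unsigned subtraction wraps to a value near 2^32,
   * and a consumer that trusts it believes it has gigabytes of requests
   * to read. */
  static int ring_pending_requests(const struct xenstore_domain_interface *intf)
  {
      XENSTORE_RING_IDX cons = intf->req_cons;
      XENSTORE_RING_IDX prod = intf->req_prod;

      if (prod - cons > XENSTORE_RING_SIZE)
          return -1; /* corrupted ring: stop reading / drop the connection */

      return (int)(prod - cons);
  }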

Is this what XSA-38 talks about?

I built a PV driver which sets req_prod to 0 after several xenstore operations, 
and tested it on xen-unstable.hg, making sure all XSA-38 patches were applied. 
The problem I first met seems to reproduce: oxenstored eventually takes up a 
lot of memory.
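
For reference, the corruption I injected amounts to something like the snippet 
below. This is a hypothetical illustration, not the actual driver code; the 
function name and the way the pointer to the shared xenstore page is obtained 
are assumptions:

  /* Guest-side: after a few normal xenstore operations, rewind req_prod so
   * that the daemon sees req_cons > req_prod.  'intf' is assumed to point at
   * the guest's mapping of the shared xenstore_domain_interface page. */
  static void rewind_req_prod(struct xenstore_domain_interface *intf)
  {
      /* Any value below the daemon's current req_cons has the same effect;
       * 0 is simply the easiest value to force. */
      intf->req_prod = 0;
  }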

Could anyone help me about this issue?







_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

