Re: [Xen-devel] [Xen-users] "xl restore" leaks a file descriptor?
On Tue, Aug 11, 2015 at 06:21:18PM +0100, Andrew Cooper wrote:
> On 11/08/15 18:07, Wei Liu wrote:
> > On Tue, Aug 11, 2015 at 04:48:13PM +0100, Ian Campbell wrote:
> >> On Tue, 2015-08-11 at 11:13 -0400, Andrew Armenia wrote:
> >>> It's the checkpoint file - i.e. the command line argument to xl
> >>> restore - that is being leaked.
> >> Thanks.
> >>
> >> [...]
> >>> So the checkpoint file is clearly being leaked.
> >> Indeed. I confirmed this even with the current development version using ls
> >> -l /proc/<pid>/fd which shows an fd open on a deleted file:
> >>
> >> # ps aux| grep xl
> >> root 20465 0.0 0.2 106036 984 ? SLsl 15:42 0:00 xl
> >> restore save
> >> # ls -l /proc/20465/fd
> >> [...]
> >> lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save
> >> [...]
> >> # rm /root/save
> >> # ls -l /proc/20465/fd
> >> [...]
> >> lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save (deleted)
> >> [...]
> >>
> >>> Its space is not freed
> >>> until the 'xl restore' process is ended by shutting down the domain:
> >> [...]
> >>> It seems like xl restore should close the checkpoint file as soon as
> >>> it's done restoring the domain, allowing the space to be freed, but
> >>> that's clearly not happening.
> >> Right. In fact xl sets the file to be close-on-exec right after opening it,
> >> which is before the daemonisation step, so it ought to be closed
> >> automatically, but isn't for some reason.
> >>
> >> My working theory is that something in the machinery which spawns the save
> >> helper is defeating the use of CLOEXEC, perhaps by dup2() or perhaps by
> >> unsetting CLOEXEC.
> >>
> >> Any way, thanks for reporting. I've copied the devel list and 4.6 RM. Wei
> >> this probably ought to be a blocker for 4.6 (and the fix ought ultimately
> >> to be backported to 4.4 onwards at least).
> >>
> >> NB: This leak seems to be independent of the switch to migration v2.
> >>
> >> Ian.
> > Maybe this is just because we leak a fd.
> >
> > I don't see how CLOEXEC would be of any use if xl doesn't actually exec
> > anything.
> >
> > Below is a PoC patch which seems to fix the problem for me.
> >
> > ---8<---
> > commit 7b5f466d5977dc9f41991ca0c2227023ac07709d
> > Author: Wei Liu <wei.liu2@xxxxxxxxxx>
> > Date:   Tue Aug 11 18:02:25 2015 +0100
> >
> >     xl: close restore_fd when we finish with it
> >
> >     Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
> >
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 499a05c..525cd24 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -2846,6 +2846,10 @@ start:
> >          ret = libxl_domain_create_new(ctx, &d_config, &domid,
> >                                        0, autoconnect_console_how);
> >      }
> > +
> > +    if (migrate_fd < 0)
> > +        close(restore_fd);
> > +
>
> You surely need check for restore_fd >= 0, to avoid a potential EBADF ?
>

Indeed. When we create a new domain, restore_fd is -1.

Wei.

> ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
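The two points made in this exchange (close-on-exec does nothing unless the process really calls exec, and close() on -1 fails with EBADF) can be illustrated with a small standalone sketch. This is not the xl code and not the final patch: set_cloexec(), fd_is_open() and close_if_open() are hypothetical helper names made up for illustration, and only standard POSIX calls (fcntl, close) are assumed.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: mark fd close-on-exec, as xl is described as doing
 * right after opening the checkpoint file. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: 1 if fd is an open descriptor in this process, else 0. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Hypothetical helper: close *fd only when it holds a valid descriptor,
 * then reset it to -1 so a second call is a harmless no-op.
 * Returns 0 if nothing needed closing or close() succeeded, -1 otherwise. */
int close_if_open(int *fd)
{
    int rc = 0;

    assert(fd != NULL);
    if (*fd >= 0) {
        rc = close(*fd);
        *fd = -1;
    }
    return rc;
}
```

A descriptor marked with set_cloexec() stays open for as long as the process keeps running without calling exec, which matches the observation that the deleted checkpoint file is still held. Guarding on *fd >= 0 covers the create-new-domain case where restore_fd is -1, so close() is never handed an invalid descriptor.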