Re: [Xen-devel] [Xen-users] "xl restore" leaks a file descriptor?
On Wed, 2015-08-12 at 10:49 +0100, Wei Liu wrote:
> On Wed, Aug 12, 2015 at 09:41:13AM +0100, Ian Campbell wrote:
> > On Tue, 2015-08-11 at 18:07 +0100, Wei Liu wrote:
> > > On Tue, Aug 11, 2015 at 04:48:13PM +0100, Ian Campbell wrote:
> > > > On Tue, 2015-08-11 at 11:13 -0400, Andrew Armenia wrote:
> > > > > It's the checkpoint file - i.e. the command line argument to xl
> > > > > restore - that is being leaked.
> > > >
> > > > Thanks.
> > > >
> > > > [...]
> > > > > So the checkpoint file is clearly being leaked.
> > > >
> > > > Indeed. I confirmed this even with the current development version
> > > > using ls -l /proc/<pid>/fd, which shows an fd open on a deleted file:
> > > >
> > > > # ps aux| grep xl
> > > > root 20465 0.0 0.2 106036 984 ? SLsl 15:42 0:00 xl restore save
> > > > # ls -l /proc/20465/fd
> > > > [...]
> > > > lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save
> > > > [...]
> > > > # rm /root/save
> > > > # ls -l /proc/20465/fd
> > > > [...]
> > > > lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save (deleted)
> > > > [...]
> > > >
> > > > > Its space is not freed until the 'xl restore' process is ended
> > > > > by shutting down the domain:
> > > > [...]
> > > > >
> > > > > It seems like xl restore should close the checkpoint file as soon
> > > > > as it's done restoring the domain, allowing the space to be freed,
> > > > > but that's clearly not happening.
> > > >
> > > > Right. In fact xl sets the file to be close-on-exec right after
> > > > opening it, which is before the daemonisation step, so it ought to
> > > > be closed automatically, but isn't for some reason.
> > > >
> > > > My working theory is that something in the machinery which spawns
> > > > the save helper is defeating the use of CLOEXEC, perhaps by dup2()
> > > > or perhaps by unsetting CLOEXEC.
> > > >
> > > > Anyway, thanks for reporting.
> > > > I've copied the devel list and 4.6 RM. Wei, this probably ought to
> > > > be a blocker for 4.6 (and the fix ought ultimately to be backported
> > > > to 4.4 onwards at least).
> > > >
> > > > NB: This leak seems to be independent of the switch to migration v2.
> > > >
> > > > Ian.
> > >
> > > Maybe this is just because we leak a fd.
> > >
> > > I don't see how CLOEXEC would be of any use if xl doesn't actually
> > > exec anything.
> >
> > Duh, for some reason I thought daemonize would activate the CLOEXEC,
> > but it's just fork without exec. Silly me.
> >
> > >
> > > Below is a PoC patch which seems to fix the problem for me.
> > >
> > > ---8<---
> > > commit 7b5f466d5977dc9f41991ca0c2227023ac07709d
> > > Author: Wei Liu <wei.liu2@xxxxxxxxxx>
> > > Date:   Tue Aug 11 18:02:25 2015 +0100
> > >
> > >     xl: close restore_fd when we finish with it
> > >
> > >     Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
> > >
> > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > > index 499a05c..525cd24 100644
> > > --- a/tools/libxl/xl_cmdimpl.c
> > > +++ b/tools/libxl/xl_cmdimpl.c
> > > @@ -2846,6 +2846,10 @@ start:
> > > ret = libxl_domain_create_new(ctx, &d_config, &domid,
> > > 0, autoconnect_console_how);
> > > }
> > > +
> > > + if (migrate_fd < 0)
> > > + close(restore_fd);
> >
> > As Andy says I think we want restore_fd in the check, I can't see any
> > reason we wouldn't want to close the socket too.
> >
>
> Do you mean migrate_fd when you say "socket"?

In the migrate case we do "restore_fd = migrate_fd;", so yes, indirectly.

> I tried that, but that led to failure because toolstack still needs to
> get controlling information out of it (the "GO" message).
>
> Maybe I close this too early.

Right.

> I will have a closer look today.
>
> > For reboot handling you would need to reset the fd to < 0, otherwise
> > when we come back around on reboot we will close this again.
> >
> > Would it be less error-prone to put this in the if (restoring) just
> > above, i.e. exactly where restore_fd is used and which already has the
> > reboot logic in place with restoring = 0?
> >
>
> Depending on whether we can close migrate_fd.
>
> Wei.
>
> > Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel