
Re: [Xen-devel] [Xen-users] "xl restore" leaks a file descriptor?



On 11/08/15 18:07, Wei Liu wrote:
> On Tue, Aug 11, 2015 at 04:48:13PM +0100, Ian Campbell wrote:
>> On Tue, 2015-08-11 at 11:13 -0400, Andrew Armenia wrote:
>>> It's the checkpoint file - i.e. the command line argument to xl
>>> restore - that is being leaked.
>> Thanks.
>>
>> [...]
>>> So the checkpoint file is clearly being leaked.
>> Indeed. I confirmed this even with the current development version using ls
>> -l /proc/<pid>/fd which shows an fd open on a deleted file:
>>
>> # ps aux| grep xl
>> root     20465  0.0  0.2 106036   984 ?        SLsl 15:42   0:00 xl restore save
>> # ls -l /proc/20465/fd
>> [...]
>> lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save
>> [...]
>> # rm /root/save
>> # ls -l /proc/20465/fd
>> [...]
>> lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save (deleted)
>> [...]
>>
>>>  Its space is not freed
>>> until the 'xl restore' process is ended by shutting down the domain:
>> [...]
>>> It seems like xl restore should close the checkpoint file as soon as
>>> it's done restoring the domain, allowing the space to be freed, but
>>> that's clearly not happening.
>> Right. In fact xl sets the file to be close-on-exec right after opening it,
>> which is before the daemonisation step, so it ought to be closed
>> automatically, but isn't for some reason.
>>
>> My working theory is that something in the machinery which spawns the save
>> helper is defeating the use of CLOEXEC, perhaps by dup2() or perhaps by
>> unsetting CLOEXEC.
>>
>> Anyway, thanks for reporting. I've copied the devel list and 4.6 RM. Wei,
>> this probably ought to be a blocker for 4.6 (and the fix ought ultimately
>> to be backported to 4.4 onwards at least).
>>
>> NB: This leak seems to be independent of the switch to migration v2.
>>
>> Ian.
> Maybe this is just because we leak a fd.
>
> I don't see how CLOEXEC would be of any use if xl doesn't actually exec
> anything.
>
> Below is a PoC patch which seems to fix the problem for me.
>
> ---8<---
> commit 7b5f466d5977dc9f41991ca0c2227023ac07709d
> Author: Wei Liu <wei.liu2@xxxxxxxxxx>
> Date:   Tue Aug 11 18:02:25 2015 +0100
>
>     xl: close restore_fd when we finish with it
>     
>     Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
>
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 499a05c..525cd24 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -2846,6 +2846,10 @@ start:
>          ret = libxl_domain_create_new(ctx, &d_config, &domid,
>                                        0, autoconnect_console_how);
>      }
> +
> +    if (migrate_fd < 0)
> +        close(restore_fd);
> +

Surely you need to check for restore_fd >= 0 as well, to avoid a potential
EBADF? If no restore file was ever opened, close() on a negative fd fails.
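
To illustrate the kind of guard I mean, here is a minimal, untested sketch.
close_fd_once() is a hypothetical helper, not something that exists in xl or
libxl, and it assumes restore_fd uses a negative value to mean "not open":

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of xl): close *fd only if it holds a
 * non-negative descriptor, then reset it to -1 so that a second call
 * is a harmless no-op instead of an EBADF (or, worse, a close of some
 * unrelated descriptor that has since reused the same number).
 *
 * Returns 0 if there was nothing to close or close() succeeded,
 * -1 with errno set if close() failed.
 */
static int close_fd_once(int *fd)
{
    int rc = 0;

    if (*fd >= 0) {
        rc = close(*fd);
        *fd = -1;   /* never retry close(), even on failure */
    }

    return rc;
}
```

At the hunk above, the call would then be close_fd_once(&restore_fd) under
the existing migrate_fd < 0 condition, which also protects against any later
path that tries to close the same fd again.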

~Andrew
