[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen-blkback: fix memory leaks



On 28/01/14 16:37, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 28, 2014 at 01:44:37PM +0100, Roger Pau Monné wrote:
>> On 27/01/14 22:21, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote:
>>>> @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req 
>>>> *pending_req, int error)
>>>>     * the proper response on the ring.
>>>>     */
>>>>    if (atomic_dec_and_test(&pending_req->pendcnt)) {
>>>> -          xen_blkbk_unmap(pending_req->blkif,
>>>> +          struct xen_blkif *blkif = pending_req->blkif;
>>>> +
>>>> +          xen_blkbk_unmap(blkif,
>>>>                            pending_req->segments,
>>>>                            pending_req->nr_pages);
>>>> -          make_response(pending_req->blkif, pending_req->id,
>>>> +          make_response(blkif, pending_req->id,
>>>>                          pending_req->operation, pending_req->status);
>>>> -          xen_blkif_put(pending_req->blkif);
>>>> -          if (atomic_read(&pending_req->blkif->refcnt) <= 2) {
>>>> -                  if (atomic_read(&pending_req->blkif->drain))
>>>> -                          complete(&pending_req->blkif->drain_complete);
>>>> +          free_req(blkif, pending_req);
>>>> +          xen_blkif_put(blkif);
>>>> +          if (atomic_read(&blkif->refcnt) <= 2) {
>>>> +                  if (atomic_read(&blkif->drain))
>>>> +                          complete(&blkif->drain_complete);
>>>>            }
>>>> -          free_req(pending_req->blkif, pending_req);
>>>
>>> I keep coming back to this and I am not sure what to think - especially
>>> in the context of WRITE_BARRIER and disconnecting the vbd.
>>>
>>> You moved the 'free_req' to be done before you do atomic_read/dec.
>>>
>>> Which means that we do:
>>>
>>>     list_add(&req->free_list, &blkif->pending_free);
>>>     wake_up(&blkif->pending_free_wq);
>>>
>>>     atomic_dec
>>>     if atomic_read <= 2 poke thread that is waiting for drain.
>>>
>>>
>>> while in the past we did:
>>>
>>>     atomic_dec
>>>     if atomic_read <= 2 poke thread that is waiting for drain.
>>>
>>>     list_add(&req->free_list, &blkif->pending_free);
>>>     wake_up(&blkif->pending_free_wq);
>>>
>>> which means that we are giving the 'req' _before_ we decrement
>>> the refcnts.
>>>
>>> Could that mean that __do_block_io_op takes it for a spin - oh
>>> wait it won't as it is sitting on a WRITE_BARRIER and waiting:
>>>
>>> 1226         if (drain)                                                     
>>>          
>>> 1227                 xen_blk_drain_io(pending_req->blkif);  
>>>
>>> But still that feels 'wrong'?
>>
>> Mmmm, the wake_up call in free_req in the context of WRITE_BARRIER is
>> harmless since the thread is waiting on drain_complete as you say, but I
>> take your point that it's all confusing. Do you think it will feel
>> better if we gate the call to wake_up in free_req with this condition:
>>
>> if (was_empty && !atomic_read(&blkif->drain))
>>
>> Or is this just going to make it even messier?
> 
> My head spins around when thinking about the refcnt, drain, the two or
> three workqueues. 
> 
>>
>> Maybe just adding a comment in free_req saying that the wake_up call is
>> going to be ignored in the context of a WRITE_BARRIER, since the thread
>> is already waiting on drain_complete is enough.
> 
> Perhaps. You do pass in the 'force' bool flag and we could piggyback
> on that. Meaning you could do

In the new version I'm preparing I'm no longer calling drain_io on
xen_blkif_schedule (so there's no "force" flag), instead I've moved the
cleanup code to xen_blkif_free where I think it makes more sense.

Also the force flag was just a local variable to drain_io, I think it
would get even messier if we add yet another variable (force) to the
xen_blkif struct.

> 
> /* a comment about what we just mentioned */
> 
> if (!force) {
>       // do it the old way
> } else {
> 
>       /* A comment mentioning _why_ we need the code reshuffled */
> 
>       // do it the new way
> }
> 
> It would be a bit messy - but:
>  - We won't have to worry about breaking WRITE_BARRIER as the old
>    logic would be preserved. So less worry about regressions.
>  - The bug-fix would be easy to backport as it would inject code for
>    just the usage you want - that is to drain all I/Os.
>  - It would make a nice distinction and allows us to refactor
>    this in future patches.
> The cons are that:
>  - It would add extra path for just the use-case of shutting down
>    without using the existing one.
>  - It would be messy
> 
> 
> But I think when it comes to fixes like these that are
> candidates for backports - messy is OK and if they don't have any
> posibility of introducing regressions on existing other behaviors -
> then we should stick with that.
> 
> 
> Then in the future we can refactor this to use less of these
> workqueues, refcnt and atomics we have. It is getting confusing.
> 
> Thoughts?

Let me post the whole series as I have them now, and we can pick it up
again from that.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.