[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie saved domains



Could you give the attached a try on your test case?  If
it passes and Jan thinks it is OK (as I backed out most of
his patch at cs 20918), then:

Tmem: fix domain refcount leak by returning to simpler model
which claims a ref once when the tmem client is first associated
with the domain, and puts it once when the tmem client is
destroyed.

Signed-off-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>

> If you have a handle on a domain already I wonder why you need to
> continually look up by domid...

Nearly all tmem uses are via current->domain.  The remaining
(such as from the tools) are via a specified domid.  I don't
keep a domid->domain lookup table around as the frequency is
very low and the existing mechanism works fine (or it does
if I use it correctly anyway ;-)

> RCU locking
> is fully implemented already. It's highly unlikely to change in future
> and we can work out something else for your case if that happens.

I guess I was confused by the fact that the rcu_lock/unlock macros
are no-ops.

In any case, I think I understand the semantics well enough now
after your second reply pointing me to rcu_unlock_domain, so
I think the attached patch should avoid special cases in the
future.

> > I'd like to do a get_domain_by_id() without doing a get_domain()
> > as the tmem code need only get_domain() once on first use
> > and put_domain() once when destroying, but frequently needs
> > to lookup a domain by id.
> 
> If you have a handle on a domain already I wonder why you need to
> continually look up by domid...
> 
> > It looks like rcu_lock_domain_by_id() does what I need, but
> > I don't need any rcu critical sections (outside of the domain
> > lookup itself) and am fearful that if rcu locking ever is fully
> > implemented, my use of rcu_lock_domain_by_id() would become
> > incorrect and I may have a problem.  Should I create an equivalent
> > get_domain_by_id_no_ref()?  Or am I misunderstanding something?
> 
> If you really know what you're doing, I suggest just have your own
> tmh_lookup_domain() macro mapping onto rcu_lock_domain_by_id(). RCU
> locking
> is fully implemented already. It's highly unlikely to change in future
> and
> we can work out something else for your case if that happens.
> 
> I'm not keen on providing an explicitly synchronisation-free version in
> common code. It just encourages people not to think about
> synchronisation at
> all.
> 
>  -- Keir
> 
> > Semi-related, rcu_lock_domain_by_id() has a "return d" inside
> > the for loop without an rcu_read_unlock().  I see that this
> > is irrelevant now because rcu_read_unlock() is a no-op anyway,
> > but maybe this should be cleaned up for the same reason --
> > in case rcu locking is ever fully implemented.
> >
> > Thanks,
> > Dan
> >
> >> -----Original Message-----
> >> From: Dan Magenheimer
> >> Sent: Thursday, June 10, 2010 7:08 AM
> >> To: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx
> >> Subject: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie
> saved
> >> domains
> >>
> >> OK, will take a look.
> >>
> >> IIRC, Jan's work to fix the domain reference stuff just
> >> before 4.0 shipped was a heavy hammer but since it seemed
> >> to work, I didn't want to mess with it so close to release...
> >> really there's only a need to take a reference once on
> >> first use and release it at shutdown, rather than
> >> take/release frequently.  IIRC, I had used a macro that
> >> took references when they weren't really needed and
> >> Jan placed the matching macros that did the release.
> >>
> >> Dan
> >>
> >>> -----Original Message-----
> >>> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> >>> Sent: Thursday, June 10, 2010 3:47 AM
> >>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> >>> Cc: Dan Magenheimer
> >>> Subject: Bug in tmem: refcount leak leaves zombie saved domains
> >>>
> >>> Dan,
> >>>
> >>> Just doing some save/restore testing on xen-unstable tip, I noticed
> >>> that:
> >>> # xm create ./pv_config
> >>> # xm save PV1
> >>>
> >>> Would leave the saved guest as a zombie in the DOMDYING_dead state
> >> with
> >>> no
> >>> pages, yet with refcnt=1. This happens absolutely consistently.
> Just
> >> as
> >>> consistently, it does not happen when I boot Xen with no-tmem. My
> >>> conclusion
> >>> is that tmem is leaking a domain reference count during domain
> save.
> >>> This
> >>> doesn't happen if I merely "xm create ...; xm destroy ...".
> >>>
> >>> My pv_config file contains nothing exciting:
> >>> kernel = "/nfs/keir/xen/xen64.hg/dist/install/boot/vmlinuz-
> 2.6.18.8-
> >>> xenU"
> >>> memory = 750
> >>> name = "PV1"
> >>> vcpus = 2
> >>> vif = [ 'mac=00:1a:00:00:01:01' ]
> >>> disk = [ 'phy:/dev/VG/Suse10.1_64_1,sda1,w' ]
> >>> root = "/dev/sda1 ro xencons=tty"
> >>> extra = ""
> >>> tsc_native = 1
> >>> on_poweroff = 'destroy'
> >>> on_reboot   = 'restart'
> >>> on_crash    = 'preserve'
> >>>
> >>> The dom{0,U} kernels are tip of linux-2.6.18-xen, default -xen{0,U}
> >>> configs.
> >>>
> >>>  -- Keir
> >>>
> >>>
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> http://lists.xensource.com/xen-devel
> 
> 

Attachment: tmem-fix-refcount-leak.patch
Description: Binary data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.