
Re: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie saved domains



On 10/06/2010 21:06, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:

> This patch looks good, I'll test it tomorrow.

It tests okay, so I applied it as xen-unstable:21595.

> What you *do* need to do when setting up a new tmem client is check that the
> associated domain is not dying. Without that check the code is in fact
> currently buggy: you can end up with a zombie domain that is a client of
> tmem and will never stop being a client because it became a client after
> tmem_destroy() was called on it.

I implemented this as xen-unstable:21596. Take a look. It's pretty
straightforward.
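
In outline, the new check does something like this where a tmem client is
about to be created (a sketch from memory rather than the exact changeset;
the surrounding function and error handling are elided):

    struct domain *d = rcu_lock_domain_by_id(cli_id);

    if ( d == NULL )
        return NULL;
    if ( d->is_dying )
    {
        /* tmem_destroy() has already run (or is running) for this
         * domain; creating a client now would pin it forever. */
        rcu_unlock_domain(d);
        return NULL;
    }
    /* ... safe to associate the new tmem client with d ... */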

I think both of these should be backported for Xen 4.0.1?

 -- Keir

> Does that make sense?
> 
>  -- Keir
> 
> On 10/06/2010 18:54, "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx> wrote:
> 
>> Could you give the attached a try on your test case?  If
>> it passes and Jan thinks it is OK (as I backed out most of
>> his patch at cs 20918), then:
>> 
>> Tmem: fix domain refcount leak by returning to a simpler model
>> which claims a ref once when the tmem client is first associated
>> with the domain, and puts it once when the tmem client is
>> destroyed.
>> 
>> Signed-off-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
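>> 
>> In code terms the model is just this (a sketch; the actual names in
>> the patch differ):
>> 
>>     /* first association with the domain: claim the single reference */
>>     client->domain = get_domain_by_id(cli_id);  /* takes a ref; NULL on failure */
>> 
>>     /* tmem client destruction: put that same reference */
>>     put_domain(client->domain);
>> 
>> Everything in between uses the domain pointer without further
>> get/put pairs.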
>> 
>>> If you have a handle on a domain already I wonder why you need to
>>> continually look up by domid...
>> 
>> Nearly all tmem uses are via current->domain.  The remainder
>> (such as calls from the tools) are via a specified domid.  I don't
>> keep a domid->domain lookup table around, as the frequency is
>> very low and the existing mechanism works fine (or it does
>> if I use it correctly, anyway ;-)
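>> 
>> The split looks roughly like this (a sketch; the sentinel name is
>> from memory):
>> 
>>     struct domain *d;
>>     if ( cli_id == CLI_ID_NULL )        /* op issued by the guest itself */
>>         d = current->domain;            /* the common, fast case */
>>     else
>>         d = tmh_lookup_domain(cli_id);  /* rare: tools specify a domid */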
>> 
>>> RCU locking is fully implemented already. It's highly unlikely to
>>> change in future and we can work out something else for your case
>>> if that happens.
>> 
>> I guess I was confused by the fact that the rcu_lock/unlock macros
>> are no-ops.
>> 
>> In any case, I think I understand the semantics well enough now
>> after your second reply pointing me to rcu_unlock_domain, so
>> the attached patch should avoid special cases in the future.
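>> 
>> That is, callers just pair the two (sketch):
>> 
>>     struct domain *d = rcu_lock_domain_by_id(domid);
>>     if ( d == NULL )
>>         return -ESRCH;
>>     /* ... use d; the RCU read section stays open until ... */
>>     rcu_unlock_domain(d);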
>> 
>>>> I'd like to do a get_domain_by_id() without doing a get_domain(),
>>>> as the tmem code needs only get_domain() once on first use
>>>> and put_domain() once when destroying, but frequently needs
>>>> to look up a domain by id.
>>> 
>>> If you have a handle on a domain already I wonder why you need to
>>> continually look up by domid...
>>> 
>>>> It looks like rcu_lock_domain_by_id() does what I need, but
>>>> I don't need any RCU critical sections (outside of the domain
>>>> lookup itself), and am fearful that if RCU locking is ever fully
>>>> implemented, my use of rcu_lock_domain_by_id() would become
>>>> incorrect and I may have a problem.  Should I create an equivalent
>>>> get_domain_by_id_no_ref()?  Or am I misunderstanding something?
>>> 
>>> If you really know what you're doing, I suggest you just have your
>>> own tmh_lookup_domain() macro mapping onto rcu_lock_domain_by_id().
>>> RCU locking is fully implemented already. It's highly unlikely to
>>> change in future and we can work out something else for your case
>>> if that happens.
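>>> 
>>> Something like this, illustratively (the release wrapper and its
>>> name are your choice):
>>> 
>>>     #define tmh_lookup_domain(id)   rcu_lock_domain_by_id(id)
>>>     #define tmh_release_domain(d)   rcu_unlock_domain(d)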
>>> 
>>> I'm not keen on providing an explicitly synchronisation-free version
>>> in common code. It just encourages people not to think about
>>> synchronisation at all.
>>> 
>>>  -- Keir
>>> 
>>>> Semi-related, rcu_lock_domain_by_id() has a "return d" inside
>>>> the for loop without an rcu_read_unlock().  I see that this
>>>> is irrelevant now because rcu_read_unlock() is a no-op anyway,
>>>> but maybe this should be cleaned up for the same reason --
>>>> in case rcu locking is ever fully implemented.
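>>>> 
>>>> For reference, the structure is roughly this (abridged from memory
>>>> of common/domain.c):
>>>> 
>>>>     struct domain *rcu_lock_domain_by_id(domid_t dom)
>>>>     {
>>>>         struct domain *d;
>>>> 
>>>>         rcu_read_lock(&domlist_read_lock);
>>>>         for ( d = rcu_dereference(domain_hash[DOMAIN_HASH(dom)]);
>>>>               d != NULL;
>>>>               d = rcu_dereference(d->next_in_hashbucket) )
>>>>             if ( d->domain_id == dom )
>>>>                 return d;            /* <-- returns without unlocking */
>>>>         rcu_read_unlock(&domlist_read_lock);
>>>> 
>>>>         return NULL;
>>>>     }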
>>>> 
>>>> Thanks,
>>>> Dan
>>>> 
>>>>> -----Original Message-----
>>>>> From: Dan Magenheimer
>>>>> Sent: Thursday, June 10, 2010 7:08 AM
>>>>> To: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>> Subject: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie
>>>>> saved domains
>>>>> 
>>>>> OK, will take a look.
>>>>> 
>>>>> IIRC, Jan's work to fix the domain reference stuff just
>>>>> before 4.0 shipped was a heavy hammer, but since it seemed
>>>>> to work I didn't want to mess with it so close to release.
>>>>> Really there's only a need to take a reference once on
>>>>> first use and release it at shutdown, rather than
>>>>> taking/releasing frequently.  IIRC, I had used a macro that
>>>>> took references when they weren't really needed, and
>>>>> Jan added the matching macros that did the release.
>>>>> 
>>>>> Dan
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>>>>>> Sent: Thursday, June 10, 2010 3:47 AM
>>>>>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>>> Cc: Dan Magenheimer
>>>>>> Subject: Bug in tmem: refcount leak leaves zombie saved domains
>>>>>> 
>>>>>> Dan,
>>>>>> 
>>>>>> Just doing some save/restore testing on xen-unstable tip, I noticed
>>>>>> that:
>>>>>> # xm create ./pv_config
>>>>>> # xm save PV1
>>>>>> 
>>>>>> Would leave the saved guest as a zombie in the DOMDYING_dead state
>>>>>> with no pages, yet with refcnt=1.  This happens absolutely
>>>>>> consistently.  Just as consistently, it does not happen when I boot
>>>>>> Xen with no-tmem.  My conclusion is that tmem is leaking a domain
>>>>>> reference count during domain save.  This doesn't happen if I merely
>>>>>> "xm create ...; xm destroy ...".
>>>>>> 
>>>>>> My pv_config file contains nothing exciting:
>>>>>> kernel = "/nfs/keir/xen/xen64.hg/dist/install/boot/vmlinuz-2.6.18.8-xenU"
>>>>>> memory = 750
>>>>>> name = "PV1"
>>>>>> vcpus = 2
>>>>>> vif = [ 'mac=00:1a:00:00:01:01' ]
>>>>>> disk = [ 'phy:/dev/VG/Suse10.1_64_1,sda1,w' ]
>>>>>> root = "/dev/sda1 ro xencons=tty"
>>>>>> extra = ""
>>>>>> tsc_native = 1
>>>>>> on_poweroff = 'destroy'
>>>>>> on_reboot   = 'restart'
>>>>>> on_crash    = 'preserve'
>>>>>> 
>>>>>> The dom{0,U} kernels are tip of linux-2.6.18-xen, default -xen{0,U}
>>>>>> configs.
>>>>>> 
>>>>>>  -- Keir
>>>>>> 
>>>>>> 
>>>>> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel