classification
Title: Unicode encoders appears to leak references
Type: Stage:
Components: Unicode Versions: Python 2.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: lemburg Nosy List: lemburg, mhammond, nnorwitz, nobody
Priority: normal Keywords:

Created on 2002-04-28 09:17 by mhammond, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name Uploaded Description
encodeleak.py mhammond, 2002-04-28 09:17 Program demonstrating leak.
codecs.c.patch mhammond, 2002-04-28 10:39 Better patch
Messages (12)
msg10603 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2002-04-28 09:17
Note the following Debug Python session:

>>> s=u"anything"
[8189 refs]
>>> v=s.encode("utf8")
[10967 refs]
>>> v=s.encode("utf8")
[10968 refs]
>>> v=s.encode("utf8")
[10969 refs]
>>> v=s.encode("utf8")
[10970 refs]

Each call to encode leaks a reference.  I'm attaching a
test program that demonstrates this in more detail.
The output from my test program is:

After 10000 iterations, lost 12850 references
[15227 refs]

and for 100000:
After 100000 iterations, lost 102850 references
[105227 refs]

etc.

As far as I can tell, this appears in all Python 2.x
versions.
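
A minimal sketch of how such a measurement could be made. This is not the attached encodeleak.py; the helper name count_leaked_refs is hypothetical, and it assumes a debug build of CPython, since sys.gettotalrefcount() only exists there:

```python
import sys


def count_leaked_refs(func, iterations=10000):
    """Return the growth in the interpreter's total reference count
    after calling func() repeatedly, or None on a non-debug build."""
    # Assumption: sys.gettotalrefcount() is only present on debug builds.
    get_total = getattr(sys, "gettotalrefcount", None)
    if get_total is None:
        return None
    func()  # warm-up call so one-time imports and caches are not counted
    before = get_total()
    for _ in range(iterations):
        func()
    return get_total() - before


if __name__ == "__main__":
    s = u"anything"
    lost = count_leaked_refs(lambda: s.encode("utf8"))
    if lost is None:
        print("sys.gettotalrefcount() is unavailable; a debug build is needed")
    else:
        print("After 10000 iterations, lost %d references" % lost)
```

A result that grows roughly in step with the iteration count would point to a per-call leak; a small constant offset is more likely measurement noise.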
msg10604 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2002-04-28 09:26
Logged In: YES 
user_id=14198

s/decode/encode/ :)  I also meant to mention that the problem
is not restricted to UTF8 - changing the encoding in the test file
to anything other than 'ascii' seems to leak in the same way.
msg10605 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2002-04-28 10:05
Found it :)  Attaching patch.
msg10606 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2002-04-28 10:39
Oops - too quick.  All calls to _PyCodec_Lookup() leak.
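
A rough way to probe the lookup path on its own, as a sketch only. codecs.lookup() is the Python-level call that goes through _PyCodec_Lookup() in CPython; the helper name lookup_refcount_growth is hypothetical. It assumes repeated lookups return the same cached object, and the thread does not say which object's reference was leaked, so this checks only one possibility:

```python
import codecs
import sys


def lookup_refcount_growth(encoding="utf8", iterations=1000):
    """Return how much the reference count of the cached lookup result
    grows after repeated codecs.lookup() calls."""
    entry = codecs.lookup(encoding)  # hold one reference of our own
    before = sys.getrefcount(entry)
    for _ in range(iterations):
        codecs.lookup(encoding)  # result is discarded immediately
    return sys.getrefcount(entry) - before


if __name__ == "__main__":
    print("refcount growth:", lookup_refcount_growth())
```

If each lookup left an extra reference on the cached entry, the growth would be close to the iteration count.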
msg10607 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2002-06-04 01:35
Logged In: YES 
user_id=33168

Patch makes sense to me.
If you add a test, I may be able to catch the problem w/purify
next time I run it (if purify works).
msg10608 - (view) Author: Nobody/Anonymous (nobody) Date: 2002-06-04 04:39
Logged In: NO 

I'm not sure what sort of test you are suggesting I add.  I
think the patch is pretty obvious and reasonable, so MAL
should just check it in or assign it back to me <wink>.
The earlier the better, really.
msg10609 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2002-06-04 04:42
damn sourceforge - it went to the trouble of asking for my email
address when I submitted without being logged in, but it
doesn't seem to have done anything with it - so that was me,
just in case you weren't sure :)
msg10610 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2002-06-04 07:20
Logged In: YES 
user_id=38388

I'll have a look later today.
msg10611 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2002-06-04 17:25
Basically the code in the report would be fine.
Purify *should* catch anything which causes the leak.
So:
    s = u'anything'
    assert(s.encode('utf-8') == s.encode('utf-8'))

should work.  Perhaps there is already a test for this,
and purify simply didn't report the leak?
msg10612 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2002-07-17 23:07
A tickle for Marc, assuming his days aren't quite *that*
long <wink>.  Just give the OK and I will check it in.
msg10613 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2002-07-18 13:34
Perfect. I've marked it as Python 2.2.1 candidate. Please
also mention this in the checkin message.

Thanks. (And sorry for not getting back earlier -- my days
are indeed *very* long ;-)
msg10614 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2002-07-18 23:07
Checking in codecs.c;
/cvsroot/python/python/dist/src/Python/codecs.c,v  <--  codecs.c
new revision: 2.14; previous revision: 2.13
done
History
Date User Action Args
2022-04-10 16:05:16 admin set github: 36513
2002-04-28 09:17:59 mhammond create