classification
Title: pickle interns strings
Type: Stage:
Components: Documentation Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: fdrake Nosy List: fdrake, loewis, nascheme, phr, tim.peters, wc2so1
Priority: normal Keywords:

Created on 2002-01-11 21:21 by wc2so1, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
string_unquote2.diff nascheme, 2002-03-22 20:27
unquote_doc.diff nascheme, 2002-03-23 21:21
Messages (9)
msg8712 - (view) Author: Brian Kelley (wc2so1) Date: 2002-01-11 21:21
Pickle (and cPickle) use eval to reconstruct string
variables from the stored format.  Eval is used because
it correctly reconstructs the repr of a string back
into the original string object by translating all the
appropriately escaped characters like "\\m" and "\n".
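
A minimal sketch of the round trip described above, written for a current Python 3 interpreter rather than the 2002 pickle source. The helper name roundtrip and the sample strings are illustrative assumptions, not taken from pickle itself.

```python
# Strings containing escapes that must survive a trip through their repr.
samples = ["plain", "tab\there", "new\nline", "back\\m", "it's"]

def roundtrip(s):
    """Store a string as its repr, then rebuild it with eval()."""
    stored = repr(s)      # e.g. 'new\nline' becomes the text "'new\\nline'"
    # eval() is tolerable here only because `stored` was produced by repr()
    # in this same process; it must not be used on untrusted input.
    return eval(stored)

restored = [roundtrip(s) for s in samples]
print(restored == samples)
```

Every sample compares equal after the round trip, which is the property the message attributes to eval.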

There is a side effect in that eval interns string
variables for faster lookup.

This causes the following sample code to unexpectedly
grow in memory consumption:

import pickle
import random
import string

def genstring(length=100):
    s = [random.choice(string.letters) for x in range(length)]
    return "".join(s)

def test():
    while 1:
        s = genstring()
        dump = pickle.dumps(s)
        s2 = pickle.loads(dump)
        assert s == s2

test()

Note that not all strings are interned, only those that,
as Tim Peters once said, "look like" variable names.
The above example is contrived to generate a lot of
different strings that "look like" variable names, but
since this has happened in practice it should probably be
documented.

Interestingly, inserting
 s.append(" ")
before
 return "".join(s)
makes the memory growth disappear, because the names no
longer "look like" variable names.
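
The "looks like a variable name" test mentioned above can be approximated in a few lines. This is only a sketch under the assumption that the check is "every character is an ASCII letter, digit, or underscore"; the names NAME_CHARS and looks_like_name are hypothetical, and the real check lives in the interpreter's C source. It uses string.ascii_letters so that it runs on Python 3.

```python
import random
import string

# Assumed character set for strings that "look like" variable names.
NAME_CHARS = string.ascii_letters + string.digits + "_"

def looks_like_name(s):
    """Return True if every character could appear in an identifier."""
    return all(ch in NAME_CHARS for ch in s)

def genstring(length=100, pad=""):
    """Random letters, optionally followed by a padding string."""
    letters = [random.choice(string.ascii_letters) for _ in range(length)]
    return "".join(letters) + pad

print(looks_like_name(genstring()))          # letters only
print(looks_like_name(genstring(pad=" ")))   # trailing space breaks the pattern
```

Under this assumption the first string qualifies for interning and the second does not, which matches the observation that appending a space stops the growth.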
msg8713 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-01-11 21:36
Noting that Security Geeks are uncomfortable with using eval()
for this purpose regardless.  Would be good if Python 
got refactored so that pickle and cPickle and the front end 
all called a new routine that simply parsed the escape 
sequences in a character buffer, returning a Python string 
object.

Don't ask me about Unicode <wink>.
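
A rough sketch of the kind of routine proposed above: it walks the characters of a quoted literal and decodes escape sequences itself, so no code is ever evaluated. The function name unquote, the table _SIMPLE_ESCAPES, and the set of supported escapes are assumptions made for illustration; this is not the routine from the attached patches.

```python
# Hypothetical table of single-character escapes handled by this sketch.
_SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r",
                   "\\": "\\", "'": "'", '"': '"'}
_HEX_DIGITS = "0123456789abcdefABCDEF"

def unquote(literal):
    """Decode a quoted string literal without calling eval()."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("not a quoted string literal")
    quote, body = literal[0], literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            if ch == quote:
                raise ValueError("unescaped quote inside literal")
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("trailing backslash")
        nxt = body[i + 1]
        if nxt == "x":
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in _HEX_DIGITS for d in digits):
                raise ValueError("malformed \\x escape")
            out.append(chr(int(digits, 16)))
            i += 4
        elif nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            # Unknown escapes such as "\m" are kept verbatim.
            out.append("\\" + nxt)
            i += 2
    return "".join(out)

print(unquote(repr("a\nb")) == "a\nb")
```

Input that is not a plain quoted literal, for example "__import__('os')", raises ValueError here instead of being executed, which is the property that matters for the security concern.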
msg8714 - (view) Author: paul rubin (phr) Date: 2002-02-16 01:26
I agree about eval being dangerous.  Also, the memory leak
is itself a security concern: if an attacker can stuff
enough strings into the unpickler to exhaust memory, that's
a denial of service attack.
msg8715 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2002-03-22 18:35
Attached is a patch that implements _PyString_Unquote
and strop.unquote.  strop is probably the wrong place since
it's deprecated but I'm not sure if unquote as a string method
is right either.  cPickle.c is changed to use
_PyString_Unquote instead of calling eval.  pickle.py
still needs to be fixed, documentation added, tests fixed.
Before all that, does this patch look sane?
msg8716 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-03-22 19:12
I haven't tried the patch, but yes, it looks sane to me.  A 
deprecated module is definitely a poor place to add a new 
feature <wink>; it makes as much sense as a string method 
as, say, .upper(), right?  That is, why not?  String in, 
string out.
msg8717 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2002-03-22 20:27
Okay, I moved unquote to a method of str.  I also fixed
pickle.py and the tests (no need to test for insecure
strings).

Fred, do you have time to write documentation for 
_PyString_Unquote and str.unquote?
msg8718 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2002-03-23 21:21
Attached is a first stab at documentation.
msg8719 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2002-03-24 02:03
See patch 505705 for a slightly different solution.
msg8720 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-08-14 08:00
This is fixed with patch 505705.
History
Date User Action Args
2022-04-10 16:04:52  admin  set  github: 35908
2002-01-11 21:21:50  wc2so1  create