This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Remove eval in pickle and cPickle
Type: Stage:
Components: Interpreter Core Versions:
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: loewis Nosy List: gvanrossum, loewis
Priority: normal Keywords: patch

Created on 2002-01-19 09:21 by loewis, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
pickle.diff loewis, 2002-08-11 20:47
pickle.diff loewis, 2002-08-12 23:40 version 3
Messages (10)
msg38752 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-01-19 09:21
This patch removes the use of eval in pickle and cPickle.

It does so by:
- moving the actual parsing from compile.c:parsestr to
PyString_DecodeEscape
- introducing a new codec string-escape
- removing the code that checks that a
string-to-unpickle is properly escaped throughout, and
replaces this with a check whether it is properly quoted,
- unquoting the string in load_string, then passing it
to the codec.

This fixes #502503.
msg38753 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-01-19 09:25
Logged In: YES 
user_id=21627

BTW, this patch has #500002 as a prerequisite.
msg38754 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-08-09 04:12
Logged In: YES 
user_id=6380

I'll review this.
msg38755 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-08-09 14:48
Logged In: YES 
user_id=6380

I like this idea. But the patch is out of date. Can you
rework the patch? How much faster does this make the test
program from

1026940226.16076.python-list%40python.org">http://groups.google.de/groups?hl=en&lr=&ie=UTF-8&selm=mailman.1026940226.16076.python-list%40python.org

???
msg38756 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-08-11 14:09
Logged In: YES 
user_id=6380

This would fix bug #593656 too.
msg38757 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-08-11 20:47
Logged In: YES 
user_id=21627

Updated to current CVS.
msg38758 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-08-12 01:51
Logged In: YES 
user_id=6380

Closer.

- Why bother stripping triple quotes in the pickle/cPickle
load code? These will never happen as a result of a pickle
dump AFAIK, and the code you are replacing doesn't accept
these either AFAICT.

- There's something missing (the previous version of the
patch had it I believe) that's needed to register the codec;
as a consequence, pickle.loads() doesn't work.

- escape_encode() uses repr() of a string to do the work.
But that means the outcome for embedding string quotes is
confusing, because of the "smarts" in repr() that use " for
surrounding quotes when there's a ' in the string, and vice
versa. Thus, a single quote or a double quote is returned
unquoted; but if they both occur in the same string, the
single quote is quoted. I don't think that's particularly
useful. Maybe there should be an underlying primitive
operation that gives you a choice and which is invoked both
by escape_encode() and string repr()?

- I don't understand the recode_encoding stuff, but it looks
like something like that was present before too. :-)
msg38759 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-08-12 23:40
Logged In: YES 
user_id=21627

New version:
- removed tripple quote support in pickle/cPickle

- added Lib/encodings/string_escape.py again

- added PyString_Repr, which takes a smartquotes argument

- recode_encoding is for PEP 263: the parser generates UTF-8
in the abstract syntax, which needs to be re-encoded with
the original encoding. Unfortunately, \-escaping and UTF-8
may interleave, hence the convoluted code.

On the Sam Penrose article: Without patch:
dumping list of 1000 dicts:
dumped: 0.192386031151
loading list of 1000 dicts:
loaded: 2.46496498585
dumping list of 10000 dicts:
dumped: 1.92456102371
loading list of 10000 dicts:
loaded: 24.6884089708

with patch:
dumping list of 1000 dicts:
dumped: 0.201091051102
loading list of 1000 dicts:
loaded: 0.469774007797
dumping list of 10000 dicts:
dumped: 1.94221496582
loading list of 10000 dicts:
loaded: 4.8661159277

So loading speed is up by a factor of 5, for this benchmark.
msg38760 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-08-13 20:43
Logged In: YES 
user_id=6380

This looks good. I measured a similar speedup. Go for it!
msg38761 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-08-14 08:14
Logged In: YES 
user_id=21627

Applied as

stringobject.h 2.36;
pickle.py 1.69;
string_escape.py 1.1;
pickletester.py 1.18;
_codecsmodule.c 2.14;
cPickle.c 2.95;
stringobject.c 2.177;
compile.c 2.256;

It turns out that #593656 isn't fixed: nobody checks the
trailing \, so it will be unpickled as '\\\x00'
(consistently in pickle and cPickle, though).
History
Date User Action Args
2022-04-10 16:04:54adminsetgithub: 35952
2002-01-19 09:21:24loewiscreate