msg32115 - (view) |
Author: Geoffrey Bache (gjb1002) |
Date: 2007-05-23 16:42 |
On UNIX, I cannot read pickle files created on Windows using the cPickle module, even if I open the file with universal line endings.
It works fine with the pickle module, but that is of course slower (and I have to read lots of them).
I attach a test case that pickles and unpickles an smtplib.SMTP object, converting the file to DOS format in between. There is nothing special about SMTP; you can use any object at all from a different module.
On my system (RHEL4 with Python 2.4.3) I get the following output:
portmoller : pickletest.py cPickle
unix2dos: converting file dump to DOS format ...
Traceback (most recent call last):
  File "pickletest.py", line 14, in ?
    print load(readFile)
ImportError: No module named smtplib
portmoller : pickletest.py pickle
unix2dos: converting file dump to DOS format ...
<smtplib.SMTP instance at 0xb7ea350c>
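
The failure can be reproduced without unix2dos. What follows is a minimal sketch, assuming Python 3 rather than the Python 2.4 of the report; it pickles the SMTP class itself instead of an instance, and `crlf_roundtrip_error` is a hypothetical helper name. The exact exception type is not guaranteed to match the traceback above.

```python
import pickle
import smtplib


def crlf_roundtrip_error(obj):
    """Pickle obj with protocol 0, mimic unix2dos, and return the load error (or None)."""
    data = pickle.dumps(obj, protocol=0)
    mangled = data.replace(b"\n", b"\r\n")  # roughly what unix2dos does to the file
    try:
        pickle.loads(mangled)
    except Exception as exc:  # the exception type may differ between versions
        return exc
    return None


if __name__ == "__main__":
    # The pickle stores the module name on its own line, so after the
    # conversion the unpickler sees a name that ends in a carriage return.
    print(repr(crlf_roundtrip_error(smtplib.SMTP)))
```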
|
msg32116 - (view) |
Author: Gabriel Genellina (ggenellina) |
Date: 2007-05-25 09:00 |
Please try again with this modified version. I think you will see that Python is trying to import "smtplib\r"
On Windows, trying to read a pickle file with MAC line endings gives a different error:
cPickle.UnpicklingError: pickle data was truncated
It seems that cPickle support for protocol 0 is broken. If you can, use the higher, binary protocols; they don't have this problem. Even if you must use protocol 0, always opening the file in binary mode should avoid it.
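
A minimal sketch of the binary-mode advice, assuming Python 3 and a temporary file; `roundtrip_binary` is a hypothetical helper, not part of the attached test case. It writes and reads a protocol 0 pickle with "wb" and "rb" so no newline translation can happen on either side.

```python
import os
import pickle
import tempfile


def roundtrip_binary(obj, protocol=0):
    """Write obj to a temporary file in binary mode and read it back.

    Returns the unpickled value and the raw bytes that were on disk.
    """
    fd, path = tempfile.mkstemp(suffix=".pickle")
    os.close(fd)
    try:
        with open(path, "wb") as f:  # binary: no newline translation on write
            pickle.dump(obj, f, protocol=protocol)
        with open(path, "rb") as f:  # binary: bytes reach the unpickler untouched
            raw = f.read()
            f.seek(0)
            return pickle.load(f), raw
    finally:
        os.remove(path)


if __name__ == "__main__":
    value, raw = roundtrip_binary({"a": [1, 2.5, "x"]})
    print(value)
    print(raw)
```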
|
msg32117 - (view) |
Author: Gabriel Genellina (ggenellina) |
Date: 2007-05-25 09:04 |
I don't see any "Attach" button...
Just add these lines near the top of the test script:
original__import = __import__
def myimport(name, *args):
    print "import",repr(name)
    return original__import(name,*args)
#end myimport
__builtins__.__import__ = myimport
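
For readers on Python 3, the same tracing idea can be sketched as a context manager. `trace_imports` is a hypothetical name. Whether a given unpickler implementation routes its imports through `builtins.__import__` is an assumption here, so treat the output as a diagnostic aid only.

```python
import builtins
import contextlib


@contextlib.contextmanager
def trace_imports(log):
    """Temporarily wrap builtins.__import__ and append each requested name to log."""
    original_import = builtins.__import__

    def traced_import(name, *args, **kwargs):
        log.append(name)
        return original_import(name, *args, **kwargs)

    builtins.__import__ = traced_import
    try:
        yield log
    finally:
        # Always restore the real import function, even if the body raised.
        builtins.__import__ = original_import


if __name__ == "__main__":
    seen = []
    with trace_imports(seen):
        import json  # any import inside the block is recorded
    # repr() makes a stray trailing "\r" in a module name visible.
    print([repr(name) for name in seen])
```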
|
msg32118 - (view) |
Author: Gabriel Genellina (ggenellina) |
Date: 2007-05-25 10:29 |
The culprit is cPickle.c; it takes certain shortcuts for read() and readline() depending on which type of file you pass in.
For a true file object, it uses its own implementation for those two methods, ignoring the file mode.
But it appears that there is NO WAY universal line endings could work if the pickle contains any unicode object. The pickle format for Unicode quotes any \n but *not* \r so the unpickler cannot determine, when it sees a "\r", if it is a MAC end-of-line or an embedded "\r".
So, the only safe end-of-line character for a pickle using protocol 0 is "\n", and that means that the file must be written in binary mode.
(This may also indicate that you cannot read unicode objects with embedded \r in a MAC using protocol 0, but I don't have a MAC to test it).
So, until this is fixed (either the module or the documentation), one should forget about universal line endings and write all pickle files as binary. (This way ALL lines end in \n and it should work fine on all platforms)
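
The escaping can be inspected directly. This is a small sketch assuming Python 3, where `str` is the Unicode type; `protocol0_bytes` is a hypothetical helper. How an embedded "\r" is written may differ between interpreter versions, so the sketch prints the bytes rather than asserting a particular form.

```python
import pickle


def protocol0_bytes(text):
    """Return the raw protocol 0 pickle of a str."""
    return pickle.dumps(text, protocol=0)


if __name__ == "__main__":
    # An embedded newline is written as an escape sequence, so a literal
    # newline byte only appears as the terminator of an opcode line.
    print(protocol0_bytes("a\nb"))
    # Inspect how a carriage return is written on this interpreter.
    print(protocol0_bytes("a\rb"))
```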
|
msg32119 - (view) |
Author: Geoffrey Bache (gjb1002) |
Date: 2007-05-25 17:24 |
Yes, I'm sure Python is trying to import "smtplib\r".
For various reasons I need to use protocol 0: not least because I use the pickle files as test data and it's much easier to administer a load of text files than a load of binary files.
I will experiment with reading the files in binary mode on Monday and get back to you. My current workaround is to do loads(file.read()) instead of load(file) which I guess is a performance penalty. Any idea whether this is likely to be slower than just using the pickle module? (I haven't tested this)
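
A sketch of that workaround under Python 3, where the file is read as bytes and the line endings are normalised by hand before calling loads(). `loads_normalised` is a hypothetical helper. It reads the whole pickle into memory, and it assumes every CR LF pair came from newline translation, which is exactly the ambiguity described earlier in this thread.

```python
import pickle


def loads_normalised(raw):
    """Undo CRLF translation on a protocol 0 pickle, then unpickle it."""
    # Assumption: each CR LF pair was produced by text-mode writing or by
    # unix2dos, not by a genuine carriage return inside the pickled data.
    return pickle.loads(raw.replace(b"\r\n", b"\n"))


if __name__ == "__main__":
    original = [1, "two", 3.0]
    dos_bytes = pickle.dumps(original, protocol=0).replace(b"\n", b"\r\n")
    print(loads_normalised(dos_bytes))
```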
|
msg32120 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2007-05-29 05:14 |
Jack, can you take a look? If not, please unassign.
|
msg32121 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2007-07-12 16:24 |
I can confirm that this problem is present in Python 2.5 (current svn) running on OS X 10.4.10. Given the code of cPickle, it is rather amazing that this script does work correctly on a Linux system: as gagenellina noted, cPickle shortcuts reads from real file objects and completely ignores universal newlines while doing so.
IMHO fixing this requires replicating the universal newline code in cPickle.
|
msg87619 - (view) |
Author: Daniel Diniz (ajaksu2) * |
Date: 2009-05-12 13:06 |
Confirmed in trunk and py3k.
|
msg87621 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-05-12 13:17 |
Why would you use a file in universal line endings mode for saving/loading
pickles? Pickles are binary data (even if version 0 pickles happen to
be human-readable), so you should open the files in binary mode (either
"rb" or "wb").
|
msg87622 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-05-12 13:20 |
Also, I don't understand how you confirmed this bug under py3k. Text
files under py3k forbid bytes input, which is what pickle produces:
>>> pickle.dump([], sys.stdout)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/py3k/__svn__/Lib/pickle.py", line 1333, in dump
    Pickler(file, protocol).dump(obj)
TypeError: write() argument 1 must be str, not bytes
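
The same check can be run without a terminal by using in-memory streams. This is a minimal sketch assuming Python 3; `dump_to_text_stream` is a hypothetical helper, and the wording of the error message may differ from the traceback above.

```python
import io
import pickle


def dump_to_text_stream(obj):
    """Try to pickle obj into an in-memory text stream; return the error raised, if any."""
    try:
        pickle.dump(obj, io.StringIO(), protocol=0)
    except TypeError as exc:
        return exc
    return None


if __name__ == "__main__":
    # A text stream rejects the bytes that pickle produces.
    print(repr(dump_to_text_stream([])))
    # A binary stream accepts them.
    buf = io.BytesIO()
    pickle.dump([], buf, protocol=0)
    print(buf.getvalue())
```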
|
msg110352 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-07-15 06:55 |
This does not look like a valid bug to me. The OP does not show that pickle files differ between systems; he mangles the pickle file with unix2dos instead. This would certainly produce an invalid pickle, because the pickle format requires '\n' and no other character as an opcode terminator.
If incompatible pickle files were produced on Windows, most likely that was because the data was written to files opened in text rather than binary mode.
|
msg110355 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2010-07-15 07:50 |
Antoine, to answer your question about universal newlines in pickle in msg87622: the pickle.py docstrings in 2.7+ contain the following text (amongst others):
The optional protocol argument tells the pickler to use the
given protocol; supported protocols are 0, 1, 2. The default
protocol is 0, to be backwards compatible. (Protocol 0 is the
only protocol that can be written to a file opened in text
mode and read back successfully. When using a protocol higher
than 0, make sure the file is opened in binary mode, both when
pickling and unpickling.)
This clearly indicates that protocol 0 is supposed to be compatible with text-mode files. That would mean this issue probably is not invalid: the documentation above implies that a pickle file written in text mode on Windows should be readable on a Unix system.
That said, I'd advise anyone to use the highest possible protocol, because higher protocol levels are more efficient and better support new-style classes.
|
msg110373 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-07-15 15:41 |
> The pickle.py docstrings in 2.7+ contain the following text
> (amongst others):
>
> .. Protocol 0 is the
> only protocol that can be written to a file opened in text
> mode and read back successfully.
Hmm, indeed. The ReST documentation also has the following note:
"""
Note: Be sure to always open pickle files created with protocols >= 1 in binary mode. For the old ASCII-based pickle protocol 0 you can use either text mode or binary mode as long as you stay consistent.
"""
but as Gabriel mentioned above, this should be qualified, at least by adding "unless the pickle contains unicode strings with embedded '\r' on platforms that use '\r' as part of their end-of-line sequence".
I don't think changing the way unicode is pickled is an option. Given the number of other differences, fixing this aspect of cPickle to behave more like pickle.py does not look like a good use of developers' time.
I think this is a case where the existing behavior should just be better documented. See also issue616013.
|
msg220397 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2014-06-12 23:11 |
Can this be closed as issue616013 was?
|
msg237004 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2015-03-02 01:47 |
msg185431 from #616013 states "Three years later, I don't think anyone is interested in documenting the outdated cPickle." so I believe this should suffer the same fate.
|
msg370454 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2020-05-31 13:39 |
Python 2.7 is no longer supported.
|
|
Date | User | Action | Args |
2022-04-11 14:56:24 | admin | set | github: 44989 |
2020-05-31 13:39:09 | serhiy.storchaka | set | status: open -> closed
nosy:
+ serhiy.storchaka messages:
+ msg370454
resolution: out of date stage: test needed -> resolved |
2019-03-15 22:35:34 | BreamoreBoy | set | nosy:
- BreamoreBoy
|
2015-03-02 01:47:14 | BreamoreBoy | set | messages:
+ msg237004 |
2014-06-12 23:11:29 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages:
+ msg220397
|
2010-07-15 15:41:30 | belopolsky | set | assignee: belopolsky -> dependencies:
+ cPickle documentation incomplete components:
+ Documentation versions:
+ Python 2.7, - Python 2.6 nosy:
loewis, jackjansen, ronaldoussoren, gjb1002, belopolsky, ggenellina, pitrou, ajaksu2, benjamin.peterson messages:
+ msg110373 resolution: not a bug -> (no value) |
2010-07-15 07:50:22 | ronaldoussoren | set | status: pending -> open
messages:
+ msg110355 |
2010-07-15 06:55:36 | belopolsky | set | status: open -> pending
nosy:
+ belopolsky messages:
+ msg110352
assignee: belopolsky resolution: not a bug |
2009-05-12 13:20:14 | pitrou | set | messages:
+ msg87622 |
2009-05-12 13:17:19 | pitrou | set | messages:
+ msg87621 versions:
- Python 3.1 |
2009-05-12 13:06:46 | ajaksu2 | set | files:
+ pickletest_py3k.py
type: behavior components:
+ IO versions:
+ Python 2.6, Python 3.1, - Python 2.4 nosy:
+ benjamin.peterson, pitrou, ajaksu2
messages:
+ msg87619 stage: test needed |
2008-05-03 09:28:06 | ronaldoussoren | set | assignee: jackjansen -> (no value) |
2007-05-23 16:42:24 | gjb1002 | create | |