classification
Title: Py_FileSystemDefaultEncoding can be non-canonical
Type: behavior
Stage:
Components: Interpreter Core, Unicode
Versions: Python 2.6

process
Status: closed
Resolution: duplicate
Dependencies:
Superseder:
Assigned To:
Nosy List: ajaksu2, sdeibel, vstinner
Priority: normal
Keywords: patch

Created on 2006-12-04 22:06 by sdeibel, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
patch-bug1.diff sdeibel, 2006-12-04 22:06 Possible fix (based on 2.5 sources)
Messages (4)
msg51476 - (view) Author: Stephan R.A. Deibel (sdeibel) Date: 2006-12-04 22:06
On Linux/Unix it is possible for Py_FileSystemDefaultEncoding to be set to a non-canonical encoding name such as "UTF-8" instead of "utf-8". This happens when it is set from codeset in Py_InitializeEx() in pythonrun.c.

This becomes a problem when this value is propagated through to PyUnicode_Decode() or PyUnicode_AsEncodedString() in unicodeobject.c. One possible such code path starts in os.listdir() and goes via PyUnicode_FromEncodedObject().

In that case, the common-case optimizations fail. I noticed this in a case where the PyCodec_Decode() call used instead was failing. Normally I think this just amounts to a broken optimization, but given the likelihood of other such code being added in the future, I feel it's best to fix Py_FileSystemDefaultEncoding so that it is always in canonical form.

One possible way to fix it is attached as a patch.
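
To illustrate the failure mode described above, here is a minimal Python sketch. It is not the attached patch, and the helper names takes_fast_path() and canonical_encoding_name() are hypothetical. It assumes the shortcut matches the encoding name by exact, case-sensitive comparison, which is what would make "UTF-8" miss it. codecs.lookup() is the standard-library call that returns a codec's normalized name.

```python
import codecs

# Hypothetical stand-in for a case-sensitive shortcut check on the
# encoding name (assumption: the real check is an exact string match).
FAST_PATH_NAMES = ("utf-8",)


def takes_fast_path(encoding):
    """Return True if the name matches the shortcut spelling exactly."""
    return encoding in FAST_PATH_NAMES


def canonical_encoding_name(name):
    """Return the codec registry's normalized name for an encoding."""
    return codecs.lookup(name).name


if __name__ == "__main__":
    raw = "UTF-8"  # spelling reported by the locale, as in this report
    fixed = canonical_encoding_name(raw)
    print(raw, takes_fast_path(raw))
    print(fixed, takes_fast_path(fixed))
```

Normalizing the name once at startup, rather than at every comparison site, would keep each shortcut check a plain string comparison.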
msg84592 - (view) Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-03-30 17:39
Does this affect 3.x?
msg84817 - (view) Author: Stephan R.A. Deibel (sdeibel) Date: 2009-03-31 16:06
It appears to be specific to 2.x and does not occur under Python 3.0:

Python 3.0 (r30:67503, Jan 15 2009, 09:27:16)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'utf-8'

Python 2.6.1 (r261:67515, Dec 11 2008, 11:59:39)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'

Python 2.5.4 (r254:67916, Mar 16 2009, 09:34:35)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'

(This is on an Ubuntu system where LANG=en_US.UTF-8 is the default.)
msg106104 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-19 21:09
This issue is a duplicate of #4213 which was fixed by r67055 (py3k).
History
Date User Action Args
2022-04-11 14:56:21 admin set github: 44303
2010-05-19 21:09:24 vstinner set status: open -> closed; resolution: duplicate; messages: + msg106104
2009-03-31 16:06:58 sdeibel set messages: + msg84817
2009-03-30 17:39:20 ajaksu2 set nosy: + ajaksu2, vstinner; messages: + msg84592; components: + Unicode; type: behavior
2008-01-12 05:05:18 christian.heimes set versions: + Python 2.6, - Python 2.5
2006-12-04 22:06:35 sdeibel create