classification
Title: compile() converts "filename" parameter to StringType
Type: behavior
Stage: needs patch
Components: Interpreter Core
Versions: Python 2.6
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: ajaksu2, georg.brandl, georg.brandl, loewis, wigy
Priority: normal
Keywords:

Created on 2005-09-28 04:49 by wigy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
ExecIssue.py wigy, 2005-09-28 04:49 A unit test showing the issue
Messages (7)
msg26418 - Author: Vágvölgyi Attila (wigy) Date: 2005-09-28 04:49
The builtin compile() signature looks like:

  compile(string, filename, kind[, flags[, dont_inherit]])

The string parameter can be either StringType or
UnicodeType, but the filename parameter is converted
to StringType, so if the unicode object passed contains
non-ASCII characters, compile() raises
UnicodeEncodeError.

This can be an issue on filesystems with UTF-8
filenames, or when using non-English names to
beautify backtraces.

The attached file contains a unit test that will
succeed once the bug is resolved. I saw the error in
2.3 and 2.4; maybe it is present in all releases?
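
A minimal sketch of the reported call, written for Python 3, where (per the closing comment in this thread) Unicode file names are supported. The label `<számítás>` is made up for illustration and is not taken from the attached ExecIssue.py; on the Python 2 versions discussed here, passing such a unicode object as `filename` is what reportedly raises UnicodeEncodeError.

```python
import traceback

# Hypothetical non-ASCII label ("<számítás>"), written with escapes so the
# example does not depend on how this page is encoded.
filename = "<sz\u00e1m\u00edt\u00e1s>"
source = "raise ValueError('boom')"

# Python 3 keeps the filename as a text string on the code object.
code = compile(source, filename, "exec")

try:
    exec(code, {})
except ValueError:
    # The label shows up in the formatted traceback for the compiled frame.
    report = traceback.format_exc()
```

Printing `report` shows the label in the `File "..."` line of the traceback, which is the "backtrace beautification" use case mentioned above.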
msg26419 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-09-28 12:54
Should compile() use Py_FileSystemDefaultEncoding?
msg26420 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-09-29 06:20
Why couldn't co_filename just be the Unicode string? I think
one would have to change:
- code_repr, to convert the filename into a byte string
(preferably using 'ascii', 'replace')
- tb_printinternal (not sure what to do here)
- code_new, to accept either strings or unicode strings
- builtin_compile, which probably indeed needs to convert
the string using the file system encoding, and then patch
the resulting code object to point to the unicode object
originally passed (unless we can accept more pythonrun
functions).
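
The two conversions mentioned in the list above can be sketched in Python rather than C. This only illustrates the encodings under discussion and is not the interpreter's implementation; the file name is hypothetical, and `errors="replace"` is used with the file system encoding purely so that the sketch never raises.

```python
import sys

# Hypothetical file name ("árvíz.py"), written with escapes.
filename = "\u00e1rv\u00edz.py"

# For a repr-style display: lossy, but always succeeds.
display_bytes = filename.encode("ascii", "replace")

# For an API that needs a byte string: use the file system encoding.
fs_encoding = sys.getfilesystemencoding()
fs_bytes = filename.encode(fs_encoding, "replace")

# The original text string stays available for co_filename-like uses.
original = filename
```

Here `display_bytes` is `b"?rv?z.py"`, while `fs_bytes` depends on the platform's file system encoding.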
msg26421 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-09-29 06:34
Sounds sound. :)
msg26422 - Author: Vágvölgyi Attila (wigy) Date: 2005-09-29 08:29
loewis, I confess I could not understand a word.

But as I see it, a completely Unicode internal filename
representation would have some advantages on systems that have
multiple filesystems mounted with different encodings, or that
simply have UTF-8 filesystems (avoiding 'ascii', 'replace', so
that two filenames differing only in accents stay
distinguishable).
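
A small illustration of the accent concern, using two made-up file names. It assumes nothing beyond the standard `str.encode` method: with 'ascii', 'replace', both names collapse to the same byte string, while the Unicode originals remain distinct.

```python
# Two hypothetical names differing only in accents
# ("résumé.txt" and "rèsumè.txt"), written with escapes.
name_a = "r\u00e9sum\u00e9.txt"
name_b = "r\u00e8sum\u00e8.txt"

# Lossy conversion: every non-ASCII character becomes "?".
lossy_a = name_a.encode("ascii", "replace")
lossy_b = name_b.encode("ascii", "replace")

# The text strings differ, but their lossy byte forms do not.
collide = (name_a != name_b) and (lossy_a == lossy_b)
```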

I agree with Joel Spolsky
(http://www.joelonsoftware.com/articles/Unicode.html), and I
think that if choosing Unicode were easier in a
language, then most l10n problems would be solved. I
understand that coding Unicode in C is a pain.

Imagine, theoretically, that a literal like "hello"
automatically meant a unicode object in Python, and you had
to write s"hello" to make a literal string object encoded in
a way determined by some environmental settings (or maybe by
the PEP 0263 header of the specific source file?), so you have
control over what happens.

Imagine the case when there is a latin1 and a utf-8
partition mounted, and the console is latin2! Life would be
much, much easier for a non-American programmer if she had
to be aware from the first moment that she is in an
international environment.
msg83879 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-03-20 22:43
Confirmed on trunk.
msg85499 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-04-05 14:27
I don't think this will be dealt with in the 2.x series.  Python 3
already has support for Unicode file names, so it's out of date there.
History
Date User Action Args
2022-04-11 14:56:13 admin set github: 42424
2009-04-05 14:27:10 georg.brandl set status: open -> closed; resolution: out of date; messages: + msg85499
2009-03-20 22:43:08 ajaksu2 set versions: + Python 2.6, - Python 2.4; nosy: + ajaksu2; messages: + msg83879; type: behavior; stage: needs patch
2005-09-28 04:49:37 wigy create