classification
Title: Unicode in sys.path not supported
Type: Stage:
Components: Interpreter Core Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: doerwalter Nosy List: doerwalter, lemburg, loewis, pboddie
Priority: low Keywords:

Created on 2001-10-30 11:25 by pboddie, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Messages (11)
msg7252 - Author: Paul Boddie (pboddie) Date: 2001-10-30 11:25
When a Unicode string is passed as the module name to 
imp.find_module, the function fails to locate the 
named module even when it exists in the specified 
path, raising an ImportError with the message 
"No module named ..." as a result.

The problem in Python 2.0 can be traced to line 922 of 
Python/import.c which ensures that any strings 
involved in the find_module function must be standard 
Python strings and not Unicode strings, since it tests 
the type of path components against &PyString_Type 
explicitly.

Interestingly, the __import__ built-in function seems 
to work with Unicode strings. Either way, it would be 
great if this could be documented or even fixed, but I 
don't know what the policy is on Unicode module names 
(even when they only contain ASCII-compatible 
characters).
msg7253 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2001-12-01 23:01
I guess Python should not accept non-ASCII module names, so converting Unicode to ASCII should be 
appropriate.

Would it suffice to test this only in find_module(), or do you think that I need to dig deeper into the import 
mechanism?
msg7254 - Author: Paul Boddie (pboddie) Date: 2001-12-03 10:59
For my purposes, I just wrapped the module name in a 'str' 
function call. I had Unicode strings because I was using 
text from an XML document and then attempting to use such 
text with the import mechanism.

One issue is whether Python would ever support importing 
from files which have non-ASCII filenames. I can imagine 
that certain operating systems support Unicode filenames, 
for example, but then the Python language probably doesn't 
support such filenames as the basis for module names when 
used with the 'import' statement and other related 
statements.

So, there's a wider issue of text encodings in (C)Python 
scripts as part of the "comprehensive" solution to this 
problem; the easy solution is just to enforce ASCII-only 
module names.
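
The `str` workaround mentioned above can be sketched as follows. In Python 2, `str()` applied to a Unicode string used the ASCII codec and raised `UnicodeEncodeError` on anything else; the sketch below is written for Python 3 and uses `encode("ascii")` as the equivalent check. The helper name `import_from_xml`, the `<module>` element, and the document layout are all assumptions made for illustration, not details from the report.

```python
import importlib
import xml.etree.ElementTree as ET


def import_from_xml(xml_text, tag="module"):
    """Import the module named in a child element of the XML root.

    Hypothetical helper: the element name and layout are assumptions.
    """
    name = ET.fromstring(xml_text).findtext(tag)
    if name is None:
        raise ValueError("no <%s> element found" % tag)
    name = name.strip()
    # Enforce the "easy solution": ASCII-only module names.
    # encode() raises UnicodeEncodeError for any non-ASCII character.
    name.encode("ascii")
    return importlib.import_module(name)
```

Rejecting the name before it reaches the import machinery keeps the failure close to the data that caused it, instead of surfacing later as "No module named ...".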
msg7255 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-01-05 08:04
I cannot reproduce the problem in Python 2.1:

>>> import imp
>>> imp.find_module(u"string")
(<open file '/usr/local/lib/python2.2/string.py', mode 'r'
at 0x816e070>, '/usr/local/lib/python2.2/string.py', ('.py',
'r', 1))

I don't think __import__ should accept non-ASCII names. It
may be reasonable to further restrict import to verify that
the argument is a NAME, in the sense of the Python lexis,
though doing so is not important.
I cannot see any further problem in this report, so I
suggest closing it as fixed. The test in line 922 only
checks the path, not the module name.
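
A check along the lines suggested here, that the argument is a NAME and ASCII-only, might look like the sketch below. It is written for Python 3.7+ (`str.isascii`), the function name `is_importable_name` is hypothetical, and treating dotted names as a sequence of NAMEs is an assumption that goes beyond what the message proposes.

```python
import keyword


def is_importable_name(name):
    """Return True if every dotted component is an ASCII identifier.

    Hypothetical validator; not part of the import machinery.
    """
    return all(
        part.isascii()                   # ASCII-only, as discussed in this thread
        and part.isidentifier()          # a NAME in the lexical sense
        and not keyword.iskeyword(part)  # reserved words cannot be NAMEs
        for part in name.split(".")
    )
```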
msg7256 - Author: Paul Boddie (pboddie) Date: 2002-01-07 10:43
It must have been fixed between Python 2.0 and Python 2.1, 
then, but I can't find any obvious indication of this in 
Python/import.c. The platform probably shouldn't matter in 
this case, but I was using Red Hat Linux 6.1 on Intel.
msg7257 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2002-01-07 10:55
The find_module() code doesn't seem to have changed between 
the releases, so it should work in Python 2.0 as well.

The only parts I see in the source code which require strings 
are the sys.path handling APIs. The optional second argument
to find_module() will also only accept strings. Perhaps that's where
your problem originated?

Python 2.0 (#1, Jan 19 2001, 17:54:27)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import imp
>>> imp.find_module(u'platform')
(<open file '/home/lemburg/bin/platform.py', mode 'r' at 0x8191a78>, '/home/lemburg/bin/platform.py', ('.py', 'r', 1))
>>>

Can you give an example that demonstrates the problem?
msg7258 - Author: Paul Boddie (pboddie) Date: 2002-01-07 13:09
My apologies: I should have been clearer in my description. 
Here's a test case for Python 2.1 on Windows which 
demonstrates the problem:

import sys, imp

ascii_dir = "D:\\Private\\Vaults"
unicode_dir = u"D:\\Private\\Vaults"

# First test: Unicode sys.path value.

sys.path.append(unicode_dir)
imp.find_module(u"VaultsSearch") # fails
imp.find_module("VaultsSearch") # fails
sys.path.remove(unicode_dir)

# Second test: ASCII sys.path value.

sys.path.append(ascii_dir)
imp.find_module(u"VaultsSearch") # succeeds
imp.find_module("VaultsSearch") # succeeds
sys.path.remove(ascii_dir)
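
The behavior this test case exercises can be modeled in a few lines. This is a simplified Python sketch, not the actual C code: it assumes, as described earlier in the thread, that the path loop tests each entry against one accepted string type and passes over anything else. The function `find_on_path` and its `accepted_type` parameter are hypothetical. Run under Python 3, `bytes` plays the part of the "other" string type that Unicode played in Python 2.

```python
import os


def find_on_path(name, path, accepted_type=str):
    """Return the first <entry>/<name>.py found on path.

    Entries that are not instances of accepted_type are skipped
    without any error, which models the reported failure mode.
    """
    for entry in path:
        if not isinstance(entry, accepted_type):
            continue  # wrong string type: silently ignored
        candidate = os.path.join(entry, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    raise ImportError("No module named " + name)
```

Because the skip is silent, the caller only sees the generic "No module named ..." error, which is why the original report attributed the problem to the module name instead of the path entry.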
msg7259 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2002-09-04 19:03
import.c 2.207 should have fixed this problem, so I hope we 
can close this bug now.
msg7260 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2002-09-05 07:26
No time to check; can you do this, Walter?
Thanks.
msg7261 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2002-09-05 15:45
It seems to work under Linux. Can anyone check it under 
Windows with non-ASCII directory names?

> echo >/tmp/foo.py "print 'foo'"
> ./python
Python 2.3a0 (#12, Sep  4 2002, 22:04:22) 
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2
Type "help", "copyright", "credits" or "license" for more 
information.
>>> import sys, imp
>>> sys.path.append(u"/tmp")
>>> imp.find_module("foo")
(<open file '/tmp/foo.py', mode 'U' at 
0x40078448>, '/tmp/foo.py', ('.py', 'U', 1))
msg7262 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2002-09-05 17:22
It works on Windows 2000 too:

Python 2.3a0 (#29, Sep  5 2002, 18:43:40) [MSC 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more
information.
>>> import sys, os, imp
[8623 refs]
>>> os.mkdir(u"c:\\n\xfcx")
[10435 refs]
>>> sys.path.append(u"c:\\n\xfcx")
[10436 refs]
>>> open(u"c:\\n\xfcx\\hurz.py", "wb").write('print "hurz"')
[10567 refs]
>>> imp.find_module(u"hurz")
(<open file 'c:\n³x\hurz.py', mode 'U' at 0x007DC488>,
'c:\\n\xfcx\\hurz.py', ('.py', 'U', 1))
[10580 refs]
>>>

The repr() of the file seems a little strange, but looking
in the Explorer I see the correct directory name.

Using a Unicode character that is outside the Latin-1 range
instead of \xfc fails in the os.mkdir() call. The cause is the
"mbcs" encoding, which substitutes ? for such characters, and
? is illegal in directory names.
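
The substitution described here can be imitated on any platform. A minimal sketch, assuming Latin-1 with `errors="replace"` as a stand-in for the Windows-only "mbcs" codec; `lossy_encode` is a hypothetical helper, not what `os.mkdir()` calls internally.

```python
def lossy_encode(name, encoding="latin-1"):
    """Encode name, replacing unencodable characters with '?'.

    Stand-in for the lossy conversion described above; the real
    "mbcs" codec depends on the active Windows code page.
    """
    return name.encode(encoding, errors="replace")


# U+00FC is within Latin-1 and survives the round trip.
print(lossy_encode("n\u00fcx"))  # b'n\xfcx'
# U+0142 is outside Latin-1 and collapses to '?'.
print(lossy_encode("n\u0142x"))  # b'n?x'
```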

On Linux it works with non-ASCII directory names too, if the
appropriate locale.setlocale is called first:

>>> import os, sys, imp, locale
>>> locale.setlocale(locale.LC_ALL, 'de_DE')             
>>> os.mkdir(u"/tmp/gürk")
>>> open(u"/tmp/gürk/hurz.py", "wb").write("print 'hurz'")
>>> sys.path.append(u"/tmp/gürk")
>>> imp.find_module(u"hurz")
(<open file '/tmp/gürk/hurz.py', mode 'U' at 0x400ce9f8>,
'/tmp/g\xfcrk/hurz.py', ('.py', 'U', 1))
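
For readers on current Python 3, where `imp` is gone (it was removed in 3.12) and every `str` is Unicode, the same check can be sketched with `importlib.machinery.PathFinder.find_spec` standing in for `imp.find_module`. The helper names `find_module_in` and `demo` are hypothetical, and the sketch assumes the filesystem encoding can represent `ü`.

```python
import importlib
import os
import tempfile
from importlib.machinery import PathFinder


def find_module_in(directory, name):
    """Return the source path of module `name` in `directory`, or None."""
    importlib.invalidate_caches()  # make sure new files are noticed
    spec = PathFinder.find_spec(name, [directory])
    return spec.origin if spec is not None else None


def demo(dirname="g\u00fcrk", modname="hurz"):
    """Create <tmp>/<dirname>/<modname>.py and look it up by directory."""
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, dirname)
        os.mkdir(target)
        with open(os.path.join(target, modname + ".py"), "w") as fh:
            fh.write("print('hurz')\n")
        # The lookup must happen while the directory still exists.
        return find_module_in(target, modname)
```

The module is only located, never executed, which matches what `imp.find_module` did in the sessions above.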
History
Date                 User     Action  Args
2022-04-10 16:04:35  admin    set     github: 35426
2001-10-30 11:25:35  pboddie  create