classification
Title: os.chmod/os.utime/shutil do not work with unicode filenames
Type: Stage:
Components: Unicode Versions:
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: mhammond Nosy List: loewis, meyeet, mhammond, quiver, tim.peters
Priority: normal Keywords:

Created on 2003-11-20 21:27 by meyeet, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
共有される.txt meyeet, 2003-11-20 21:30 filename with kanji characters
unicode_filenames.patch mhammond, 2003-11-28 09:58 Patch, as discussed
Messages (13)
msg19050 - (view) Author: Eric Meyer (meyeet) Date: 2003-11-20 21:27
I have a file whose name contains kanji characters, and I'm
trying to change the permissions on the file.

I am running Python 2.3.1 on Windows 2000.  I also
have the Japanese language pack installed so that I can
view the kanji characters in Windows Explorer.


>>> part
u'\u5171\u6709\u3055\u308c\u308b.txt'
>>> os.chmod(part, 0777)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OSError: [Errno 22] Invalid argument: '?????.txt'
>>>

I attached the above named file for you to test against.

Thanks.
msg19051 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2003-11-21 00:07
I'm running Python in almost the same environment.

I guess this results from the different behavior of u'' and
unicode('').
If you convert a multi-byte character to a unicode
character,
u'' and unicode('') don't return the same string.
unicode('') works as intended but u'' doesn't.
This is probably caused by a bug in the Japanese codecs
package.

Eric, please try the session below and tell me what 
happens.

NOTE: The Japanese codecs package needs to be installed to
test the code below.
Otherwise, UnicodeDecodeError will be raised.
---

>>> import os
>>> os.listdir('.')
[]
>>> lst = ['\x82', '\xa0']   # japanese character
>>> u1 = unicode('\x82\xa0')
>>> u2 = u'\x82\xa0'
>>> u1 == u2
False
>>> u1, u2
(u'\u3042', u'\x82\xa0')  # u2 is odd
>>> print >> file(u1, 'w'), "hello world"
>>> os.listdir('.')
['B']
>>> os.chmod(u1, 0777)
>>> os.chmod(u2, 0777)

Traceback (most recent call last):
  File "<pyshell#179>", line 1, in -toplevel-
    os.chmod(u2, 0777)
OSError: [Errno 22] Invalid argument: '??'
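
The difference between u1 and u2 above can be reproduced without touching the file system. The following is a minimal sketch in Python 3 syntax (an assumption; the session above is Python 2.3), where a bytes literal plays the role of the old byte string. It also assumes the two bytes are meant to be CP932, and it passes the codec explicitly rather than relying on a default encoding.

```python
# Python 3 sketch: bytes decoded with an explicit codec vs. escapes in a text literal.
raw = b"\x82\xa0"              # two bytes that encode one character in CP932

decoded = raw.decode("cp932")  # roughly analogous to unicode('\x82\xa0', 'cp932')
literal = "\x82\xa0"           # roughly analogous to u'\x82\xa0': two code points

print(ascii(decoded))          # one code point
print(ascii(literal))          # two code points, U+0082 and U+00A0
print(decoded == literal)
```

Seen this way, the \x escapes in a text literal name code points directly, so no codec is involved in building u2; the two strings are simply different names.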
msg19052 - (view) Author: Eric Meyer (meyeet) Date: 2003-11-21 16:18
George,

I tried the following, but I had to specify one of the Japanese
codecs in the unicode() call.  What is your default
encoding set to?  Below are my results.

>>> import os
>>> os.listdir('.')
[]
>>> u1 = unicode('\x82\xa0', 'cp932')
>>> u2 = u'\x82\xa0'
>>> u1, u2
(u'\u3042', u'\x82\xa0')
>>> print >> file(u1, 'w'), "hello world"
>>> os.listdir('.')
['?']
>>> os.chmod(u1, 0777)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OSError: [Errno 22] Invalid argument: '?'
msg19053 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2003-11-22 00:51
Hi, Eric.

My previous post was probably wrong.
This is a problem with os.chmod.

I've confirmed that two kinds of exceptions are raised when
using os.chmod with unicode filenames.

The first one is [Errno 22] Invalid argument.
You can read and write the file but cannot use os.chmod on it.

The second one is [Errno 2] No such file or directory.
Although the file exists, Python complains "No such
file or directory".

test.test_codecs has a bunch of international unicode 
characters, so I borrowed them for testing.

>>> import os
>>> from test.test_codecs import punycode_testcases
>>> def unicode_test(name):
    try:
        f = file(name, 'w')
        f.close()
    except IOError, e:
        print e
        return
    try:
        os.chmod(name, 0777)
    except OSError, e:
        print e

        
>>> for i, (uni, puny) in enumerate(punycode_testcases):
    print i
    unicode_test(uni)


I ran this script on Windows 2000 (Japanese edition)
using Python 2.3 and got "[Errno 22]" for cases
0, 1, 2, 3, 4, 5, 7 and 10, and "[Errno 2]" for case 9.
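
For anyone wanting to repeat the experiment on a current interpreter, here is a rough Python 3 sketch of the same loop. It is an adaptation, not the script above: the sample names are hypothetical stand-ins for punycode_testcases, the files are created in a temporary directory, and creation and chmod errors are reported together.

```python
import os
import tempfile

def try_chmod(directory, name):
    """Create an empty file called `name`, chmod it, and return the error text or None."""
    path = os.path.join(directory, name)
    try:
        with open(path, "w"):
            pass
        os.chmod(path, 0o777)
    except (OSError, UnicodeError) as exc:
        return str(exc)
    return None

# Hypothetical sample names; the original test borrowed punycode_testcases instead.
SAMPLE_NAMES = ["\u5171\u6709\u3055\u308c\u308b.txt", "\u3042.txt", "plain.txt"]

with tempfile.TemporaryDirectory() as tmp:
    results = [(name, try_chmod(tmp, name)) for name in SAMPLE_NAMES]

for i, (name, error) in enumerate(results):
    print(i, ascii(name), error or "ok")
```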
msg19054 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-11-24 22:21
If you look at the source of os.chmod, it is not at all
surprising that it does not work for characters outside the
file system encoding: it is simply not implemented. Patches
are welcome.
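
One way to picture the failure: if a name is narrowed to a byte-oriented code page before the operating system sees it, characters outside that code page are lost. The sketch below is Python 3 and uses cp1252 purely as a stand-in for the reporter's file system encoding (both are assumptions). It is not the posixmodule.c code path, only an illustration of how lossy encoding yields a name like the '?????.txt' in the traceback.

```python
name = "\u5171\u6709\u3055\u308c\u308b.txt"

def fits(name, encoding):
    """Return True if every character of `name` can be encoded with `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(fits(name, "cp932"))    # a Japanese code page can represent the name
print(fits(name, "cp1252"))   # a Western European code page cannot

# Replacing unencodable characters produces a different, usually nonexistent, name.
lossy = name.encode("cp1252", errors="replace")
print(lossy)                  # b'?????.txt'
```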
msg19055 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2003-11-28 09:58
I opened http://www.python.org/sf/846133 regarding os.utime,
which I found via the "shutil" module, via SpamBayes, also
on a Japanese system (see that bug for details), but then I
saw this and decided to tackle them both.

I rolled my fix for that in with a fix for chmod.  I also
hacked the test suite radically:
* Creation of a test_support.TESTFN_UNICODE_UNENCODEABLE
variable, which is a Unicode string that can *not* be
encoded using the file system encoding.  This will cause
functions with 'encoding' support but without Unicode
support (such as utime/chmod) to fail.
* Turned all the test cases into functions, so more combinations
of unicode/encoded can be tested.  Many are redundant, but
that is OK.
* Added shutil tests for the filenames.
* While I was there, converted it to a unittest test.
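
The idea behind an "unencodeable" test name can be sketched as follows. This is a Python 3 illustration, not the code in the patch: find_unencodable and CANDIDATES are hypothetical names, and the candidate strings are arbitrary picks from different scripts.

```python
import sys

# Hypothetical candidates drawn from different scripts.
CANDIDATES = [
    "\u5171\u6709\u3055\u308c\u308b",  # Japanese
    "\u0393\u03b5\u03b9\u03ac",        # Greek
    "\u0417\u0434\u0440\u0430\u0432",  # Cyrillic
]

def find_unencodable(encoding=None, candidates=CANDIDATES):
    """Return the first candidate that `encoding` cannot represent, or None."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    for candidate in candidates:
        try:
            candidate.encode(encoding)
        except UnicodeEncodeError:
            return candidate
    return None

print(ascii(find_unencodable("cp1252")))  # some candidate fails for a Western code page
print(find_unencodable("utf-8"))          # no candidate fails for UTF-8
```

When the function returns None, as it does for UTF-8, there is no such name to test with, so a test built on it would have to be skipped on that platform.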

The new test case blows up with a couple of errors before
the posixmodule patch is applied, and passes after.

Note that shutil.move/copy etc. cannot handle being passed
one string and one unicode arg, and therefore this
combination is skipped.  I'd like any opinions on whether
this is a bug in shutil or not.
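
Whatever the answer for shutil itself, calling code can sidestep the question by normalising both arguments to one type first. A minimal sketch in Python 3, where the bytes/str pair stands in for the string/unicode pair discussed here; copy_normalized is a hypothetical helper, not part of shutil.

```python
import os
import shutil
import tempfile

def copy_normalized(src, dst):
    """Decode both paths to str with the file system encoding, then copy."""
    return shutil.copy(os.fsdecode(src), os.fsdecode(dst))

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.txt")
    with open(src, "w") as f:
        f.write("hello world\n")
    dst_dir = os.path.join(tmp, "out")
    os.mkdir(dst_dir)
    # One bytes argument and one str argument.
    copied = copy_normalized(os.fsencode(src), dst_dir)
    copied_ok = os.path.exists(copied)
    copied_name = os.path.basename(copied)

print(copied_name, copied_ok)
```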

Also note the new comment in test_support.py regarding
a potential bug in the 'mbcs' encoding: it appears to
always work as though errors=ignore.

Comments/reviews?
msg19056 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2003-11-29 01:32
I created www.python.org/sf/850997 about the MBCS encoding
issue.
msg19057 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-12-01 21:39
The patches to posixmodule.c are fine for both 2.3 and 2.4.
Can you apply them before 2.3.3 is frozen?

The patches to the test suite are fine for 2.4 only, and
they probably need to be relaxed. For example, on OSX, there
simply is no file name that fails to work for the normal
file system API: the file system encoding is UTF-8, so it
supports all file names. You should consider changing
test_pep277.py instead.
msg19058 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2003-12-03 01:33
release23-maint:
Checking in posixmodule.c;
new revision: 2.300.8.5; previous revision: 2.300.8.4

trunk:
Checking in posixmodule.c;
new revision: 2.309; previous revision: 2.308
Checking in test_support.py;
new revision: 1.59; previous revision: 1.58
Checking in test_unicode_file.py;
new revision: 1.11; previous revision: 1.10
Removing output/test_unicode_file;
new revision: delete; previous revision: 1.1

msg19059 - (view) Author: Eric Meyer (meyeet) Date: 2003-12-03 19:16
Is there an approximate date (or month) when 2.3.3 is likely 
to be released?
msg19060 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-12-03 19:21
meyeet, 2.3.3 should be released this month (December).

Mark, I reopened this, because test_unicode_filename fails on 
Win98SE now (see Python-Dev report; that was on the trunk; 
I don't know about 2.3 maint).
msg19061 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-12-04 07:18
2.3 maint should be fine: the problems are more likely in
the new test cases than in the code itself.
msg19062 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2004-05-05 12:26
I'm fairly sure this has been nailed (including the test
failure) for some time?
History
Date User Action Args
2022-04-11 14:56:01 admin set github: 39572
2003-11-20 21:27:12 meyeet create