
classification
Title: EditorWindow's title with non-ASCII chars.
Type: Stage:
Components: IDLE Versions: Python 2.4
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: loewis Nosy List: kbk, loewis, suzuki_hisao
Priority: normal Keywords: patch

Created on 2005-03-14 08:19 by suzuki_hisao, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
EditorWindow.py.diff.zip suzuki_hisao, 2005-03-14 08:19 a diff file and two screen shots
EditorWindow.py.diff suzuki_hisao, 2005-03-17 04:39 revised
idlelib.diff suzuki_hisao, 2005-03-17 08:40 revised
revised-idlelib.diff suzuki_hisao, 2005-03-22 07:00 revised
Messages (15)
msg47960 - (view) Author: SUZUKI Hisao (suzuki_hisao) Date: 2005-03-14 08:19
This small patch makes it possible to display a path including
non-ASCII chars as the title of the Editor Window.  See the screen shots
of the original IDLE and the patched one.
msg47961 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-03-15 23:04
I think the patch is wrong/not general enough:
- if decoding fails for some reason, it should continue anyway
- I'm not sure where the title comes from, but it might be
that it is a file name. If so, it should use
sys.getfilesystemencoding() instead of IOBinding.encoding.
This matters only on systems where these might differ, e.g.
MacOSX.
msg47962 - (view) Author: SUZUKI Hisao (suzuki_hisao) Date: 2005-03-17 04:39
Thank you for your comment.

First, indeed some titles may fail to be decoded, but it will be sufficient to 
use 'ignore' as the error handling scheme.  At least it gives more readable 
titles than the present "mojibake'd" ones.

Second, the title comes from either sys.argv or tkFileDialog.  tkFileDialog 
calls tk_getOpenFile and tk_getSaveFile of Tcl/Tk.  So you are right.  It 
would be better to use sys.getfilesystemencoding().  Note that the patch 
does not affect any unicode titles.

As for OSX, it seems that tk_getOpenFile sometimes returns a broken 
string unless you set LANG so as to use UTF-8 (en_US.UTF-8, 
ja_JP.UTF-8 etc.).  You can see it as follows:

$ LANG=ja_JP.SJIS wish8.4
% tk_getOpenFile

For a folder name containing Japanese characters, you will get a broken result; it is 
neither UTF-8 nor SJIS.  The same problem applies to eucJP.  It is a bug 
in Tcl/Tk (I found it in Aqua Tcl/Tk 8.4.9) and affects the original IDLE, too.

All in all, it would be most reasonable to use 
sys.getfilesystemencoding() and the 'ignore' scheme for now.
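
The approach described above can be sketched roughly as follows. This is only an illustration in Python 3 syntax (the thread itself concerns Python 2.4 byte strings), and the helper name decode_title is hypothetical; it is not taken from the attached patch or from idlelib.

```python
import sys


def decode_title(raw, encoding=None):
    """Return a text title for raw, dropping bytes that cannot be decoded.

    decode_title is a hypothetical helper, not part of idlelib.
    """
    if isinstance(raw, str):
        # Already text: leave it untouched, as the patch does for unicode titles.
        return raw
    if encoding is None:
        # Assumption: the name came from the file system or a file dialog.
        encoding = sys.getfilesystemencoding()
    # 'ignore' silently drops undecodable bytes instead of raising.
    return raw.decode(encoding, "ignore")
```

With 'ignore', a name containing undecodable bytes loses those bytes but still yields a readable title rather than an exception.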
msg47963 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-03-17 08:00
Hmm. When the string comes from sys.argv, it should be in
the user's preferred encoding, not in the file system
encoding, which would suggest that the current code is right.

When the string comes from tk_getOpenFile, I would expect to
get a Unicode string back instead of a byte string. I can
believe that Tk fails for OSX here: it relies on Tcl's glob
command, which apparently assumes that "encoding system" is
used for the file system; try

 >>> [unicode(x) for x in t.tk.call(('glob','*'))]

There are more issues with OSX glob, e.g. for Latin characters,
it processes the decomposed form inconveniently, see

http://sourceforge.net/tracker/?func=detail&aid=823330&group_id=10894&atid=110894

So I think it is fine to display question marks on OSX if
necessary; in general, it now seems that the locale's
encoding should be used indeed.
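
For readers unfamiliar with the "decomposed form" mentioned above, the following sketch shows the difference between a composed and a decomposed Latin character using the standard unicodedata module. It is written in Python 3 syntax and does not reproduce the linked Tcl report; the helper same_name is hypothetical and only illustrates comparing names after normalization.

```python
import unicodedata

# 'é' as a single precomposed code point (NFC form).
composed = "\u00e9"
# The same character decomposed into 'e' plus a combining acute accent (NFD form).
decomposed = unicodedata.normalize("NFD", composed)


def same_name(a, b):
    """Compare two names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The two strings look identical on screen but compare unequal until both are normalized to the same form.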
msg47964 - (view) Author: SUZUKI Hisao (suzuki_hisao) Date: 2005-03-17 08:40
I'm sorry, but the previous patches are insufficient to handle non-ASCII file 
names.
The menu "Recent Files" in "File" in the menu-bar does not display such 
names correctly.
In addition, when updating the "Recent Files" menu, a UnicodeDecodeError 
is raised by the _implicit_ conversion of the unicode filename given by 
tkFileDialog to an ASCII string.

So I made a new patch.  Do not use the previous patches, please.
The new patch converts every multi-byte file name into unicode early in 
IOBinding; thus the file path is correctly displayed in the title bar.  
And it converts every unicode name into multi-byte string explicitly when 
updating the menu.
Note that IDLE writes the recent file names as a text file.  Conversion into 
string is necessary anyway.
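
The "decode early, encode explicitly when writing" idea described above might look roughly like this. It is a sketch in Python 3 syntax, not the code of the attached patch; the names to_text and write_recent_files are hypothetical, and the 'replace' error handler is an assumption chosen so that the example never raises.

```python
import io


def to_text(name, encoding):
    """Convert a byte-string file name to text as early as possible."""
    if isinstance(name, bytes):
        return name.decode(encoding, "replace")
    return name


def write_recent_files(stream, names, encoding):
    """Write one file name per line to a binary stream, encoding explicitly."""
    for name in names:
        text = to_text(name, encoding)
        # Explicit conversion: no implicit ASCII step is involved.
        stream.write(text.encode(encoding, "replace") + b"\n")
```

Keeping every name as text internally and converting only at the point of writing avoids mixing the two string types.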
msg47965 - (view) Author: SUZUKI Hisao (suzuki_hisao) Date: 2005-03-17 09:25
In the typical usage of IDLE, sys.argv[] are given to "pythonw" command
by the window system.  Thus, in almost all cases, they are in the
filesystem encoding.

I believe IDLE with the last patch will run well on OS X (as well as on
Windows etc.) if the Tcl/Tk bug of OS X is fixed someday, or the
environment variable LANG is set to use UTF-8 for now.
msg47966 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-03-17 21:33
On what operating system is sys.argv in file system encoding
(i.e. in the encoding that open(2) expects), and not in the
locale's encoding? AFAIK, both Linux and Windows use the
locale's encoding for sys.argv (but then, they also use the
same encoding for the file system).
msg47967 - (view) Author: SUZUKI Hisao (suzuki_hisao) Date: 2005-03-18 02:47
When you install Python in Windows, you will get 
"Edit with IDLE" entry in the context menu for *.py file.
The entry launches the pythonw.exe with the name of
*.py file as one of the sys.argv[] parameters.

See the registry: 
HKEY_CLASSES_ROOT\Python.File\shell\Edit with IDLE\command

There you will see:
"C:\Python24\pythonw.exe" "C:\Python24\Lib\idlelib\idle.pyw"
-n -e "%1"

I thought the file name given here as "%1" would be what
open(2) accepts.
msg47968 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-03-18 06:49
On Windows, both argv and file names are encoded as "mbcs",
which is both the locale's encoding, and the file system
encoding. The interesting question is: how are command line
arguments encoded on OSX (which is the only system which has
a file system encoding independent of the locale)?
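
Whether the two encodings discussed here differ on a given machine can be checked with a few standard-library calls. The sketch below uses Python 3 syntax, and encoding_report is a hypothetical helper name; the values it returns depend on the platform and environment, so none are shown.

```python
import codecs
import locale
import sys


def encoding_report():
    """Report the file system encoding and the locale's preferred encoding."""
    fs = sys.getfilesystemencoding()
    loc = locale.getpreferredencoding(False)
    # Compare canonical codec names so that aliases such as 'UTF-8' and
    # 'utf-8' count as the same encoding.
    same = codecs.lookup(fs).name == codecs.lookup(loc).name
    return {"filesystem": fs, "locale": loc, "same": same}
```

Calling encoding_report() under different LANG settings shows whether the locale's encoding and the file system encoding diverge.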
msg47969 - (view) Author: SUZUKI Hisao (suzuki_hisao) Date: 2005-03-18 11:08
On OS X, any command line you type in is encoded in
whatever encoding Terminal.app specifies.

If it is other than UTF-8, you will see broken display for
non-ASCII file names when listing them in Terminal.app.

If it does not match LANG, line editing in bash will be
somewhat useless for multi-byte characters.

In Japan, seasoned Unix users tend to use EUC-JP on
OS X, and they also tend to restrict their file names to
ASCII.  They use EUC-JP in their program sources,
LaTeX files, command messages, etc.  This has been the
traditional way of using Unix in Japan since circa
1990.  Thus the broken display of non-ASCII file names
does not bother them.

Some newfangled Unix users use UTF-8 characters on
the command line on OS X.  And many other OS X users,
who naturally use national characters for their file names,
do not use the command line at all.

In theory, you can use some non-UTF8 encoding for a
non-ASCII file name in your command line.  However,
in practice for now, it seems very unlikely on OS X.
msg47970 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-03-18 18:16
Yes, IMO it is really sad that there is no programmatic way
to determine the encoding of Terminal.app (an ioctl would be
nice).
msg47971 - (view) Author: Kurt B. Kaiser (kbk) * (Python committer) Date: 2005-03-19 04:05
I'm monitoring :-)
Martin knows far more than I do about these things.
msg47972 - (view) Author: SUZUKI Hisao (suzuki_hisao) Date: 2005-03-19 08:19
I was afraid you might not notice that the patch has
to do with freezing of IDLE.  Indeed my IDLE on
Windows XP has been very freeze-free since applying it.
msg47973 - (view) Author: SUZUKI Hisao (suzuki_hisao) Date: 2005-03-22 12:33
I have revised the patch so that any
unicode results from tkFileDialog are
always converted to strings.  Now it is
more conservative in that unicodes are
used minimally for file names.
msg47974 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-11-27 16:59
Thanks for the patch. Committed as r41551.
History
Date User Action Args
2022-04-11 14:56:10 admin set github: 41693
2005-03-14 08:19:20 suzuki_hisao create