classification
Title: Problems with Tcl/Tk and non-ASCII text entry
Type:
Stage:
Components: Unicode
Versions:
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: loewis
Nosy List: THRlWiTi, gvanrossum, jhylton, kirill_simonov, lemburg, loewis
Priority: low
Keywords:

Created on 2000-10-31 21:38 by kirill_simonov, last changed 2022-04-10 16:02 by admin. This issue is now closed.

Messages (13)
msg2250 - (view) Author: Kirill Simonov (kirill_simonov) Date: 2000-10-31 21:38
Win98, Python2.0final.

1. I can't type Cyrillic letters in the IDLE editor.

I tried to figure out what was happening and found that
Tcl has an 'encoding' command. I typed in the IDLE shell:

>>> from Tkinter import *
>>> root = Tk()
>>> root.tk.call("encoding", "names")
'utf-8 identity unicode'
>>> root.tk.call("encoding", "system")
'identity'

But Tcl has numerous encodings in 'tcl\tcl8.3\encodings',
including 'cp1251'!

Then I installed Tk separately and removed tcl83.dll
and tk83.dll from the DLLs directory:

>>> from Tkinter import *
>>> root = Tk()
>>> root.tk.call("encoding", "names")
'cp860 cp861 [.........] cp857 unicode'
>>> root.tk.call("encoding", "system")
'cp1251'

So, when the Tcl/Tk DLLs are in the Python\DLLs directory,
Tcl can't load all its encodings.
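
The check above can be turned into a small helper. This is a minimal sketch, assuming the result of the Tcl `encoding names` command arrives as a space-separated string as in the sessions above; `has_encoding` is a hypothetical helper name, not part of Tkinter:

```python
def has_encoding(names_result, wanted):
    """Return True if `wanted` appears in the result of Tcl's `encoding names`."""
    # Assumption: other Tkinter versions may hand back a sequence instead of
    # a single string, so both shapes are accepted here.
    if isinstance(names_result, str):
        names = names_result.split()
    else:
        names = [str(name) for name in names_result]
    return wanted in names


# Result copied from the first session above (bundled Tcl/Tk DLLs).
bundled = "utf-8 identity unicode"
print(has_encoding(bundled, "cp1251"))
```

With a live interpreter one would pass the value of root.tk.call("encoding", "names") instead of the copied string.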

But this is not the end.

I typed in IDLE shell:

>>> print "hello <in russian>" # all chars look correct.
and got:
Exception in Tkinter callback
Traceback (most recent call last):
  File "c:\python20\lib\lib-tk\Tkinter.py", line 1287, in __call__
    return apply(self.func, args)
  File "C:\PYTHON20\Tools\idle\PyShell.py", line 579, in enter_callback
    self.runit()
  File "C:\PYTHON20\Tools\idle\PyShell.py", line 598, in runit
    more = self.interp.runsource(line)
  File "C:\PYTHON20\Tools\idle\PyShell.py", line 183, in runsource
    return InteractiveInterpreter.runsource(self, source, filename)
  File "c:\python20\lib\code.py", line 61, in runsource
    code = compile_command(source, filename, symbol)
  File "c:\python20\lib\codeop.py", line 61, in compile_command
    code = compile(source, filename, symbol)
UnicodeError: ASCII encoding error: ordinal not in range(128)
print "[the same characters]"
Then, when I pressed Enter again, I got the same
error message. I stopped this by pressing Ctrl-Break.
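
The "ordinal not in range(128)" message in the traceback is what the ASCII codec reports for any character above 127. A minimal sketch in modern Python 3 syntax (the session above is Python 2.0; the sample word is an assumption standing in for the Russian text, and `encodes_with` is a hypothetical helper):

```python
text = "привет"  # sample Cyrillic text, stands in for "hello <in russian>"


def encodes_with(value, encoding):
    """Return True if `value` can be encoded with `encoding`."""
    try:
        value.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The ASCII codec rejects the text, a Cyrillic code page accepts it.
print(encodes_with(text, "ascii"))
print(encodes_with(text, "cp1251"))
```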

[1/2 hour later]
I fixed this by editing site.py:
if 1: # was: if 0
  # Enable to support locale aware default string encodings.

I typed again:
>>> print "hello <in russian>"
and got:
<some strange letters>
>>> print unicode("hello <in russian>")
<some strange letters>

[2 hours later]
Looking at the source of _tkinter.c (in pseudocode):

static Tcl_Obj* AsObj(PyObject *value)
{
    if type(value) is StringType:
        return Tcl_NewStringObj(value)
    elif type(value) is UnicodeType:
        ...
...
}

But I read in
<http://dev.scriptics.com/doc/howto/i18n.html>
that all Tcl functions require strings to
be passed in UTF-8. So this code should look like:

    if type(value) is StringType:
        if TCL_Version >= 8.1:
             return Tcl_NewStringObj(<value converted
           to UTF-8 string using sys.getdefaultencoding()>)
        else:
             return Tcl_NewStringObj(value)

And when I typed:
>>> print unicode("hello <in russian>").encode('utf-8')
I got:
hello <in russian>

This is the end.

P.S. Sorry for my bad English, but I really want to
use IDLE and Tkinter in our school, so I can't wait
for somebody else to write a bug report.
msg2251 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2000-11-01 16:00
I am not entirely sure what the bug is, though I agree that it can be confusing to deal with Unicode strings.
msg2252 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2000-11-01 20:47
AFAIK, the _tkinter.c code automatically converts Unicode
to UTF-8 and then passes this to Tcl/Tk.

So basically the following should get you correct results...

print unicode("hello <in russian>", "cp1251")

Alternatively, you can set your default encoding to "cp1251"
in the way you describe and then write:

print unicode("hello <in russian>")

I am not too familiar with Tcl/Tk, so I can't judge whether trying
to recode normal 8-bit strings into UTF-8 is a good idea in general
for the _tkinter.c interface. It would easily be possible using:

utf8 = string.encode('utf-8')

since 8-bit strings support the .encode() method too.
msg2253 - (view) Author: Kirill Simonov (kirill_simonov) Date: 2000-11-01 21:16
1. print unicode("<cyrillic>") in IDLE doesn't work!
The mechanism (I think) is:
a) print unicode_string encodes the unicode string to a
normal string using the default encoding and passes it
to sys.stdout.
b) sys.stdout is intercepted by IDLE. IDLE sends this string
to Tkinter.
c) Tkinter passes this string (not unicode but cp1251!)
to Tcl, but Tcl expects a UTF-8 string!!!
d) I see garbled characters on screen.
2. You break compatibility! In 1.5 I could write
Button(root, text="<cyrillic>") and it worked.
Writing unicode("<>", 'cp1251') is UGLY and ANNOYING!
Tcl requires strings in UTF-8. All Python strings
are in the sys.getdefaultencoding() encoding. So we have to
recode all strings to UTF-8.
3. Tcl in DLLs can't find its encodings in
tcl\tk8.3\encodings! I don't know why. So I can't write
in Russian in Tkinter.Text.
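
Steps a) to d) can be simulated without Tk. This is a minimal sketch in Python 3 syntax, assuming cp1251 as the default encoding; Latin-1 is used only as a stand-in for how a consumer might display bytes it cannot decode as UTF-8 (an assumption, not verified Tcl behaviour):

```python
text = "привет"  # sample Cyrillic text (assumption)

# a) print encodes the Unicode string with the default encoding.
sent = text.encode("cp1251")

# c) The receiver expects UTF-8, so a strict decode of cp1251 bytes fails.
try:
    sent.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# d) Showing the raw bytes one character per byte gives garbled output.
garbled = sent.decode("latin-1")

# Sending real UTF-8 instead round-trips cleanly.
received = text.encode("utf-8").decode("utf-8")

print(valid_utf8, received == text)
print(ascii(garbled))
```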
msg2254 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2000-11-03 20:49
Assigned to Marc-Andre, since I have no idea what to do about this... :-(
msg2255 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2000-11-09 10:00
Ok, as we've found out in discussions on python-dev, the cause for the
problem is (partially) the fact that "print obj" does an implicit str(obj), so any
Unicode object printed will turn out as default encoded string no matter
how hard we try.

To fix this, we'll need to tweak the current "print" mechanism a bit to allow
Unicode to pass through to the receiving end (sys.stdout in this case).

About the problem that Tcl/tk needs UTF-8 strings: we could have _tkinter.c
recode the strings for you in case sys.getdefaultencoding() returns anything
other than 'ascii' or 'utf-8'. That way you can use a different default encoding
in Python while Tcl/tk will always get a true UTF-8 string.

Would this be a solution?
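
The recoding idea can be sketched in Python rather than C. This is only an illustration under stated assumptions: `recode_for_tcl` is a hypothetical name, 8-bit strings are modelled as Python 3 `bytes`, and the default encoding is passed in explicitly instead of being read from sys.getdefaultencoding():

```python
def recode_for_tcl(raw, default_encoding):
    """Return UTF-8 bytes for an 8-bit string stored in `default_encoding`."""
    normalized = default_encoding.lower().replace("_", "-")
    if normalized in ("ascii", "utf-8", "utf8"):
        # ASCII is a subset of UTF-8, and UTF-8 needs no work: pass through.
        return raw
    # Any other default encoding: decode to Unicode, then encode as UTF-8.
    return raw.decode(default_encoding).encode("utf-8")


cyrillic = "привет".encode("cp1251")  # sample data (assumption)
print(recode_for_tcl(cyrillic, "cp1251") == "привет".encode("utf-8"))
```

Passing the encoding as a parameter keeps the sketch free of global state; a real implementation inside _tkinter.c would look quite different.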
msg2256 - (view) Author: Kirill Simonov (kirill_simonov) Date: 2000-11-10 18:53
Yes, this is a solution. But don't forget that
Tcl can't load its encodings at startup.

Look at FixTk.py:

import sys, os, _tkinter
[...]
os.environ["TCL_LIBRARY"] = v

But 'import _tkinter' loads _tkinter.pyd; _tkinter.pyd
loads tcl83.dll; tcl83.dll tries to load its encodings
at startup and fails, because TCL_LIBRARY is not defined!

I can fix this:
#import sys, os, _tkinter
import sys, os
#ver = str(_tkinter.TCL_VERSION)
ver = "8.3"
[...]
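
The ordering fix can be sketched as a helper that sets TCL_LIBRARY before _tkinter is imported. This is a sketch, not the actual FixTk.py: the function name, the tcl\tcl<version> directory layout and the default version are assumptions based on the paths mentioned earlier in this report:

```python
import os


def set_tcl_library(prefix, version="8.3", environ=None):
    """Point TCL_LIBRARY at <prefix>/tcl/tcl<version> if that directory exists."""
    if environ is None:
        environ = os.environ
    candidate = os.path.join(prefix, "tcl", "tcl" + version)
    # Leave an existing TCL_LIBRARY setting alone and skip missing directories.
    if os.path.isdir(candidate) and "TCL_LIBRARY" not in environ:
        environ["TCL_LIBRARY"] = candidate
        return candidate
    return None


# Call this first and import _tkinter (or Tkinter) only afterwards, so that
# the Tcl DLL can see TCL_LIBRARY when it initialises its encodings.
```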
msg2257 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2000-11-12 11:30
It should be no problem that Tcl can't find its encodings. When used with Tkinter, Tcl can only expect Unicode strings, or strings in sys.getdefaultencoding() (i.e. 'ascii'). Therefore, Tk never needs any other encoding.

If you want to make use of the Tcl system encoding (which is apparently not supported in Tkinter), you probably need to set the TCL_LIBRARY environment variable.
msg2258 - (view) Author: Kirill Simonov (kirill_simonov) Date: 2000-11-12 12:17
No, you are wrong! The Entry and Text widgets depend on the Tcl system encoding. If Tcl can't find the Cyrillic encoding (cp1251), then I can't enter Cyrillic characters.
msg2259 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2001-01-03 21:37
I've changed the subject line to better reflect the cause of the error:

1. The Tcl version shipped with Python 2.0 apparently 
doesn't include the Tcl codec libs, but these seem to be needed by
Tcl to allow entry of characters in non-ASCII environments.

2. Python's print statement should allow Unicode to be passed through
to sys.stdout.

3. _tkinter should recode all 8-bit strings into Unicode under the assumption
that the 8-bit strings use sys.getdefaultencoding() as encoding.
msg2260 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-02-24 17:07
Item 1. of MAL's list becomes 'Tcl does not find its
encoding directory' in Python 2.2; this is fixed with
FixTk.py 1.6.

Item 2. has been fixed for Python 2.2; the remaining problem
was that the OutputWindow converted all unicode objects to
strings first. This has been fixed with OutputWindow.py 1.6.

I'm not sure which problem is supposed to be solved with
item 3. in MAL's list, I believe that this change is not
necessary, and may be incorrect in some cases.

Item 1. of the original submitter's problems is solved with
the changes to FixTk.py.

As for entering non-ASCII characters in the IDLE shell, I'm
not sure what to do with this. For entering non-ASCII
characters in an IDLE source window, see patch

http://sourceforge.net/tracker/index.php?func=detail&aid=508973&group_id=9579&atid=309579
and PEP 263.

I'm inclined to recommend that IDLE should encode Unicode
strings entered by the user as UTF-8 before passing them to
the interpreter; most likely, any byte strings will be
printed to a Tk window, in which case UTF-8 should work right.


msg2261 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2002-02-24 17:31
Assigned to Martin for further processing -- I know
too little about Tkinter to be of any help here.
msg2262 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-08-05 15:25
The IDLE problems have been fixed in CVS: IDLE now
implements PEP 263, and uses the locale's encoding for
interactive input.
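
For reference, a PEP 263 declaration is a comment on the first or second line of a source file. This is a minimal sketch using the standard `tokenize` and `locale` modules from modern Python 3, not the IDLE code referred to above; the sample source is an assumption:

```python
import io
import locale
import tokenize

# A source file, as bytes, that declares its encoding per PEP 263.
source = '# -*- coding: cp1251 -*-\nprint("привет")\n'.encode("cp1251")

# detect_encoding reads at most two lines looking for the declaration.
declared, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
decoded = source.decode(declared)

# The locale's preferred encoding is one candidate for interactive input.
interactive_encoding = locale.getpreferredencoding(False)

print(declared, interactive_encoding)
```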

History
Date                 User            Action  Args
2022-04-10 16:02:33  admin           set     github: 33425
2015-06-08 10:43:29  THRlWiTi        set     nosy: + THRlWiTi
2000-10-31 21:38:56  kirill_simonov  create