This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Conversion error for latin1 characters with iso-2022-jp-2
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: hyeshik.chang Nosy List: duranlef, hyeshik.chang
Priority: normal Keywords:

Created on 2006-03-12 21:57 by duranlef, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
test.iso-2022-jp-2 duranlef, 2006-03-12 21:57
Messages (2)
msg27762 Author: Francois Duranleau (duranlef) Date: 2006-03-12 21:57
There seem to be errors when reading a
text file encoded with ISO-2022-JP-2 using the codecs
module. In all my test cases, none of the accented
Latin-1 characters (e.g. e acute) appear in the
output string. However, if I convert the file manually
using iconv, the output is correct. Here is a simple
script that illustrates the problem:

###########################################

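import codecs

import pygtk
pygtk.require("2.0")
import gtk

# Decode the first line of the attachment with Python's own codec.
f = codecs.open("test.iso-2022-jp-2", "r", "iso-2022-jp-2")
s1 = f.readline().strip()
f.close()

# Read the same line from the iconv-converted file (UTF-8 bytes).
f = open("test.utf-8", "r")
s2 = f.readline().strip()
f.close()

# Show both strings as labels, one above the other.
pack = gtk.VBox()
pack.pack_start(gtk.Label(s1))
pack.pack_start(gtk.Label(s2))

window = gtk.Window(gtk.WINDOW_TOPLEVEL)
window.add(pack)
window.show_all()

# "destroy" handlers receive (widget, user_data); there is no event argument.
def event_destroy(widget, data):
    gtk.main_quit()

# Returning False from "delete_event" lets GTK go on to destroy the window.
window.connect("delete_event", lambda w, e, d: False, None)
window.connect("destroy", event_destroy, None)

gtk.main()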

###########################################

I have attached the file "test.iso-2022-jp-2". To
create the UTF-8 version of the file, I used the
following shell command:

iconv -f ISO-2022-JP-2 -t UTF-8 \
    test.iso-2022-jp-2 > test.utf-8

When running this script, I would expect a
window showing the same label twice. However, the
first label is missing the e acute.

--
Francois
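The iconv step in the report has a direct Python 3 counterpart built on the same codec. This is a sketch for illustration, not part of the report: the helper names iso2022jp2_to_utf8 and convert_file are hypothetical. On an interpreter without the fix, this conversion would presumably drop the e acute just as the GTK script does, so iconv remains the more useful independent reference for reproducing the bug.

```python
from pathlib import Path


def iso2022jp2_to_utf8(data: bytes) -> bytes:
    """Re-encode raw ISO-2022-JP-2 bytes as UTF-8 (hypothetical helper)."""
    # Decoding goes through Python's iso-2022-jp-2 codec, the one this issue is about.
    return data.decode("iso-2022-jp-2").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Rough equivalent of: iconv -f ISO-2022-JP-2 -t UTF-8 src > dst (hypothetical helper).

    Working on raw bytes leaves line endings untouched, as iconv does.
    """
    Path(dst).write_bytes(iso2022jp2_to_utf8(Path(src).read_bytes()))
```

On a fixed interpreter, convert_file("test.iso-2022-jp-2", "test.utf-8") should produce the same UTF-8 text as the shell command above.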
msg27763 Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2006-03-13 10:27

Fixed in SVN (trunk:r42989, release24-maint:42991).
Thank you for the report!
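The report does not show the bytes in the attachment, but the mechanism can be sketched under one assumption: that the accented letter is carried the way RFC 1554 specifies for ISO-2022-JP-2, where ESC . A designates the upper half of ISO-8859-1 to G2 and ESC N (single shift 2) takes exactly one following byte from G2. The sample bytes and helper names below are illustrative, not taken from the attachment or from the fix; on a current Python 3 both checks are expected to return True.

```python
ESC = b"\x1b"

# ESC . A puts ISO-8859-1's upper half in G2; ESC N reads the next byte from G2,
# so b"i" (0x69) stands for 0x69 | 0x80 = 0xE9, i.e. "\u00e9" (e acute).
SAMPLE = b"unit" + ESC + b".A" + ESC + b"Ni" + b" de famille"


def decodes_latin1_correctly() -> bool:
    """True if the codec keeps the accented letter (the fixed behaviour)."""
    return SAMPLE.decode("iso-2022-jp-2") == "unit\u00e9 de famille"


def round_trips(text: str) -> bool:
    """True if text survives an encode/decode cycle unchanged."""
    # The encoder may pick a different designation for e acute than SAMPLE uses,
    # so compare decoded text rather than the encoded bytes.
    return text.encode("iso-2022-jp-2").decode("iso-2022-jp-2") == text
```

For example, round_trips("caf\u00e9 \u65e5\u672c") mixes an accented Latin-1 letter with Japanese text in one string.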
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 43025
2006-03-12 21:57:32 duranlef create