This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: File read of Chinese utf-16-le treats upper byte 1A as EOF
Type:
Stage:
Components: Unicode
Versions:
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: lemburg, nnorwitz, rrother
Priority: normal
Keywords:

Created on 2004-02-25 19:30 by rrother, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg20133 - Author: Ron Rother (rrother) Date: 2004-02-25 19:30
Any utf-16-le Chinese character with 1A as the most
significant byte causes the remainder of the file to be ignored.

code extract:

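import codecs
import sys

# codecs.lookup() returns (encoder, decoder, StreamReader, StreamWriter)
(utf16_encoder, utf16_decoder, utf16_reader,
 utf16_writer) = codecs.lookup("utf-16-le")

ifile = utf16_reader(open(sys.argv[1], "r"))

t = ifile.read()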

When the Chinese character 1A 5C is encountered,
everything from the 5C on is discarded.
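A minimal Python 3 sketch of the byte layout described here, assuming the character in question is the code point U+5C1A: UTF-16-LE writes the low byte of each 16-bit code unit first, so U+5C1A is stored on disk as the bytes 1A 5C, and 0x1A is the ASCII SUB (Ctrl-Z) control character.

```python
# Encode the code point U+5C1A (assumed to be the "1A 5C" character in the report).
ch = "\u5c1a"
encoded = ch.encode("utf-16-le")

# UTF-16-LE writes the low byte first, so 0x1A is the first byte on disk.
print(encoded.hex())                  # 1a5c
print(encoded[0] == ord(ch) & 0xFF)   # True

# 0x1A is the ASCII SUB control character, i.e. Ctrl-Z.
print(encoded[0] == 0x1A)             # True
```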

These 3 lines:
English="You have not selected any books!"
Context=1,[MsgBox "You have not selected any books!"]
Chinese(Simplified)="*	éûUfw"

are read in as:
English="You have not selected any books!"
Context=1,[MsgBox "You have not selected any books!"]
Chinese(Simplified)="
msg20134 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-02-25 22:53

I believe there is a misconception here: open(..., "r")
causes the file to be opened in the C library's text mode. Since
UTF-16 is binary data, this leads to problems with line breaking
and file handling in general. (On Windows, text mode also treats
the byte 0x1A, Ctrl-Z, as end of file on input.)
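A rough emulation of the failure mode, assuming the Windows C runtime rule that a text-mode read stops at the first 0x1A byte. The helper `read_like_text_mode` is hypothetical and mimics only that one rule (it ignores CRLF translation, which would also corrupt UTF-16 data); the sample line is made up rather than taken from the reporter's file.

```python
def read_like_text_mode(raw: bytes) -> bytes:
    """Hypothetical emulation: return only the bytes before the first Ctrl-Z (0x1A)."""
    end = raw.find(b"\x1a")
    return raw if end == -1 else raw[:end]


# ASCII characters encode as <char> 00 in UTF-16-LE, so the first 0x1A
# here is the low byte of U+5C1A.
line = 'Chinese(Simplified)="\u5c1a"'
raw = line.encode("utf-16-le")

truncated = read_like_text_mode(raw)
print(truncated.decode("utf-16-le"))   # Chinese(Simplified)="
```

The decoded result ends at the opening quote, which matches the truncated third line shown in the report.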

You should try:

import codecs
# 'rb': the underlying file is read in binary mode, bytes unaltered
ifile = codecs.open(filename, 'rb', encoding='utf-16-le')
t = ifile.read()
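A minimal Python 3 sketch of the suggested approach, under illustrative assumptions (a temporary file and made-up sample text): it writes raw UTF-16-LE bytes containing U+5C1A, then reads them back both through `codecs.open` in binary mode and through the built-in `open` with an `encoding` argument, which in Python 3 decodes from a binary buffer itself. Recent Python releases steer users toward the built-in `open` for this.

```python
import codecs
import os
import tempfile

text = ('English="You have not selected any books!"\n'
        'Chinese(Simplified)="\u5c1a"\n')

# Write the sample as raw UTF-16-LE bytes (no BOM).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(text.encode("utf-16-le"))

try:
    # The approach from the message above: binary mode plus a codec.
    with codecs.open(path, "rb", encoding="utf-16-le") as f:
        via_codecs = f.read()

    # Built-in open(); newline="" leaves "\n" untranslated on read.
    with open(path, encoding="utf-16-le", newline="") as f:
        via_open = f.read()
finally:
    os.remove(path)

print(via_codecs == text, via_open == text)   # True True
```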
msg20135 - Author: Neal Norwitz (nnorwitz) (Python committer) Date: 2005-10-03 01:19

MAL, this seems to come up from time to time.  Perhaps we
should update the doc for open()?  If it's already
documented, could we make it clearer?  Then we should be
able to close this bug.  I think I saw another bug recently
that was similar to this one.
History
Date                 User     Action  Args
2022-04-11 14:56:02  admin    set     github: 39981
2004-02-25 19:30:42  rrother  create