
classification
Title: reading from malformed big5 document hangs cpython
Type:
Stage:
Components: Interpreter Core
Versions: Python 2.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: hyeshik.chang
Nosy List: hyeshik.chang, nnorwitz, tsuraan3
Priority: high
Keywords:

Created on 2007-05-30 15:36 by tsuraan3, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg32147 - (view) Author: tsuraan (tsuraan3) Date: 2007-05-30 15:36
Python enters what appears to be an infinite loop when reading data from a malformed big5-encoded file through the codecs library.  This behaviour can be observed under Linux and FreeBSD, using Python 2.4 and 2.5.  A simple example illustrating the bug follows:

Python 2.4.4 (#1, May 15 2007, 13:33:55)
[GCC 4.1.1 (Gentoo 4.1.1-r3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> fname='out'      
>>> outfd=open(fname,'w')
>>> outfd.write(chr(243))
>>> outfd.close()
>>>
>>> infd= codecs.open(fname, encoding='big5')
>>> infd.read(1024)

And then, it hangs forever.  If I instead use the following code:

Python 2.5 (r25:51908, Jan  8 2007, 19:09:28)
[GCC 3.4.5 (Gentoo 3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs, signal
>>> fname='out'
>>> def handler(*args):
...   raise Exception("boo!")
...
>>> signal.signal(signal.SIGALRM, handler)
0
>>> outfd=open(fname, 'w')
>>> outfd.write (chr(243))
>>> outfd.close()
>>>
>>> infd=codecs.open(fname, encoding='big5')
>>> signal.alarm(5)
0
>>> infd.read(1024)

The program still hangs forever; the alarm handler never runs.  The program can be made to crash if I don't install a signal handler at all, but that's pretty lame.  It looks like the entire interpreter is being locked up by this read, so I don't think there's likely to be a pure-Python workaround, but I thought it would be a good bug to have out there so a future version of Python can (hopefully) fix this.
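
A sketch of the same reproduction for Python 3, run in a child process so that a hang is reported instead of freezing the caller. The in-process SIGALRM handler above most likely never fires because Python-level signal handlers only run once the interpreter gets back to executing bytecode, and the read never returns from C. The names `CHILD` and `probe_big5_read`, the exit code 3, and the 10 second timeout are illustrative choices, not part of the original report.

```python
import os
import subprocess
import sys
import tempfile

# Child program: performs the read from the report and exits with code 3
# (an arbitrary, illustrative value) if the codec rejects the input.
CHILD = r"""
import codecs, sys
try:
    with codecs.open(sys.argv[1], encoding='big5') as infd:
        infd.read(1024)
except UnicodeDecodeError:
    sys.exit(3)
"""


def probe_big5_read(timeout=10.0):
    """Write the single byte 0xF3 to a temp file and read it back as big5."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as outfd:
            outfd.write(b"\xf3")  # a big5 lead byte with no trail byte
        try:
            proc = subprocess.run(
                [sys.executable, "-c", CHILD, path],
                timeout=timeout,
                capture_output=True,
            )
        except subprocess.TimeoutExpired:
            return "hang"  # the child was killed after the timeout
        if proc.returncode == 3:
            return "decode error"
        return "exit code %d" % proc.returncode
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(probe_big5_read())
```

On an interpreter that has the fix, the child is expected to stop with a UnicodeDecodeError; an affected interpreter would be reported as "hang" once the timeout expires.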
msg32148 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2007-05-31 04:49
Hye-Shik, could you take a look at this?  There's an infinite loop in mbstreamreader_iread() in Modules/cjkcodecs/multibytecodec.c: rsize == 1 on each iteration.  I don't know if there are more places that might have this problem.
msg32149 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2007-05-31 04:51
Bumping the priority since this is about as bad as a crash.
msg32150 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2007-06-05 19:31
Thank you for the report, tsuraan, and thank you for the investigation, Neal.

The bug is related to a logic that detects whether file reached end of file.  I verified that any other part of CJKCodecs has such a logic.
Fixed and committed in SVN.
trunk 55770, release25-maint 55774, release24-maint 55772.
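
A toy model of the kind of loop described in this thread, for illustration only. It is not the code from multibytecodec.c. It assumes that the buggy variant decided end-of-file from the buffer size after leftover bytes had been prepended, which would match the "rsize == 1 each iteration" observation, and that the fix decides end-of-file from the raw read alone. The two-byte toy codec and the names `split_complete`, `model_read`, `eof_from_raw_read` and `max_iterations` are hypothetical.

```python
import io


def split_complete(data):
    """Toy codec: every character is exactly two bytes.

    Returns (complete_bytes, leftover_bytes).
    """
    cut = len(data) - len(data) % 2
    return data[:cut], data[cut:]


def model_read(raw, sizehint, eof_from_raw_read, max_iterations=20):
    """Toy model of a stream-reader loop; NOT the real C implementation."""
    stream = io.BytesIO(raw)
    pending = b""
    for iteration in range(1, max_iterations + 1):
        chunk = stream.read(sizehint)
        data = pending + chunk        # leftover bytes are prepended to the new read
        rsize = len(data)
        if eof_from_raw_read:
            at_eof = len(chunk) == 0  # assumed fix: decide EOF from the raw read
        else:
            at_eof = rsize == 0       # assumed bug: leftover bytes mask EOF
        complete, pending = split_complete(data)
        if at_eof and pending:
            return ("incomplete sequence", iteration)
        if complete or at_eof:
            return ("decoded", complete)
        sizehint = 1                  # nothing decoded yet: ask for one more byte
    return ("stuck", max_iterations)  # iteration cap stands in for the hang


if __name__ == "__main__":
    print(model_read(b"\xf3", 1024, eof_from_raw_read=False))
    print(model_read(b"\xf3", 1024, eof_from_raw_read=True))
```

With the single byte 0xF3 as input, the first call never sees an empty buffer and runs into the iteration cap, while the second notices the empty read on its second pass and reports the incomplete sequence.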
msg32151 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2007-06-05 19:34
In my comment:
> The bug is related to a logic that detects whether file reached end of
> file.  I verified that any other part of CJKCodecs has such a logic.

I meant "no part". sorry. :)
History
Date User Action Args
2022-04-11 14:56:24 admin set github: 45015
2007-05-30 15:36:03 tsuraan3 create