
classification
Title: Fix for segfault in ISO 2022 codecs
Type:
Stage:
Components: Extension Modules
Versions: Python 2.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: hyeshik.chang
Nosy List: chasonr, hyeshik.chang, nnorwitz
Priority: normal
Keywords: patch

Created on 2006-10-07 18:00 by chasonr, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
iso2022-patch.txt chasonr, 2006-10-07 18:07
Messages (5)
msg51211 - (view) Author: Ray Chason (chasonr) Date: 2006-10-07 18:00
This may relate to bug report 1005078, which was closed
because it couldn't be duplicated with the information
given.

Run the following program for a segmentation fault on
your Python interpreter:

--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--
import sys

for x in xrange(0x10000, 0x110000):
    if sys.maxunicode >= 0x10000:
        ch = unichr(x)
    else:
        ch = unichr(0xD7C0+(x>>10)) + unichr(0xDC00+(x & 0x3FF))
    try:
        # Any ISO 2022 codec will cause the segfault
        ch.encode("iso_2022_jp")
    except UnicodeEncodeError:
        pass
--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--
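
On narrow (UCS-2) builds, the test program reaches supplementary characters by building a UTF-16 surrogate pair by hand. A minimal sketch of that arithmetic, written for Python 3 (where `unichr` and `xrange` no longer exist); the helper name `to_surrogate_pair` is invented for illustration and is not part of the report:

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    # 0xD7C0 + (cp >> 10) equals 0xD800 + ((cp - 0x10000) >> 10),
    # because 0x10000 >> 10 == 0x40 and 0xD800 - 0x40 == 0xD7C0.
    high = 0xD7C0 + (cp >> 10)
    # The low surrogate carries the bottom 10 bits of the code point.
    low = 0xDC00 + (cp & 0x3FF)
    return high, low


if __name__ == "__main__":
    for cp in (0x10000, 0x1F600, 0x10FFFF):
        high, low = to_surrogate_pair(cp)
        print("U+%06X -> %04X %04X" % (cp, high, low))
```

The constant 0xD7C0 folds the subtraction of 0x10000 into the base of the high-surrogate range, which is why the test program does not subtract it explicitly.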

I have verified this bug on four different Pythons:

* The current ActivePython (2.4.3 based), running on
Windows XP SP2
* The stock Python 2.4.2 on Ubuntu Breezy (i386)
* The stock Python 2.4.2 on Ubuntu Breezy (AMD64)
* A home-built Python 2.5 on Ubuntu Breezy (i386);
--enable-unicode=ucs4 is selected and other options are
left at default

It affects not just iso_2022_jp, but all of the ISO
2022 codecs.

If you are attempting to replicate the bug on Linux,
you may get more repeatable results if you first become
root and then run:

    echo 0 > /proc/sys/kernel/randomize_va_space

This seems related to bug report 1005078.  However, bug
report 1005078 claimed that a character in the BMP
could cause a crash.  I have not reproduced that bug
using a BMP character; however, supplementary
characters can in fact cause the ISO 2022 codecs to crash.

The problem is that four functions in
Modules/cjkcodecs/_codecs_iso2022.c do not check that
the code point is less than 0x10000 before invoking the
TRYMAP_ENC macro.  This causes the bounds of the
encoding table to be exceeded.   The four functions are:

* ksx1001_encoder
* jisx0208_encoder
* jisx0212_encoder
* gb2312_encoder
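
The failure mode can be modelled in a few lines of Python. This is a simplified sketch, not the actual C code: it assumes the encode map is a 256-row table indexed by the high byte of a 16-bit code point, and the names `encmap`, `lookup_unchecked`, and `lookup_checked` are invented for illustration, as is the single table entry. Python raises IndexError where the C code would read past the end of the array.

```python
ENCMAP_ROWS = 256  # assumed: one row per high byte of a 16-bit code point

# Hypothetical encode map: row index is cp >> 8, column key is cp & 0xFF.
encmap = [None] * ENCMAP_ROWS
encmap[0x30] = {0x42: 0x2422}  # illustrative entry for U+3042


def lookup_unchecked(cp):
    """Model of a lookup with no range check on the code point."""
    row = encmap[cp >> 8]  # cp >= 0x10000 gives an index >= 256
    if row is None:
        return None
    return row.get(cp & 0xFF)


def lookup_checked(cp):
    """Same lookup, rejecting supplementary code points first."""
    if cp >= 0x10000:
        return None  # treated as unmappable, so the table is never indexed
    return lookup_unchecked(cp)


if __name__ == "__main__":
    print(lookup_checked(0x3042))
    print(lookup_checked(0x10000))
    try:
        lookup_unchecked(0x10000)
    except IndexError as exc:
        print("out of bounds:", exc)
```

With the check in place, a supplementary code point is reported as unmappable before the table is touched, and the codec's normal error handling can then raise UnicodeEncodeError.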

The enclosed patch adds the necessary checks, and the
above program then completes without incident.  It is
derived from the official 2.5 release, but also applies
cleanly against the daily drop of 6 October 2006
because the file Modules/cjkcodecs/_codecs_iso2022.c is
unchanged in that drop.
msg51212 - (view) Author: Ray Chason (chasonr) Date: 2006-10-07 18:07
Logged In: YES 
user_id=421946

There's no uploaded file!  You have to check the
checkbox labeled "Check to Upload & Attach File"
when you upload a file. In addition, even if you
*did* check this checkbox, a bug in SourceForge
prevents attaching a file when *creating* an issue.

Please try again.

(This is a SourceForge annoyance that we can do
nothing about. :-( )
msg51213 - (view) Author: Ray Chason (chasonr) Date: 2006-10-07 18:07
Logged In: YES 
user_id=421946

The upload seems to have quietly failed to work.  Also, the
indents got mashed on that test program, and we all know how
important indents are to Python.

Here it is again, with the test program prefixed this time.
msg51214 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-10-08 00:53
Logged In: YES 
user_id=33168

Thanks for the report.

Perky, could you take a look at this patch?  I don't know if
it's correct or not.
msg51215 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2006-10-08 14:05
Logged In: YES 
user_id=55188

The patch is correct.  Thanks for the report!

Applied in svn:
r52223 for trunk
r52224 for 2.4
r52225 for 2.5
History
Date User Action Args
2022-04-11 14:56:20 admin set github: 44097
2006-10-07 18:00:20 chasonr create