classification
Title: pyexpat.c: Two line fix for decoding crash
Components: Extension Modules
process
Status: closed
Resolution: accepted
Nosy List: nnorwitz, vulturex
Priority: normal
Keywords: patch

Created on 2005-09-29 22:46 by vulturex, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
pyexpat-fix.patch | vulturex, 2005-09-29 22:46 | Patch for Segmentation Fault
test.py | vulturex, 2005-09-29 22:48 | Sample Python script which crashes Python 2.3, 2.4, and CVS
minidom-test.patch | vulturex, 2005-09-30 00:25 |
Messages (3)
msg48803 - Author: Evan Jones (vulturex) Date: 2005-09-29 22:46
The attached Python script "test.py" crashes Python 2.3, 2.4, and
current CVS. The problem is that expat can pass back a string that is
not valid UTF-8 when the document does not specify a character
encoding. In "test.py" the XML document is encoded as latin_1, but
Python assumes it is UTF-8.

The workaround is to decode the string into Unicode first, then
encode it as UTF-8. However, if the data comes from a file or a user
and is not converted this way, it can crash the interpreter.
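
As a rough illustration of the workaround above, the following sketch re-encodes Latin-1 input as UTF-8 before handing it to minidom. It is written for current Python 3 rather than the 2.x releases discussed here, and the sample document and the helper name `reencode_as_utf8` are made up for this example; they are not the contents of the attached test.py.

```python
import xml.dom.minidom


def reencode_as_utf8(raw, source_encoding="latin_1"):
    """Decode raw bytes from a known encoding, then re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical input: an XML document stored as Latin-1 with no
# encoding declaration, so the parser would assume UTF-8.
latin1_doc = "<greeting>Très bien</greeting>".encode("latin_1")

dom = xml.dom.minidom.parseString(reencode_as_utf8(latin1_doc))
text = dom.documentElement.firstChild.data
print(text)
```

This only helps when the caller already knows the source encoding; it does nothing for input whose encoding is unknown.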

With the attached patch, Python raises an exception instead of
causing a segmentation fault, which is exactly what Python 2.2 does
in this case:

Traceback (most recent call last):
  File "/home/ejones/test.py", line 5, in ?
    dom = xml.dom.minidom.parseString( x.encode( 'latin_1' ) )
  File "/home/ejones/python/dist/src/Lib/xml/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/home/ejones/python/dist/src/Lib/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/home/ejones/python/dist/src/Lib/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 4-6: invalid data
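
For callers that cannot control the input encoding, a defensive wrapper along these lines may be useful once the parser raises instead of crashing. This is only a sketch for current Python 3: the function name `parse_untrusted` is hypothetical, and which exception is raised (UnicodeDecodeError or ExpatError) is assumed to depend on the Python and expat versions in use, so both are caught.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parse_untrusted(raw):
    """Return (dom, None) on success or (None, error) on a parse failure."""
    try:
        return xml.dom.minidom.parseString(raw), None
    except (UnicodeDecodeError, ExpatError) as exc:
        # Assumption: malformed or mis-encoded input surfaces as one of
        # these two exception types rather than crashing the interpreter.
        return None, exc


# Hypothetical input: Latin-1 bytes with no encoding declaration.
dom, error = parse_untrusted("<greeting>Très bien</greeting>".encode("latin_1"))
if error is not None:
    print("rejected:", type(error).__name__)
```

Returning the error instead of re-raising is just one possible design; code that should fail loudly can let the exception propagate.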
msg48804 - Author: Evan Jones (vulturex) Date: 2005-09-30 00:25
I've also attached a patch which adds this as a test case to 
test_minidom.py.
msg48805 - Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-09-30 04:58
Thanks!  

Note, I had to modify the patch a little bit because the
result of string_intern() was passed to Py_BuildValue().
Since string_intern() returned NULL, Py_BuildValue()
returned NULL and container wound up being Py_DECREF()ed
twice, which showed up in a debug build.

Checking in Lib/test/test_minidom.py 1.42, 1.39.4.3
Checking in Misc/ACKS 1.297, 1.289.2.4
Checking in Misc/NEWS 1.1381, 1.1193.2.115
Checking in Modules/pyexpat.c 2.91, 2.89.2.2

History
Date | User | Action | Args
2022-04-11 14:56:13 | admin | set | github: 42433
2005-09-29 22:46:16 | vulturex | create |