This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: normalize function in minidom unlinks empty child nodes
Type: Stage:
Components: XML Versions: Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: akuchling, maltehelmert, romankliotzkin
Priority: normal Keywords: easy

Created on 2006-02-17 16:52 by romankliotzkin, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
testcase.zip romankliotzkin, 2006-02-17 16:52 Test case - script and XML
MINIDOM-PATCH maltehelmert, 2008-02-01 22:35 patch for xml.dom.minidom.py
Messages (5)
msg27551 - (view) Author: RomanKliotzkin (romankliotzkin) Date: 2006-02-17 16:52
Hi,

When calling the "normalize" method of DOM objects in
the minidom library, the method will attempt to collate
text nodes together. When it encounters an empty text
node, it will do the following:

(line 197 in minidom.py)

  else:                                               
                                    
      # empty text node; discard                      
                                    
      child.unlink()

unlink() sets all the members of the child node to
None, BUT the parent node still has a link to the child
node. Therefore, if one is later iterating through the
parent node and calls removeChild on the unlinked node,
a NotFoundErr exception will be thrown by the library.

To trigger, use the test case script along with the
test case XML document included in the zip file.

testcase.py testcase.xml

My suggestion is to call self.removeChild(child), then
call child.unlink(). That way, there is no ambiguity
about an unlinked child being referenced by the parent.
msg61980 - (view) Author: Malte Helmert (maltehelmert) Date: 2008-02-01 22:22
I can reproduce the bug on trunk (r60511). At first I thought the
behaviour might be caused by the testcase removing items from the
children list while iterating over it, but this is not the case; the
exception is raised upon the first removal already.

Here is a shorter testcase:
========================================
from xml.dom import minidom

def testme(xmltext):
    node = minidom.parseString(xmltext).documentElement
    for child in node.childNodes:
        if child.nodeValue == "t":
            child.nodeValue = ""
    node.normalize()

    child = node.firstChild
    while child:
        next = child.nextSibling
        if child.nodeType == child.TEXT_NODE and child.nodeValue == "":
            node.removeChild(child)
        child = next
    return node

print testme("<o><i/>tt</o>").toxml()
print testme("<o><i/>t</o>").toxml()
========================================

The second call to testme fails with an xml.dom.NotFoundErr, but it
should succeed and print "<o><i/></o>".


While this appears to be a genuine bug, I don't agree with the proposed
fix. I'm not sure if calling removeChild (which mutates self.childNodes)
from within normalize (which iterates over self.childNodes at that time)
is safe, and even if it is, it would make normalize O(N^2) (not taking
into account recursion) where it should be O(N).
msg61981 - (view) Author: Malte Helmert (maltehelmert) Date: 2008-02-01 22:35
OK, I think I found the root cause. Node.normalize regenerates the list
of children L with their previousSibling/nextSibling references from
scratch; however, it fails to set the nextSibling reference for the very
last element of L to None at the end. This is necessary iff the last
node in the original child list is removed. The attached patch should
solve the problem.

In cases like this, should I also add a regression test?
msg61982 - (view) Author: Malte Helmert (maltehelmert) Date: 2008-02-01 22:46
Here is a minimal testcase to more clearly expose the root of the
problem, in case a regression test is needed. Without the patch, the
assertion fails.

==========================================================
from xml.dom import minidom
node = minidom.parseString("<o><i/>t</o>").documentElement
node.childNodes[1].nodeValue = ""
node.normalize()
assert node.childNodes[-1].nextSibling == None
==========================================================
msg62778 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2008-02-23 17:22
Patch applied to 2.6 in rev. 60995, and to 2.5-maint in rev. 60998.
Thanks for reporting the problem, and for finding the bugfix!
History
Date User Action Args
2022-04-11 14:56:15adminsetgithub: 42916
2008-02-23 17:22:42akuchlingsetstatus: open -> closed
resolution: fixed
messages: + msg62778
nosy: + akuchling
2008-02-01 22:46:13maltehelmertsetmessages: + msg61982
2008-02-01 22:37:31maltehelmertsetversions: + Python 2.6, - Python 2.4
2008-02-01 22:35:13maltehelmertsetfiles: + MINIDOM-PATCH
messages: + msg61981
2008-02-01 22:22:16maltehelmertsetnosy: + maltehelmert
messages: + msg61980
2008-01-12 00:56:48akuchlingsetkeywords: + easy
2006-02-17 16:52:32romankliotzkincreate