This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: xml.dom.minidom parse bug
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: Nosy List: loewis, pmi
Priority: normal Keywords:

Created on 2007-01-05 16:37 by pmi, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
x.xmp pmi, 2007-01-05 16:37 xml file that makes the parser fail
Messages (2)
msg30926 - Author: Pierre Imbaud (pmi) Date: 2007-01-05 16:37
xml.dom.minidom was unable to parse an XML file taken from an example provided by an official organization (http://www.iptc.org/IPTC4XMP).
The original file was somewhat hairy, but I have been able to reproduce the bug with a simplified
version, attached. (It ends with .xmp: it's supposed
to be an XMP file, the XMP standard being built on
XML. Well, that's the short story.)

The offending part is the one that goes: xmpPLUS='....'
It triggers an exception, ValueError: too many values to unpack,
in _parse_ns_name. Some debugging showed an obvious mistake
in the scanning of the name argument, which goes beyond the closing
" ' ".
I dug a little further through a pdb session, but the bug seems to be located in C code.
This is the very first time I have reported a bug, so chances are I am providing too much or too little information...
To whom it may concern, here is the invoking code:
from xml.dom import minidom
...
class xmp(dict):
    def __init__(self, inStream):
        xmldoc = minidom.parse(inStream)
        ....

x = xmp('/home/pierre/devt/port/IPTCCore-Full/x.xmp')


traceback:
/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xmpLib.py in __init__(self, inStream)
     26     def __init__(self, inStream):
     27         print minidom
---> 28         xmldoc = minidom.parse(inStream)
     29         xmpmeta = xmldoc.childNodes[1]
     30         rdf     = xmpmeta.childNodes[1]

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/nxml/dom/minidom.py in parse(file, parser, bufsize)

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in parse(file, namespaces)
    922         fp = open(file, 'rb')
    923         try:
--> 924             result = builder.parseFile(fp)
    925         finally:
    926             fp.close()

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in parseFile(self, file)
    205                 if not buffer:
    206                     break
--> 207                 parser.Parse(buffer, 0)
    208                 if first_buffer and self.document.documentElement:
    209                     self._setup_subset(buffer)

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in start_element_handler(self, name, attributes)
    743     def start_element_handler(self, name, attributes):
    744         if ' ' in name:
--> 745             uri, localname, prefix, qname = _parse_ns_name(self, name)
    746         else:
    747             uri = EMPTY_NAMESPACE

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in _parse_ns_name(builder, name)
    125         localname = intern(localname, localname)
    126     else:
--> 127         uri, localname = parts
    128         prefix = EMPTY_PREFIX
    129         qname = localname = intern(localname, localname)

ValueError: too many values to unpack

The offending C statement:
/usr/src/packages/BUILD/Python-2.4/Modules/pyexpat.c(582)StartElement()
The returned 'name':
(Pdb) name
Out[5]: u'XMP Photographic Licensing Universal System (xmpPLUS, http://ns.adobe.com/xap/1.0/PLUS/) CreditLineReq xmpPLUS'
It's obvious the scanning went beyond the attribute.
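
For illustration only, here is a minimal sketch of why the name shown above cannot be unpacked. It assumes (this is an interpretation, not something confirmed in this report) that the name is a space-joined "URI localname prefix" string whose URI part itself contains spaces. The helper split_ns_name is hypothetical; it only mirrors the split-and-unpack pattern visible in the traceback and is not the library's code.

```python
# The name exactly as printed in the pdb session above.
name = (u'XMP Photographic Licensing Universal System '
        u'(xmpPLUS, http://ns.adobe.com/xap/1.0/PLUS/) CreditLineReq xmpPLUS')


def split_ns_name(name):
    """Hypothetical helper: split on single spaces, then unpack 3 or 2 parts."""
    parts = name.split(' ')
    if len(parts) == 3:
        uri, localname, prefix = parts
    else:
        # Fails with ValueError unless there are exactly two parts.
        uri, localname = parts
        prefix = None
    return uri, localname, prefix


# Splitting on every space yields far more than the two or three parts expected.
parts = name.split(' ')

try:
    split_ns_name(name)
    error = None
except ValueError as exc:
    error = exc

# Under the stated assumption, splitting only on the last two spaces keeps the
# space-containing leading part together. This shows the interpretation; it is
# not a fix, since it cannot tell a two-part name from a three-part one.
uri, localname, prefix = name.rsplit(' ', 2)

print(len(parts))
print(repr(error))
print(localname, prefix)
```

A name with no embedded spaces in its first part, such as the made-up 'http://example.org/ns local pfx', goes through the same helper without error.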



msg30927 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-01-06 00:46
Dupe of 1627096
History
Date User Action Args
2022-04-11 14:56:21 admin set github: 44415
2007-01-05 16:37:19 pmi create