Issue 940263: xml parser bug

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/40176

classification

Title:	xml parser bug
Type:
Stage:
Components:	XML
Versions:	Python 2.3

process

Status:	closed
Resolution:	not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List:	dtefft, loewis
Priority:	normal
Keywords:

Created on 2004-04-22 19:35 by dtefft, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
xml_parse_blast.py	dtefft, 2004-04-22 19:38
failed_xml.txt	dtefft, 2004-04-22 19:38

Messages (4)
msg20560	Author: David Tefft (dtefft)	Date: 2004-04-22 19:35
I am using minidom to parse an XML file. When I run the script on a Linux machine, it truncates a string. When I run the script on a Mac running OS X, it behaves the way I expect. Has anyone encountered this problem? My suspicion is that the difference comes from 32-bit vs. 64-bit processors.
Dave
PS: I would attach the XML file, but it is quite large.
msg20561	Author: Martin v. Löwis (loewis) *	Date: 2004-04-24 08:16
I ran the script on Linux, as "python xml_parse_blast.py failed_xml.txt", and it produces no output. This is because len(qseq) == len(midline) in all cases. So what is the expected output of the script?
msg20562	Author: Martin v. Löwis (loewis) *	Date: 2004-04-24 08:20
As a follow-up, I notice a bug in your script. To access the content of an element, you do .firstChild.nodeValue. This is incorrect: the content could be split over multiple text nodes. Older Python versions indeed did split element content over multiple text nodes. To obtain the text content of an element, you need to iterate over all children, find out which of them are text nodes, and concatenate their node values. If you know that you won't have any comments, processing instructions, or CDATA sections in the input, you can alternatively invoke .normalize() on the document, which will collapse adjacent text nodes into single ones. Assuming that this is the phenomenon you are seeing, it is a bug in your script, so I am closing this report as invalid.
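The two approaches described in msg20562 can be sketched roughly as follows. This is a minimal illustration, not code from the attached xml_parse_blast.py: the helper name get_text, the element name qseq as used here, and the sample XML are assumptions, and the split text content is simulated by appending a second text node by hand.

```python
from xml.dom import Node, minidom


def get_text(element):
    """Concatenate the values of all direct text-node children.

    Hypothetical helper: joins every text node instead of relying
    on .firstChild holding the complete content.
    """
    return "".join(
        child.nodeValue
        for child in element.childNodes
        if child.nodeType == Node.TEXT_NODE
    )


# Assumed sample input; the real failed_xml.txt is not reproduced here.
doc = minidom.parseString("<hit><qseq>ACGT</qseq></hit>")
qseq = doc.getElementsByTagName("qseq")[0]

# Simulate a parser that delivers the element content in two chunks.
qseq.appendChild(doc.createTextNode("TTGA"))

# Only the first chunk: this is the truncation pattern.
truncated = qseq.firstChild.nodeValue

# All chunks joined, regardless of how the content was split.
full = get_text(qseq)

# Alternative: merge adjacent text nodes in place, after which
# .firstChild holds the whole string.
doc.normalize()
merged = qseq.firstChild.nodeValue

print(truncated, full, merged)
```

With the content split in two, .firstChild.nodeValue yields only the first piece, while both get_text and the normalized document yield the full string.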
msg20563	Author: David Tefft (dtefft)	Date: 2004-04-24 12:11
There should be no output. However, when I run the script on a Linux box, there is output. When I run it on Mac OS X, it behaves properly. Thanks for your input.

History
Date	User	Action	Args
2022-04-11 14:56:03	admin	set	github: 40176
2004-04-22 19:35:52	dtefft	create