Issue 1462498: bug #1452246 and patch #1087808; sgmllib entities


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/43139

classification

Title:	bug #1452246 and patch #1087808; sgmllib entities
Type:		Stage:
Components:	Library (Lib)	Versions:

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:		Nosy List:	georg.brandl, rvernica
Priority:	normal	Keywords:	patch

Created on 2006-04-01 00:56 by rvernica, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
sgmllib.diff	rvernica, 2006-04-01 00:56	Lib/sgmllib.py and Lib/test/test_sgmllib.py diff

Messages (2)
msg49923	Author: Rares Vernica (rvernica)	Date: 2006-04-01 00:56
Patch for bug #1452246 "htmllib doesn't properly substitute entities".
Continuation of patch #1087808 "sgmllib.SGMLParser does not unescape attribute values; patch".

Substitute entities in argument values. With this patch, the following code:

    import htmllib
    import formatter
    import StringIO

    s = StringIO.StringIO()
    p = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter(s)))
    p.feed('<img alt="&lt;&gt;&amp;">')
    print s.getvalue()

will now print '<>&' instead of '&lt;&gt;&amp;'.

The patch modifies module sgmllib, class SGMLParser, method parse_starttag. In this method, entities are substituted in the argument values. The substitutions are based on the existing property SGMLParser.entitydefs. For parsing, it uses the regular expression entityref.

Regarding the differences between this patch and patch #1087808:
- self.entitydefs is used to determine the set of supported entity names;
- unknown entity references are left alone;
- the regular expression entityref is used to find references;
- a documentation patch is not needed, as the method is internal.

Regarding the fact that the semicolon after the entity name is not mandatory in SGML: because of the way entityref is defined, "&lt " will become "< ", while "&lt" will stay "&lt", whether or not it appears in an attribute value.

The patch also adds test cases in module test/test_sgmllib.py, class SGMLParserTestCase, method test_attr_values. That method tests that the substitution is done properly.

Ray
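The substitution described in the patch message can be sketched in Python 3 (where sgmllib no longer exists). This is a minimal illustration, not the code from sgmllib.diff: the function name substitute_entities_v1, the small ENTITYDEFS table, and the exact regular expression are assumptions modeled on SGMLParser.entitydefs and the entityref pattern. It shows known names being replaced, unknown names left alone, and the terminator rule under which "&lt " becomes "< " while a bare "&lt" at the end of the value is untouched.

```python
import re

# Illustrative subset of an entitydefs-style table (name -> replacement text).
ENTITYDEFS = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# Modeled on sgmllib's entityref: a name must be followed by a character that
# is not a letter or digit. A ';' terminator is consumed (group 2); any other
# terminator is only checked with a lookahead, so it stays in the output.
ENTITYREF = re.compile(r"&([a-zA-Z][-.a-zA-Z0-9]*)(;|(?=[^a-zA-Z0-9]))")


def substitute_entities_v1(value, entitydefs=ENTITYDEFS):
    """Replace known entity references in an attribute value."""

    def repl(match):
        name = match.group(1)
        if name in entitydefs:
            return entitydefs[name]
        # Unknown entity reference: leave it exactly as written.
        return match.group(0)

    return ENTITYREF.sub(repl, value)


print(substitute_entities_v1("&lt;&gt;&amp;"))  # <>&
print(repr(substitute_entities_v1("&lt ")))     # '< '
print(substitute_entities_v1("&lt"))            # &lt
```

Because re.sub scans left to right without revisiting replaced text, "&amp;lt;" becomes "&lt;" rather than being unescaped twice.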
msg49924	Author: Georg Brandl (georg.brandl) *	Date: 2006-04-01 08:35
Changed your patch a bit (only allowing entityrefs ending with ';' and recognizing charrefs), added more tests and docs and committed as rev. 43532.
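The committed behavior (entity references substituted only when they end with ';', plus character references) can be sketched the same way in Python 3. Again this is an illustration under stated assumptions, not the code from rev. 43532: substitute_entities_v2, ENTITY_OR_CHARREF and ENTITYDEFS are hypothetical names, only decimal charrefs such as "&#60;" are handled, and accepting a charref with or without its trailing ';' is an assumption the message does not settle.

```python
import re

# Illustrative subset of an entitydefs-style table (name -> replacement text).
ENTITYDEFS = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# Hypothetical pattern: a named reference (group 1) or a decimal character
# reference (group 2), followed by an optional ';' (group 3).
ENTITY_OR_CHARREF = re.compile(r"&(?:([a-zA-Z][-.a-zA-Z0-9]*)|#([0-9]+))(;?)")


def substitute_entities_v2(value, entitydefs=ENTITYDEFS):
    """Replace ';'-terminated entity references and decimal charrefs in value."""

    def repl(match):
        name, number, semicolon = match.groups()
        if number is not None:
            # Assumption: charrefs are accepted with or without ';'.
            # The length check keeps int() cheap and rejects absurd values.
            if len(number) <= 7 and int(number) <= 0x10FFFF:
                return chr(int(number))
            return match.group(0)
        if semicolon and name in entitydefs:
            return entitydefs[name]
        # Missing ';' or unknown name: leave the reference untouched.
        return match.group(0)

    return ENTITY_OR_CHARREF.sub(repl, value)


print(substitute_entities_v2("&lt;&#62;&amp;"))  # <>&
print(substitute_entities_v2("&lt x"))           # &lt x
```

Compared with the first sketch, "&lt " is now left as-is, because a named reference without ';' is no longer substituted.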

History
Date	User	Action	Args
2022-04-11 14:56:16	admin	set	github: 43139
2006-04-01 00:56:15	rvernica	create