classification
Title: bug #1452246 and patch #1087808; sgmllib entities
Type: Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: georg.brandl, rvernica
Priority: normal Keywords: patch

Created on 2006-04-01 00:56 by rvernica, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
sgmllib.diff rvernica, 2006-04-01 00:56 Lib/sgmllib.py and Lib/test/test_sgmllib.py diff
Messages (2)
msg49923 - Author: Rares Vernica (rvernica) Date: 2006-04-01 00:56
Patch for bug #1452246, "htmllib doesn't properly
substitute entities".
Continuation of patch #1087808, "sgmllib.SGMLParser does
not unescape attribute values; patch".

Substitute entities in attribute values. For example:

import htmllib
import formatter
import StringIO

s = StringIO.StringIO()
p = htmllib.HTMLParser(
    formatter.AbstractFormatter(formatter.DumbWriter(s)))
p.feed('<img alt="&lt;&gt;&amp;">')
print s.getvalue()

will now print '<>&' instead of '&lt;&gt;&amp;'.

The patch modifies module sgmllib, class SGMLParser,
method parse_starttag. In this method, entities are
substituted in the attribute values. The substitutions
are based on the existing property SGMLParser.entitydefs.
For parsing, it uses the regular expression entityref.
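
The substitution described above can be sketched roughly as follows. This is not the code from sgmllib.diff: it is a standalone illustration written for Python 3 using only the re module, since sgmllib is no longer in the standard library. The names ENTITYDEFS, ENTITYREF and substitute_entities are hypothetical, ENTITYDEFS stands in for SGMLParser.entitydefs, and the pattern is an assumption about how entityref is defined (an entity name followed by one non-alphanumeric character).

```python
import re

# Hypothetical stand-in for SGMLParser.entitydefs.
ENTITYDEFS = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# Assumed shape of the entityref pattern: '&', a name, then one
# non-alphanumeric terminator character.
ENTITYREF = re.compile(r"&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]")


def substitute_entities(value, entitydefs=ENTITYDEFS):
    """Replace known entity references in an attribute value."""

    def replace(match):
        name = match.group(1)
        terminator = match.group(0)[-1]
        if name not in entitydefs:
            # Unknown entity references are left alone.
            return match.group(0)
        # A ';' terminator belongs to the reference; any other
        # terminator (for example a space) is kept in the output.
        tail = "" if terminator == ";" else terminator
        return entitydefs[name] + tail

    return ENTITYREF.sub(replace, value)


print(substitute_entities("&lt;&gt;&amp;"))  # <>&
print(substitute_entities("&bogus;"))        # &bogus;
```

Passing the entity table as a parameter mirrors the point that the set of supported names comes from entitydefs rather than being hard-coded in the substitution routine.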

Regarding the differences between this patch and patch
#1087808:
  - use self.entitydefs to determine the set of entity
names that are supported;
  - unknown entity references are left alone;
  - the regular expression entityref is used to find
references;
  - a documentation patch is not needed as the method
is internal.

Regarding the fact that the semicolon after the entity name
is not mandatory in SGML: because of the way entityref is
defined, "&lt " will become "< ", while "&lt" will stay "&lt",
even inside an attribute value.
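
The semicolon behaviour can be checked with a small sketch. It assumes entityref requires exactly one non-alphanumeric character after the entity name; the pattern below is written from that assumption, not copied from the patch, and find_entity_names is a hypothetical helper.

```python
import re

# Assumed shape of entityref: the name must be followed by one
# character that is not a letter or digit.
ENTITYREF = re.compile(r"&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]")


def find_entity_names(text):
    """Return the entity names that the pattern recognizes in text."""
    return [match.group(1) for match in ENTITYREF.finditer(text)]


print(find_entity_names("&lt "))  # ['lt']  (the space terminates the name)
print(find_entity_names("&lt;"))  # ['lt']  (the semicolon terminates the name)
print(find_entity_names("&lt"))   # []      (nothing follows the name)
```

Under this assumption a reference at the very end of a value is never matched, which is why "&lt" is left unchanged.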

The patch also adds test cases in module
test/test_sgmllib.py, class SGMLParserTestCase, method
test_attr_values. In that method, the proper
substitution is tested.

Ray
msg49924 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2006-04-01 08:35
Changed your patch a bit (only allowing entityrefs ending
with ';' and recognizing charrefs), added more tests and
docs and committed as rev. 43532.
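
A rough sketch of the stricter behaviour described in this message (entity references must end with ';', and character references are recognized) might look like the following. It is not the code committed in rev. 43532: the names REFERENCE, ENTITYDEFS and unescape_attr are hypothetical, and handling only decimal character references is an assumption.

```python
import re

# Hypothetical stand-in for SGMLParser.entitydefs.
ENTITYDEFS = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# Either a named reference (&name;) or a decimal character
# reference (&#NNN;). The trailing ';' is mandatory in both cases.
REFERENCE = re.compile(r"&(?:([a-zA-Z][-.a-zA-Z0-9]*)|#([0-9]+));")


def unescape_attr(value, entitydefs=ENTITYDEFS):
    """Replace &name; and &#NNN; references in an attribute value."""

    def replace(match):
        name, number = match.groups()
        if number is not None:
            try:
                return chr(int(number))
            except (ValueError, OverflowError):
                # Out-of-range code point: leave the reference alone.
                return match.group(0)
        # Unknown entity names are left alone.
        return entitydefs.get(name, match.group(0))

    return REFERENCE.sub(replace, value)


print(unescape_attr("&lt;&#65;&gt;"))  # <A>
print(unescape_attr("&lt "))           # &lt  (no ';', so unchanged)
```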
History
Date User Action Args
2022-04-11 14:56:16 admin set github: 43139
2006-04-01 00:56:15 rvernica create