This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Make sgmllib char and entity references pluggable
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.5
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To: fdrake
Nosy List: fdrake, rubys
Priority: normal
Keywords:

Created on 2006-06-12 10:41 by rubys, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name: sgmllib_pluggable_entityrefs.patch
Uploaded: rubys, 2006-06-14 12:59
Description: only substitute once, and now with a test case!
Messages (6)
msg28782 Author: Sam Ruby (rubys) Date: 2006-06-12 10:41
The changes being made to sgmllib in Python 2.5 may
break existing applications.  This patch makes it easy
for subclasses to revert to the old behavior.
Additionally, it makes it easier to provide new
behaviors, such as support for Unicode, hexadecimal
character references, and additional entities.
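Because sgmllib was removed in Python 3, the sketch below does not import it. It is a hypothetical standalone mimic of the hook methods that the Python 2.5 sgmllib documentation lists (`convert_charref()`, `convert_codepoint()`, `convert_entityref()` and the `entitydefs` mapping); the class names `RefConverter` and `ExtendedRefConverter` are invented for illustration, and the method bodies are simplified assumptions rather than the library's code. It shows how a subclass might add hexadecimal references, the full Unicode range and extra entities by overriding one method and one mapping.

```python
# Hypothetical standalone mimic of the Python 2.5 sgmllib conversion hooks;
# sgmllib itself does not exist in Python 3, so nothing is imported here.


class RefConverter:
    # Named entities known to the base class; subclasses may extend the mapping.
    entitydefs = {'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': "'"}

    def convert_charref(self, name):
        # Simplified base behaviour: decimal references in the range 0-255 only.
        try:
            n = int(name)
        except ValueError:
            return None
        if not 0 <= n <= 255:
            return None
        return self.convert_codepoint(n)

    def convert_codepoint(self, codepoint):
        return chr(codepoint)

    def convert_entityref(self, name):
        # None means "unknown"; a caller would then leave the reference as written.
        return self.entitydefs.get(name)


class ExtendedRefConverter(RefConverter):
    # Hypothetical subclass: extra entities, hex references, full Unicode range.
    entitydefs = dict(RefConverter.entitydefs, nbsp='\u00a0', copy='\u00a9')

    def convert_charref(self, name):
        try:
            if name[:1] in ('x', 'X'):
                n = int(name[1:], 16)
            else:
                n = int(name)
        except ValueError:
            return None
        if not 0 <= n <= 0x10FFFF:
            return None
        return self.convert_codepoint(n)


base, ext = RefConverter(), ExtendedRefConverter()
print(base.convert_charref('x41'), ext.convert_charref('x41'))  # None A
print(base.convert_entityref('copy') is None,
      ext.convert_entityref('copy') == '\u00a9')               # True True
```

Returning None for anything unrecognized keeps the decision of what to emit for unknown references with the caller, which is what would let a subclass reproduce older behavior without touching the parser itself.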
msg28783 Author: Fred Drake (fdrake) (Python committer) Date: 2006-06-14 05:14
This patch certainly makes the subclass interface nicer; I
like that.  There is one case that it breaks (foolishly not
covered by the existing tests, but clear from reading the
patch).  I've added the relevant test in this change:

http://mail.python.org/pipermail/python-checkins/2006-June/053975.html

The problem with the patch is that attribute values are
transformed twice (once for entity refs, once for character
refs), instead of just once, so entity ref expansions can
cause character refs to be located that aren't in the markup.
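The double-expansion problem can be reproduced outside sgmllib. The sketch below uses hypothetical, simplified regular expressions and an invented entity table (`ENTITIES`, `ENTITYREF`, `CHARREF`, `ANYREF`; none of these are sgmllib's own) to contrast two sequential substitution passes with a single combined pass over an attribute value. With two passes, `&amp;#65;` becomes `A`, because expanding `&amp;` produces a `&#65;` that was never in the markup.

```python
import re

# Hypothetical, simplified patterns and entity table; not sgmllib's own.
ENTITIES = {'amp': '&', 'lt': '<', 'gt': '>', 'quot': '"'}
ENTITYREF = re.compile(r'&([a-zA-Z][-.a-zA-Z0-9]*);')
CHARREF = re.compile(r'&#([0-9]+);')
ANYREF = re.compile(r'&(?:#([0-9]+)|([a-zA-Z][-.a-zA-Z0-9]*));')


def _char(digits, original):
    # Leave numeric references beyond the Unicode range exactly as written.
    n = int(digits)
    return chr(n) if n <= 0x10FFFF else original


def two_pass(value):
    # Buggy: text produced by the entity pass is rescanned by the
    # character-reference pass.
    value = ENTITYREF.sub(lambda m: ENTITIES.get(m.group(1), m.group(0)), value)
    return CHARREF.sub(lambda m: _char(m.group(1), m.group(0)), value)


def _replace_any(m):
    # Group 1 is a decimal character reference, group 2 an entity name.
    if m.group(1) is not None:
        return _char(m.group(1), m.group(0))
    return ENTITIES.get(m.group(2), m.group(0))


def one_pass(value):
    # Correct: a single scan over the original markup; re.sub never searches
    # its own replacement text, so "&amp;" can only become a literal "&".
    return ANYREF.sub(_replace_any, value)


print(two_pass('&amp;#65;'))  # A      (wrong: "&#65;" was not in the markup)
print(one_pass('&amp;#65;'))  # &#65;  (right)
```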

I'm out of time tonight, but should be able to make this
patch work with the additional tests tomorrow night if sruby
doesn't beat me to it.

Documentation and tests for the subclass interface changes
are still needed as well.
msg28784 Author: Sam Ruby (rubys) Date: 2006-06-14 12:59
Updated patch with a test case.

Note that in the pre-existing code, tag data values are
transformed twice; this should be corrected, and ideally
the code for handling references should be unified.
msg28785 Author: Sam Ruby (rubys) Date: 2006-06-14 13:02
Note that the pre-existing code transforms tag data twice.

Ideally, the handling of entities in attributes and data
would be unified.
msg28786 Author: Fred Drake (fdrake) (Python committer) Date: 2006-06-14 13:06
Thanks.  I'll look into this again tonight.
msg28787 Author: Fred Drake (fdrake) (Python committer) Date: 2006-06-16 23:45
I've committed a modified version of this as revision 46995.
The changes I made:

- avoid creating a function dynamically when parsing a start tag
  (used a method instead of a nested def)

- route the codepoint -> str conversion performed by
  convert_charref() through a convert_codepoint() method

- updated documentation
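The first two changes in the list can be illustrated with a small sketch. It is not the code committed as revision 46995: `AttrValueParser`, `AsciiEscapingParser`, `expand_attr_value()` and `_convert_ref()` are hypothetical names, the regular expression is simplified, and only `convert_charref()`, `convert_codepoint()`, `convert_entityref()` and `entitydefs` follow names from the Python 2.5 sgmllib documentation. The attribute expansion passes a method to `re.sub` instead of defining a nested function on every call, and `convert_charref()` hands the final codepoint-to-string step to `convert_codepoint()`, so a subclass can change just that step.

```python
import re


class AttrValueParser:
    # Hypothetical sketch, not sgmllib: Python 3 only, simplified pattern.
    entitydefs = {'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': "'"}
    _ref = re.compile(r'&(?:#([0-9]+)|([a-zA-Z][-.a-zA-Z0-9]*));')

    def expand_attr_value(self, value):
        # A method serves as the re.sub callback, so no nested def (closure)
        # has to be created each time an attribute value is expanded.
        return self._ref.sub(self._convert_ref, value)

    def _convert_ref(self, match):
        if match.group(1) is not None:
            replacement = self.convert_charref(match.group(1))
        else:
            replacement = self.convert_entityref(match.group(2))
        # Unknown or out-of-range references are left exactly as written.
        return match.group(0) if replacement is None else replacement

    def convert_charref(self, name):
        n = int(name)  # the pattern guarantees ASCII decimal digits
        if not 0 <= n <= 255:
            return None
        return self.convert_codepoint(n)

    def convert_codepoint(self, codepoint):
        return chr(codepoint)

    def convert_entityref(self, name):
        return self.entitydefs.get(name)


class AsciiEscapingParser(AttrValueParser):
    # Hypothetical subclass: only the codepoint -> str step is overridden;
    # the range check in convert_charref() is inherited unchanged.
    def convert_codepoint(self, codepoint):
        # Keep non-ASCII characters as numeric references.
        return chr(codepoint) if codepoint < 128 else '&#%d;' % codepoint


print(AttrValueParser().expand_attr_value('a &lt; b &amp;#65;'))  # a < b &#65;
print(AsciiEscapingParser().expand_attr_value('caf&#233;'))       # caf&#233;
```

Because only `convert_codepoint()` differs between the two classes, the parsing and range-checking logic stays in one place, which is the kind of indirection the second bullet describes.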
History
Date                 User   Action  Args
2022-04-11 14:56:18  admin  set     github: 43489
2006-06-12 10:41:46  rubys  create