This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: sgmllib support for additional tag forms
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: loewis Nosy List: fdrake, loewis, slott56
Priority: normal Keywords: patch

Created on 2002-04-17 18:16 by slott56, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name | Uploaded | Description
sgmllib_declaration_parse.diff | slott56, 2002-04-17 18:16 | sgmllib declaration parse diffs
sgmllib_declaration_parse.diff | slott56, 2002-04-22 18:50 | Revised patch against current CVS main trunk
Messages (6)
msg39626 - Author: Steven F. Lott (slott56) Date: 2002-04-17 18:16
MS Word-generated HTML includes declaration 
tags of the form: 
<![if !supportEmptyParas]>&nbsp;<![endif]>
scattered throughout the body of an HTML 
document.

The current sgmllib parse_declaration routine 
rejects these as invalid syntax, whereas browsers 
tolerate these embedded declarations.

This patch accepts these declaration forms.
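To make "these declaration forms" concrete, here is a minimal sketch in modern Python (not the patch itself, which targeted sgmllib.py) that classifies a `<![...]>` construct found in the input. The pattern names and `classify_marked_section` are hypothetical, and the sketch assumes that MS Office sections (`if`, `else`, `endif`) close with `]>` while standard SGML marked sections such as CDATA close with `]]>`.

```python
import re

# Hypothetical patterns for illustration only.
# Standard SGML marked section, assumed form: <![KEYWORD[ ... ]]>
SGML_SECTION = re.compile(
    r"<!\[\s*(CDATA|IGNORE|INCLUDE|RCDATA|TEMP)\s*\[(.*?)\]\s*\]\s*>",
    re.IGNORECASE | re.DOTALL,
)
# MS Office conditional section, assumed form: <![if ...]>, <![else]>, <![endif]>
MS_SECTION = re.compile(r"<!\[\s*(if|else|endif)\b([^\]]*)\]\s*>", re.IGNORECASE)


def classify_marked_section(text, pos=0):
    """Return (kind, keyword, body, end) for a '<![' construct at pos, or None."""
    for kind, pattern in (("sgml", SGML_SECTION), ("ms-office", MS_SECTION)):
        m = pattern.match(text, pos)
        if m:
            return kind, m.group(1).lower(), m.group(2).strip(), m.end()
    return None  # not a recognised marked section


sample = "<![if !supportEmptyParas]>&nbsp;<![endif]>"
print(classify_marked_section(sample))
# ('ms-office', 'if', '!supportEmptyParas', 26)
print(classify_marked_section(sample, sample.index("<![endif")))
# ('ms-office', 'endif', '', 42)
```

The returned `end` offset is where a parser would resume scanning, so the `&nbsp;` between the two sections is left to the ordinary text handler.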
msg39627 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-04-18 17:23
Logged In: YES 
user_id=21627

That patch looks wrong: you are changing what a tag is by
removing the underscore; however, underscores are allowed in
tag names.
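The objection is easy to check with a tag-name pattern. The sketch below assumes a name character class along the lines of `[a-zA-Z][-_.a-zA-Z0-9]*` (an assumption modeled on sgmllib-era parsers, not quoted from the patch); both patterns and `scan_tag_name` are hypothetical. Dropping the underscore from the class silently truncates a name such as `my_tag`.

```python
import re

# Assumed tag-name pattern: a letter, then letters, digits, '-', '_' or '.'.
WITH_UNDERSCORE = re.compile(r"[a-zA-Z][-_.a-zA-Z0-9]*")
# The same pattern with '_' removed, mirroring the change objected to above.
WITHOUT_UNDERSCORE = re.compile(r"[a-zA-Z][-.a-zA-Z0-9]*")


def scan_tag_name(pattern, text):
    """Return the tag name the pattern matches at the start of text ('' if none)."""
    m = pattern.match(text)
    return m.group(0) if m else ""


print(scan_tag_name(WITH_UNDERSCORE, "my_tag attr=1"))     # my_tag
print(scan_tag_name(WITHOUT_UNDERSCORE, "my_tag attr=1"))  # my
```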

Also, could you please generate the patch against the CVS
version of the code? Your patch doesn't apply for the
current code, which has changed significantly compared to
the version you appear to be using.

There is no way that this can go into 2.1 IMO.
msg39628 - Author: Fred Drake (fdrake) (Python committer) Date: 2002-04-21 15:11
Logged In: YES 
user_id=3066

This is the same as bug #505747.

These "tags" are not legal HTML in any form, but are some
Microsoft invention.  It's not entirely clear what the right
thing to do is, but it is clear that we need to deal with
these in some different way.

Changed group to indicate that such changes can only go into
the trunk; feature changes in maintenance versions are not
allowed.
msg39629 - Author: Steven F. Lott (slott56) Date: 2002-04-22 18:50
Logged In: YES 
user_id=328067

My suggestion for handling this MS extension syntax is 
to (1) tolerate the extension without raising an error and 
(2) treat it as an SGML marked section, reporting it through 
the unknown_decl() callback.  Since this is a separate 
function, subclasses can override it to alter this behavior.  

The content hidden in these MS-specific marked 
sections always appears to be a &nbsp;.  While it might 
be expedient to skip over this junk completely, doing so 
would make it difficult to handle marked sections in a 
future version of markupbase.
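The proposed design (accept the construct, report it through `unknown_decl()`, and let subclasses override that callback) can be sketched with a toy parser. Only the callback name `unknown_decl` comes from the thread; `MiniParser`, `SkipMSSections`, `handle_data`, the `events` list and the `]>` versus `]]>` closing rule are hypothetical assumptions, not the sgmllib or markupbase code.

```python
import re

# Assumed closers: MS Office sections end with ']>', standard ones with ']]>'.
MS_KEYWORDS = {"if", "else", "endif"}
MS_CLOSE = re.compile(r"\]\s*>")
SGML_CLOSE = re.compile(r"\]\s*\]\s*>")
KEYWORD = re.compile(r"\s*([a-zA-Z]+)")


class MiniParser:
    """Hypothetical toy parser that only knows text and '<![...]>' sections."""

    def __init__(self):
        self.events = []

    def feed(self, text):
        i = 0
        while i < len(text):
            start = text.find("<![", i)
            if start < 0:
                self.handle_data(text[i:])
                break
            if start > i:
                self.handle_data(text[i:start])
            m = KEYWORD.match(text, start + 3)
            keyword = m.group(1).lower() if m else ""
            closer = MS_CLOSE if keyword in MS_KEYWORDS else SGML_CLOSE
            end = closer.search(text, start + 3)
            if end is None:
                raise ValueError("unterminated marked section at %d" % start)
            # Tolerate the section and report its body to the callback.
            self.unknown_decl(text[start + 3:end.start()])
            i = end.end()

    def handle_data(self, data):
        self.events.append(("data", data))

    def unknown_decl(self, data):
        self.events.append(("decl", data))


class SkipMSSections(MiniParser):
    """Subclass that overrides unknown_decl() to drop MS Office sections."""

    def unknown_decl(self, data):
        m = KEYWORD.match(data)
        if not (m and m.group(1).lower() in MS_KEYWORDS):
            super().unknown_decl(data)


sample = "<![if !supportEmptyParas]>&nbsp;<![endif]>"
p = MiniParser()
p.feed(sample)
print(p.events)
# [('decl', 'if !supportEmptyParas'), ('data', '&nbsp;'), ('decl', 'endif')]
q = SkipMSSections()
q.feed(sample)
print(q.events)
# [('data', '&nbsp;')]
```

Keeping the decision in an overridable callback, rather than skipping the sections inside `feed`, is what leaves room for real marked-section support later: the base class still sees every section.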

Attached is a revised patch against revision 1.39 of 
sgmllib.py and revision 1.4 of markupbase.py.
msg39630 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-11-22 09:23
Logged In: YES 
user_id=21627

I now recommend approving this patch. It improves SGML
correctness and, while supporting an MS extension,
explicitly points out that it is doing so.
msg39631 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-03-30 14:54
Logged In: YES 
user_id=21627

Thanks for the patch, I have installed it as

markupbase.py 1.7
sgmllib.py 1.43
test_htmllib.py 1.3
NEWS 1.706

This also fixes bugs 505747 and 704996.
History
Date | User | Action | Args
2022-04-10 16:05:14 | admin | set | github: 36452
2002-04-17 18:16:12 | slott56 | create |