
classification
Title: HTMLParser -- possible bug in handle_comment
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.2
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: logistix, scott_israel
Priority: normal
Keywords:

Created on 2003-05-21 10:35 by scott_israel, last changed 2022-04-10 16:08 by admin. This issue is now closed.

Messages (2)
msg16093 - Author: Scott Israel (scott_israel) Date: 2003-05-21 10:35
>>> import HTMLParser
>>> class Parser(HTMLParser.HTMLParser):
	def __init__(self):
		HTMLParser.HTMLParser.__init__(self)
	def handle_data(self,data):
		print 'DATA: %s' % data
	def handle_comment(self,comment):
		print 'COMMENT: %s' % comment

>>> test3='<STYLE><!-- This is a comment --> </STYLE>'
>>> p=Parser()
>>> p.feed(test3)
DATA: <!-- This is a comment -->

Is this a bug?
msg16094 - Author: Grant Olson (logistix) Date: 2003-05-21 20:04
No. <style> is one of the elements whose content is CDATA, so 
comment markers inside it are not treated as comments and the 
parser reports them as data.  This was done to 'enable legacy 
support' by allowing authors to write:

<style>
<!--
body{dd:00;}
-->
</style>

Without the comment markers, most legacy browsers would display 
the text "body{dd:00;}" on the rendered page.

HTML Spec reference is here:
http://www.w3.org/TR/html4/present/styles.html#h-14.5
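
For anyone who wants to check this on a current interpreter, here is a minimal sketch using Python 3, where the module is named html.parser instead of HTMLParser. The EventCollector class and the collect() helper are names made up for this illustration and are not part of the library. It records which callback fires for the same comment inside and outside a <style> element.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records (kind, text) for data and comment callbacks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, comment):
        self.events.append(("comment", comment))


def collect(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # Inside <style> the content is CDATA, so expect "data" events only.
    print(collect("<style><!-- This is a comment --></style>"))
    # Inside an ordinary element, expect a "comment" event.
    print(collect("<p><!-- This is a comment --></p>"))
```

If the comment text inside <style> is actually needed, one option is to look for the comment markers in the string passed to handle_data, since the parser will not split them out itself.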
  
History
Date                 User          Action  Args
2022-04-10 16:08:51  admin         set     github: 38533
2003-05-21 10:35:31  scott_israel  create