Issue 1076070: HTMLParser can't handle page with javascript

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/41252

classification

Title:	HTMLParser can't handle page with javascript
Type:
Stage:
Components:	Library (Lib)
Versions:

process

Created on 2004-11-30 15:22 by jhylton, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (2)
msg23395	Author: Jeremy Hylton (jhylton)	Date: 2004-11-30 15:22
Perhaps the page is malformed -- I notice at least one other problem with it -- but I'd like to parse it. The relevant excerpt appears to be:

    <script language="JavaScript">
    <!--
    um.menuCode[i].replace(/<\/(li|ul)>/ig,'</$1>\n');
    -->
    </script>

goahead() identifies the next interesting part of the page as the </$1> inside the JavaScript. It's not seeing the comment. Should it? I changed interesting_cdata to look for <! as one interesting possibility, and it then parsed the comment successfully.
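The search behaviour described above can be sketched with plain regular expressions. This is only an illustration: both patterns are assumptions about what interesting_cdata looked like at the time and about what the modified version might have been, not copies of the module source, and the line breaks in the excerpt are guessed.

```python
import re

# Script body from the report (line breaks are an assumption).
excerpt = r"""<!--
um.menuCode[i].replace(/<\/(li|ul)>/ig,'</$1>\n');
-->
</script>"""

# Assumed original pattern: stop only at "</" or at a "<" that ends the input.
end_tag_only = re.compile(r'<(/|\Z)')

# Hypothetical modified pattern: also stop at "<!" so the comment is noticed.
with_comment = re.compile(r'<(/|!|\Z)')

first = end_tag_only.search(excerpt)
second = with_comment.search(excerpt)

# Show what each pattern treats as the next interesting position.
print(repr(excerpt[first.start():first.start() + 5]))
print(repr(excerpt[second.start():second.start() + 4]))
```

Under these assumptions, the first pattern skips the opening "<!--" and the "<\/" inside the regex literal, and stops at the "</$1>" in the replacement string, which matches the behaviour reported. The second pattern stops at the start of the comment instead.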
msg23396	Author: Fred Drake (fdrake)	Date: 2006-05-02 20:20
The "<!--" hackery is used to deal with really old browsers that don't understand <script>. Technically, all it should have to look for is "</". Closing as not worth changing.
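For anyone who wants to check how a current interpreter treats the excerpt, here is a minimal sketch using html.parser from Python 3, not the 2004 HTMLParser module the report was filed against. The EventRecorder class and the line breaks in the page string are assumptions made for illustration, and the result may differ between Python versions.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper that records parser callbacks in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


# Excerpt from the report (line breaks are an assumption).
page = r"""<script language="JavaScript">
<!--
um.menuCode[i].replace(/<\/(li|ul)>/ig,'</$1>\n');
-->
</script>"""

recorder = EventRecorder()
recorder.feed(page)
recorder.close()

# Separate tag events from the text reported as data.
tags = [event for event in recorder.events if event[0] in ("start", "end")]
script_text = "".join(text for kind, text in recorder.events if kind == "data")

print(tags)
print(script_text)
```

If the only tag events are the start and end of script, and the "</$1>" fragment shows up in the data, the parser is treating the script body as opaque text up to the closing </script> tag.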