classification
Title: HTMLParser can't handle page with javascript
Type:
Stage:
Components: Library (Lib)
Versions:
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To: fdrake
Nosy List: fdrake, jhylton
Priority: normal
Keywords:

Created on 2004-11-30 15:22 by jhylton, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (2)
msg23395 - Author: Jeremy Hylton (jhylton) (Python triager) Date: 2004-11-30 15:22
Perhaps the page is malformed -- I notice at least one
other problem with it -- but I'd like to parse it. 
Relevant excerpts appear to be:

<script language="JavaScript">
<!--
um.menuCode[i].replace(/<\/(li|ul)>/ig,'</$1>\n');
-->
</script>

goahead() identifies the next interesting part of the
page as the </$1> inside the javascript.  It's not
seeing the comment.  Should it?  I changed
interesting_cdata to look for <! as one interesting
possibility and it parsed the comment successfully.
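
The behaviour described above can be reproduced outside the parser with a small regex sketch. It assumes the CDATA-mode pattern is roughly "a '<' followed by '/' or end of input"; the exact pattern in HTMLParser may differ between versions. The names CDATA_LIKE, CDATA_WITH_COMMENT, and first_interesting are hypothetical and not part of the library.

```python
import re

# Assumption: a pattern in the spirit of interesting_cdata, matching "</"
# or a lone "<" at the very end of the input. Not copied from the library.
CDATA_LIKE = re.compile(r'<(/|\Z)')

# Hypothetical variant that also treats "<!" as an interesting position.
CDATA_WITH_COMMENT = re.compile(r'<(!|/|\Z)')

# The text following the <script ...> start tag in the excerpt above.
SCRIPT_BODY = (
    "\n<!--\n"
    + r"um.menuCode[i].replace(/<\/(li|ul)>/ig,'</$1>\n');" + "\n"
    + "-->\n"
    + "</script>"
)

def first_interesting(pattern, text):
    """Return the first four characters at the earliest match, or None."""
    m = pattern.search(text)
    return text[m.start():m.start() + 4] if m else None

# With the plain pattern, the first hit is the "</$1>" inside the JavaScript
# string, not the comment opener and not the real </script> tag.
plain_hit = first_interesting(CDATA_LIKE, SCRIPT_BODY)

# With "<!" added, the comment opener is found first.
comment_hit = first_interesting(CDATA_WITH_COMMENT, SCRIPT_BODY)

print(plain_hit, comment_hit)
```

Note that "<\/" in the regular-expression literal does not match either pattern, because the character after "<" is a backslash.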
msg23396 - Author: Fred Drake (fdrake) (Python committer) Date: 2006-05-02 20:20
The "<!--" hackery is used to deal with really old browsers
that don't understand <script>.  Technically, all it should
have to look for is "</".

Closing as not worth changing.
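
For anyone who still needs to get past such a page, one possible workaround is to stop script content only at the matching close tag rather than at any "</". The following is a minimal sketch of that idea on plain strings, not the parser's own code; close_tag_pattern is a hypothetical helper.

```python
import re

def close_tag_pattern(tag):
    # Hypothetical helper: match "</tag" case-insensitively, but only when it
    # is followed by whitespace, "/" or ">", so "</$1>" is not a candidate.
    return re.compile(r'</' + re.escape(tag) + r'(?=[\s/>])', re.IGNORECASE)

# The text following the <script ...> start tag in the reported excerpt.
SCRIPT_BODY = (
    "\n<!--\n"
    + r"um.menuCode[i].replace(/<\/(li|ul)>/ig,'</$1>\n');" + "\n"
    + "-->\n"
    + "</script>"
)

match = close_tag_pattern("script").search(SCRIPT_BODY)

# Everything before the real end tag is treated as opaque script text.
script_text = SCRIPT_BODY[:match.start()]
remainder = SCRIPT_BODY[match.start():]

print(repr(remainder))
```

This keeps the "</$1>" and the comment markers inside the script text, at the cost of being more lenient than a strict reading of "stop at the first </".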
History
Date                 User     Action  Args
2022-04-11 14:56:08  admin    set     github: 41252
2004-11-30 15:22:31  jhylton  create