This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: htmllib.HTMLParser.anchorlist problem
Components: Library (Lib)
Versions: Python 2.2

process
Status: closed
Resolution: accepted
Nosy List: cpgray, gaul, loewis
Priority: normal

Created on 2003-03-29 00:26 by cpgray, last changed 2022-04-10 16:07 by admin. This issue is now closed.

Messages (3)
msg15290 - Author: Chris Gray (cpgray) Date: 2003-03-29 00:26
htmllib.HTMLParser.anchorlist is cleared when __init__() is called but not when reset() is called. As a result, processing more than one document with the same instance accumulates the anchors from every processed document in the list.

Arguably this is a feature rather than a bug, but it makes sense for reset() to clear whatever __init__() initializes.

Here is an illustrative IDLE session:

Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> import htmllib
>>> import formatter
>>> p = htmllib.HTMLParser(formatter.NullFormatter())
>>> p.feed('<a href="http://www.python.org">Python</a>')
>>> p.anchorlist
['http://www.python.org']
>>> p.reset()
>>> p.feed('<a href="http://sourceforge.net/">Sourceforge</a>')
>>> p.anchorlist
['http://www.python.org', 'http://sourceforge.net/']
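htmllib no longer exists in Python 3, so the following is only a minimal sketch of the behaviour requested above, written against the standard html.parser.HTMLParser instead. The class name AnchorCollector is hypothetical, and its anchorlist attribute merely imitates htmllib's; this is not the htmllib API or the actual fix. The idea is that per-document state is initialized in reset(), which HTMLParser.__init__() already calls, so one instance starts every document with an empty list.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical Python 3 analogue of htmllib's anchorlist handling."""

    def reset(self):
        # HTMLParser.__init__() calls reset(), so state initialized here is
        # set up on construction AND cleared on every explicit reset().
        super().reset()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; collect <a href="..."> values.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.anchorlist.append(href)


# Replaying the session above with the sketch:
p = AnchorCollector()
p.feed('<a href="http://www.python.org">Python</a>')
print(p.anchorlist)  # → ['http://www.python.org']
p.reset()
p.feed('<a href="http://sourceforge.net/">Sourceforge</a>')
print(p.anchorlist)  # → ['http://sourceforge.net/']
```

Putting the initialization only in reset(), rather than in both places, keeps __init__() and reset() from drifting apart when new per-document attributes are added later.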
msg15291 - Author: Andrew Gaul (gaul) Date: 2003-08-22 08:44
Logged In: YES 
user_id=139865

See patch 793021 for a fix.
msg15292 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2003-09-12 16:38
Logged In: YES 
user_id=21627

Fixed with #793021.
History
Date                 User    Action  Args
2022-04-10 16:07:56  admin   set     github: 38229
2003-03-29 00:26:11  cpgray  create