Issue 592441: Webchecker error on http://www.naleo.org

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/36999

classification

Title:	Webchecker error on http://www.naleo.org
Type:		Stage:
Components:	Demos and Tools	Versions:

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	jhylton	Nosy List:	jhylton, mcsolrac, mwh
Priority:	normal	Keywords:

Created on 2002-08-08 04:40 by mcsolrac, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name	Uploaded	Description
WSJArticle002_source.txt	mcsolrac, 2002-08-08 04:40	Source code of an HTML document

Messages (5)
msg11864 - (view)	Author: Carlos Conti (mcsolrac)	Date: 2002-08-08 04:40
Webchecker version 1.25.6.1 on Windows 2000 Professional.

Run webchecker with this argument:

http://www.naleo.org/WSJArticle002.htm

Webchecker will return this traceback:

Traceback (most recent call last):
  File "C:\Python22\Tools\webchecker\webchecker.py", line 858, in ?
    main()
  File "C:\Python22\Tools\webchecker\webchecker.py", line 222, in main
    c.run()
  File "C:\Python22\Tools\webchecker\webchecker.py", line 349, in run
    self.dopage(url)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 403, in dopage
    page = self.getpage(url_pair)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 507, in getpage
    return Page(text, url, maxpage=self.maxpage, checker=self)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 671, in __init__
    self.parser.feed(self.text)
  File "C:\Python22\lib\sgmllib.py", line 95, in feed
    self.goahead(0)
  File "C:\Python22\lib\sgmllib.py", line 161, in goahead
    k = self.parse_declaration(i)
  File "C:\Python22\lib\markupbase.py", line 66, in parse_declaration
    decltype, j = self._scan_name(j, i)
  File "C:\Python22\lib\markupbase.py", line 313, in _scan_name
    self.error("expected name token")
  File "C:\Python22\lib\sgmllib.py", line 102, in error
    raise SGMLParseError(message)
sgmllib.SGMLParseError: expected name token

I believe this is because of the XML in the source code (see WSJArticle002_source.txt attached to this bug report).

Even if the code in this page is poorly formatted, webchecker should be able to continue checking other links in this domain rather than stopping. For example, webchecker could report “unable to check http://www.naleo.org/WSJArticle002.htm”, print a traceback like the one above, and then continue with the rest of the domain.
msg11865 - (view)	Author: Jeremy Hylton (jhylton)	Date: 2002-08-08 19:20
I've seen a variety of parsing problems kill webchecker. I agree that these exceptions should be caught somewhere so that they are not fatal. Care to submit a patch?
msg11866 - (view)	Author: Carlos Conti (mcsolrac)	Date: 2002-08-08 22:06
I'd love to submit a patch, but I am a newbie to both Python and programming. I apologize if this space is only intended for programmers; I am a QA engineer just getting acquainted with the wonderful world of Python.
msg11867 - (view)	Author: Jeremy Hylton (jhylton)	Date: 2002-08-13 13:36
No need to apologize. Everyone is welcome to submit bug reports here. There are, however, lots of programmers who submit bugs, so I find it helpful to ask :-). I'll look into this, but it's not the highest priority.
msg11868 - (view)	Author: Michael Hudson (mwh)	Date: 2004-08-07 20:49
On #python-dev IRC, jlgijsbers reports that this was fixed by revision 1.30 of webchecker.py.
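
The behaviour requested in this report (catch the parse error, report the page as uncheckable, and carry on with the remaining pages) might look roughly like the sketch below. It is not the change made in revision 1.30 and does not use webchecker's real code: `ParseError`, `parse_page`, and `check_pages` are hypothetical names, and the toy parser's failure trigger is arbitrary, chosen only so the example runs on its own.

```python
# Hypothetical sketch; none of these names come from webchecker.py.

class ParseError(Exception):
    """Stand-in for a parser exception such as sgmllib.SGMLParseError."""


def parse_page(text):
    """Toy parser: return the http links in `text`.

    Raises ParseError when the text contains "<![" (an arbitrary trigger
    picked for illustration, not the real sgmllib rule).
    """
    if "<![" in text:
        raise ParseError("expected name token")
    return [word for word in text.split() if word.startswith("http")]


def check_pages(pages):
    """Parse every page, recording failures instead of aborting the run."""
    links = {}
    failures = {}
    for url, text in pages.items():
        try:
            links[url] = parse_page(text)
        except ParseError as exc:
            # Report the bad page and keep going with the rest.
            failures[url] = str(exc)
            print("unable to check %s: %s" % (url, exc))
    return links, failures
```

Catching only the parser's own exception type, rather than every exception, keeps unrelated bugs visible while still letting one malformed page pass without ending the whole run.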

History
Date	User	Action	Args
2022-04-10 16:05:34	admin	set	github: 36999
2002-08-08 04:40:27	mcsolrac	create